From: Yicong Yang <[email protected]>
If the user doesn't specify the CPUs, perf will try to open events on the
CPUs of the PMU, which are initialized from the PMU's "cpumask" or "cpus"
sysfs attribute if provided. But we don't check whether the CPUs provided
by the PMU are all online, so we may open events on offline CPUs if the
PMU driver provides offline CPUs, and then we'll be rejected by the kernel:
[root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
[root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
Error:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
/bin/dmesg | grep -i perf may provide additional information.
This patchset tries to avoid this case by:
- Double-checking the PMU's cpumask in the perf tool and only including online CPUs
- Making the PMU drivers only export online CPUs in their "cpus" or "cpumask"
attributes
Previous discussion can be found at [1]. Will suggested doing it in userspace.
I think it makes sense to do a double check in the perf tool in case the driver
doesn't do this, so PATCH 1/3 is added in this version.
[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
Yicong Yang (3):
perf pmu: Limit PMU cpumask to online CPUs
perf: arm_pmu: Only show online CPUs in device's "cpus" attribute
perf: arm_spe: Only show online CPUs in device's "cpumask" attribute
drivers/perf/arm_pmu.c | 24 +++++++++++++++++++++++-
drivers/perf/arm_spe_pmu.c | 22 +++++++++++++++++++++-
tools/perf/util/pmu.c | 13 +++++++++++--
3 files changed, 55 insertions(+), 4 deletions(-)
--
2.24.0
From: Yicong Yang <[email protected]>
When there are CPUs taken offline after system boot, perf will fail:
[root@localhost ~]# /home/yang/perf record -e arm_spe_0//
Error:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (arm_spe_0//).
/bin/dmesg | grep -i perf may provide additional information.
This is because the PMU's "cpumask" is not updated and still contains
offline CPUs, so perf will try to open perf events on the offlined CPUs.
Make the "cpumask" attribute only show online CPUs and introduce a new
"supported_cpus" attribute where users can get the range of CPUs this
PMU supports monitoring.
Signed-off-by: Yicong Yang <[email protected]>
---
drivers/perf/arm_spe_pmu.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 9100d82bfabc..2182f214c587 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -254,13 +254,33 @@ static ssize_t cpumask_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct arm_spe_pmu *spe_pmu = dev_get_drvdata(dev);
+ cpumask_var_t mask;
+ ssize_t n;
- return cpumap_print_to_pagebuf(true, buf, &spe_pmu->supported_cpus);
+ /* If allocation failed then show the supported_cpus */
+ if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+ return cpumap_print_to_pagebuf(true, buf, &spe_pmu->supported_cpus);
+
+ cpumask_and(mask, &spe_pmu->supported_cpus, cpu_online_mask);
+ n = cpumap_print_to_pagebuf(true, buf, mask);
+ free_cpumask_var(mask);
+
+ return n;
}
static DEVICE_ATTR_RO(cpumask);
+static ssize_t supported_cpus_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct arm_spe_pmu *spe_pmu = dev_get_drvdata(dev);
+
+ return cpumap_print_to_pagebuf(true, buf, &spe_pmu->supported_cpus);
+}
+static DEVICE_ATTR_RO(supported_cpus);
+
static struct attribute *arm_spe_pmu_attrs[] = {
&dev_attr_cpumask.attr,
+ &dev_attr_supported_cpus.attr,
NULL,
};
--
2.24.0
From: Yicong Yang <[email protected]>
When there are CPUs taken offline after system boot, perf will fail:
[root@localhost ~]# /home/yang/perf stat -a -e armv8_pmuv3_0/cycles/
Error:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
/bin/dmesg | grep -i perf may provide additional information.
This is because the PMU's "cpus" is not updated and still contains
offline CPUs, so perf will try to open perf events on the offlined CPUs.
Make the "cpus" attribute only show online CPUs and introduce a new
"supported_cpus" attribute where users can get the range of CPUs this
PMU supports monitoring.
Signed-off-by: Yicong Yang <[email protected]>
---
drivers/perf/arm_pmu.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 8458fe2cebb4..acbb0e1d0414 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -558,13 +558,35 @@ static ssize_t cpus_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct arm_pmu *armpmu = to_arm_pmu(dev_get_drvdata(dev));
- return cpumap_print_to_pagebuf(true, buf, &armpmu->supported_cpus);
+ cpumask_var_t mask;
+ ssize_t n;
+
+ /* If allocation failed then show the supported_cpus */
+ if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+ return cpumap_print_to_pagebuf(true, buf, &armpmu->supported_cpus);
+
+ cpumask_and(mask, &armpmu->supported_cpus, cpu_online_mask);
+ n = cpumap_print_to_pagebuf(true, buf, mask);
+ free_cpumask_var(mask);
+
+ return n;
}
static DEVICE_ATTR_RO(cpus);
+static ssize_t supported_cpus_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(dev_get_drvdata(dev));
+
+ return cpumap_print_to_pagebuf(true, buf, &armpmu->supported_cpus);
+}
+
+static DEVICE_ATTR_RO(supported_cpus);
+
static struct attribute *armpmu_common_attrs[] = {
&dev_attr_cpus.attr,
+ &dev_attr_supported_cpus.attr,
NULL,
};
--
2.24.0
From: Yicong Yang <[email protected]>
We initialize the PMU's cpumask from the "cpumask" or "cpus" sysfs
attribute, if provided by the driver, without checking whether the CPUs
are online. If the CPUs provided by the driver include offline CPUs,
we'll try to open events on the offline CPUs and then be rejected by
the kernel:
[root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
[root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
Error:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
/bin/dmesg | grep -i perf may provide additional information.
So it's better to do a double check in userspace and only include the
online CPUs from "cpumask" or "cpus" to avoid opening events on offline
CPUs.
Signed-off-by: Yicong Yang <[email protected]>
---
tools/perf/util/pmu.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 888ce9912275..51e8d10ee28b 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor
continue;
cpus = perf_cpu_map__read(file);
fclose(file);
- if (cpus)
- return cpus;
+ if (cpus) {
+ struct perf_cpu_map *intersect __maybe_unused;
+
+ if (perf_cpu_map__is_subset(cpu_map__online(), cpus))
+ return cpus;
+
+ intersect = perf_cpu_map__intersect(cpus, cpu_map__online());
+ perf_cpu_map__put(cpus);
+ if (intersect)
+ return intersect;
+ }
}
/* Nothing found, for core PMUs assume this means all CPUs. */
--
2.24.0
On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
>
> From: Yicong Yang <[email protected]>
>
> When there're CPUs offline after system booting, perf will failed:
> [root@localhost ~]# /home/yang/perf stat -a -e armv8_pmuv3_0/cycles/
> Error:
> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> /bin/dmesg | grep -i perf may provide additional information.
Thanks for debugging this Yicong! I'm confused by the fact that cycles
is falling back on cpu-clock; on ARM the PMU type generally isn't
PERF_TYPE_HARDWARE and so this fallback shouldn't happen:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n2900
I note your command line is for system-wide event opening rather than
for a process. It is even stranger that the fallback is giving "No such
device".
> This is due to PMU's "cpus" is not updated and still contains offline
> CPUs and perf will try to open perf event on the offlined CPUs.
The perf tool will try to only open online CPUs. Unfortunately the
naming around this is confusing (see the short sketch after this list):
- any - an event may follow a task and schedule on "any" CPU. We
encode this with -1.
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n24
- online - the set of online CPUs.
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n33
- all - I try to avoid this but it may still be in the code, "all"
usually is another name for online. Hopefully when we use this name it
is clearly defined:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/internal/evlist.h?h=perf-tools-next#n23
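As a minimal sketch of that distinction (assuming the libperf constructors
declared around the lines linked above, perf_cpu_map__new_any_cpu() and
perf_cpu_map__new_online_cpus(); illustration only, not code from the tool):
```
/* Build with: gcc example.c -lperf (assumes libperf is installed). */
#include <perf/cpumap.h>
#include <stdio.h>

int main(void)
{
        /* "any": a single dummy entry of -1, used for task-following events. */
        struct perf_cpu_map *any = perf_cpu_map__new_any_cpu();
        /* "online": read from /sys/devices/system/cpu/online. */
        struct perf_cpu_map *online = perf_cpu_map__new_online_cpus();

        printf("any: nr=%d first cpu=%d\n",
               perf_cpu_map__nr(any), perf_cpu_map__cpu(any, 0).cpu);
        printf("online: nr=%d\n", perf_cpu_map__nr(online));

        perf_cpu_map__put(any);
        perf_cpu_map__put(online);
        return 0;
}
```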
> Make "cpus" attribute only shows online CPUs and introduced a new
> "supported_cpus" where users can get the range of the CPUs this
> PMU supported monitoring.
I don't think this should be necessary as by default the CPUs the perf
tool will use will be the "online" CPUs:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evlist.c?h=perf-tools-next#n40
Could you create a reproduction of the problem you are encountering?
My expectation is for a core PMU to advertise the online and offline
CPUs it is valid for, and for perf to intersect:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n44
those CPUs with the online CPUs so the perf_event_open doesn't fail.
Thanks,
Ian
> Signed-off-by: Yicong Yang <[email protected]>
> ---
> drivers/perf/arm_pmu.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 8458fe2cebb4..acbb0e1d0414 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -558,13 +558,35 @@ static ssize_t cpus_show(struct device *dev,
> struct device_attribute *attr, char *buf)
> {
> struct arm_pmu *armpmu = to_arm_pmu(dev_get_drvdata(dev));
> - return cpumap_print_to_pagebuf(true, buf, &armpmu->supported_cpus);
> + cpumask_var_t mask;
> + ssize_t n;
> +
> + /* If allocation failed then show the supported_cpus */
> + if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> + return cpumap_print_to_pagebuf(true, buf, &armpmu->supported_cpus);
> +
> + cpumask_and(mask, &armpmu->supported_cpus, cpu_online_mask);
> + n = cpumap_print_to_pagebuf(true, buf, mask);
> + free_cpumask_var(mask);
> +
> + return n;
> }
>
> static DEVICE_ATTR_RO(cpus);
>
> +static ssize_t supported_cpus_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct arm_pmu *armpmu = to_arm_pmu(dev_get_drvdata(dev));
> +
> + return cpumap_print_to_pagebuf(true, buf, &armpmu->supported_cpus);
> +}
> +
> +static DEVICE_ATTR_RO(supported_cpus);
> +
> static struct attribute *armpmu_common_attrs[] = {
> &dev_attr_cpus.attr,
> + &dev_attr_supported_cpus.attr,
> NULL,
> };
>
> --
> 2.24.0
>
On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
>
> From: Yicong Yang <[email protected]>
>
> If user doesn't specify the CPUs, perf will try to open events on CPUs
> of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs
> attributes if provided. But we doesn't check whether the CPUs provided
> by the PMU are all online. So we may open events on offline CPUs if PMU
> driver provide offline CPUs and then we'll be rejected by the kernel:
>
> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
Generally Linux won't let you take CPU0 offline; I'm not able to follow
this step on x86 Linux. Fwiw, I routinely run perf with the core
hyperthread siblings offline.
Thanks,
Ian
> [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> Error:
> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> /bin/dmesg | grep -i perf may provide additional information.
>
> This patchset tries to avoid this case by:
> - Double check the PMU's cpumask in the perf tool and only include online CPUs
> - Trying to make the PMU drivers only export online CPUs in its "cpus" or "cpumask"
> attributes
>
> Previously discussion can be found at [1]. Will suggested to do it in userspace.
> I think it makes sense to do a double check in the perf tool in case the driver
> doesn't do this. So PATCH 1/3 is added in this version.
>
> [1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> Yicong Yang (3):
> perf pmu: Limit PMU cpumask to online CPUs
> perf: arm_pmu: Only show online CPUs in device's "cpus" attribute
> perf: arm_spe: Only show online CPUs in device's "cpumask" attribute
>
> drivers/perf/arm_pmu.c | 24 +++++++++++++++++++++++-
> drivers/perf/arm_spe_pmu.c | 22 +++++++++++++++++++++-
> tools/perf/util/pmu.c | 13 +++++++++++--
> 3 files changed, 55 insertions(+), 4 deletions(-)
>
> --
> 2.24.0
>
On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
>
> From: Yicong Yang <[email protected]>
>
> We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs
> attributes if provided by the driver without checking the CPUs
> are online or not. In such case that CPUs provided by the driver
> contains the offline CPUs, we'll try to open event on the offline
> CPUs and then rejected by the kernel:
>
> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> Error:
> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> /bin/dmesg | grep -i perf may provide additional information.
>
> So it's better to do a double check in the userspace and only include
> the online CPUs from "cpumask" or "cpus" to avoid opening events on
> offline CPUs.
I see where you are coming from with this but I think it is wrong. The
cpus for an uncore PMU are a hint of the CPU to open on rather than
the set of valid CPUs. For example:
```
$ cat /sys/devices/uncore_imc_free_running_0/cpumask
0
$ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1
Using CPUID GenuineIntel-6-8D-1
Attempt to add: uncore_imc_free_running_0/data_read/
..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
type 24 (uncore_imc_free_running_0)
size 136
config 0x20ff (data_read)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -22
switching off cloexec flag
------------------------------------------------------------
perf_event_attr:
type 24 (uncore_imc_free_running_0)
size 136
config 0x20ff (data_read)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0
sys_perf_event_open failed, error -22
switching off exclude_guest, exclude_host
------------------------------------------------------------
perf_event_attr:
type 24 (uncore_imc_free_running_0)
size 136
config 0x20ff (data_read)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 = 3
uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957
uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957
Performance counter stats for 'system wide':
244.51 MiB uncore_imc_free_running_0/data_read/
0.102320376 seconds time elapsed
```
So the CPU mask of the PMU says to open on CPU 0, but on the command
line when I passed "-C 1" it opened on CPU 1. If the cpumask file
contained an offline CPU then this change would make the CPU map in the
tool empty; however, a different CPU may be programmable and online.
Fwiw, the tool will determine whether the mask is the full set of valid
CPUs or just a hint by using the notion of a PMU being "core" or not.
That notion considers whether the mask was loaded from a "cpumask" or
"cpus" file:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
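As an illustration of that distinction, here is a self-contained sketch with a
hypothetical read_pmu_mask() helper (not the pmu.c code linked above): a mask
found in "cpus" is taken as a core PMU's full set of valid CPUs, while one
found in "cpumask" is only an uncore PMU's default-CPU hint:
```
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct pmu_mask {
        char list[256]; /* raw mask contents, e.g. "0-3" or "0,18" */
        bool is_core;   /* true when the mask came from a "cpus" file */
        bool present;
};

/* Read <sysfs_pmu_dir>/cpus or, failing that, <sysfs_pmu_dir>/cpumask. */
static struct pmu_mask read_pmu_mask(const char *sysfs_pmu_dir)
{
        static const char *const names[] = { "cpus", "cpumask" };
        struct pmu_mask m = { .present = false };

        for (int i = 0; i < 2; i++) {
                char path[512];
                FILE *file;

                snprintf(path, sizeof(path), "%s/%s", sysfs_pmu_dir, names[i]);
                file = fopen(path, "r");
                if (!file)
                        continue;
                if (fgets(m.list, sizeof(m.list), file)) {
                        m.list[strcspn(m.list, "\n")] = '\0';
                        /* "cpus" => core PMU full set; "cpumask" => uncore hint */
                        m.is_core = (i == 0);
                        m.present = true;
                }
                fclose(file);
                break;
        }
        return m;
}
```
A caller seeing is_core == false would treat the mask as a default only and
still honour an explicit -C list, matching the uncore_imc_free_running_0
behaviour shown above.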
Thanks,
Ian
> Signed-off-by: Yicong Yang <[email protected]>
> ---
> tools/perf/util/pmu.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 888ce9912275..51e8d10ee28b 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor
> continue;
> cpus = perf_cpu_map__read(file);
> fclose(file);
> - if (cpus)
> - return cpus;
> + if (cpus) {
> + struct perf_cpu_map *intersect __maybe_unused;
> +
> + if (perf_cpu_map__is_subset(cpu_map__online(), cpus))
> + return cpus;
> +
> + intersect = perf_cpu_map__intersect(cpus, cpu_map__online());
> + perf_cpu_map__put(cpus);
> + if (intersect)
> + return intersect;
> + }
> }
>
> /* Nothing found, for core PMUs assume this means all CPUs. */
> --
> 2.24.0
>
Hi Ian,
On 2024/6/4 0:20, Ian Rogers wrote:
> On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
>>
>> From: Yicong Yang <[email protected]>
>>
>> When there're CPUs offline after system booting, perf will failed:
>> [root@localhost ~]# /home/yang/perf stat -a -e armv8_pmuv3_0/cycles/
>> Error:
>> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
>> /bin/dmesg | grep -i perf may provide additional information.
>
> Thanks for debugging this Yicong! The fact cycles is falling back on
> cpu-clock I'm confused by, on ARM the PMU type generally isn't
> PERF_TYPE_HARDWARE and so this fallback shouldn't happen:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n2900
> I note your command line is for system wide event opening rather than
> for a process. It is more strange the fallback is giving "No such
> device".
>
>> This is due to PMU's "cpus" is not updated and still contains offline
>> CPUs and perf will try to open perf event on the offlined CPUs.
>
> The perf tool will try to only open online CPUs. Unfortunately the
> naming around this is confusing:
>
> - any - an event may follow a task and schedule on "any" CPU. We
> encode this with -1.
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n24
> - online - the set of online CPU.
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n33
> - all - I try to avoid this but it may still be in the code, "all"
> usually is another name for online. Hopefully when we use this name it
> is clearly defined:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/internal/evlist.h?h=perf-tools-next#n23
>
>> Make "cpus" attribute only shows online CPUs and introduced a new
>> "supported_cpus" where users can get the range of the CPUs this
>> PMU supported monitoring.
>
> I don't think this should be necessary as by default the CPUs the perf
> tool will use will be the "online" CPUs:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evlist.c?h=perf-tools-next#n40
>
> Could you create a reproduction of the problem you are encountering?
> My expectation is for a core PMU to advertise the online and offline
> CPUs it is valid for, and for perf to intersect:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n44
> those CPUs with the online CPUs so the perf_event_open doesn't fail.
>
Thanks for the comments and detailed illustration!
Actually it can be reproduced easily using armv8_pmuv3 events. Tested on 6.10-rc1 with
perf version 6.10.rc1.g1613e604df0c:
# offline any CPU
[root@localhost tmp]# echo 0 > /sys//devices/system/cpu/cpu1/online
# try any event of armv8_pmuv3, with -a or without
[root@localhost tmp]# ./perf stat -e armv8_pmuv3_0/ll_cache/ -vvv
Control descriptor is not initialized
Opening: armv8_pmuv3_0/ll_cache/
------------------------------------------------------------
perf_event_attr:
type 10 (armv8_pmuv3_0)
size 136
config 0x32 (ll_cache)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3
Opening: armv8_pmuv3_0/ll_cache/
------------------------------------------------------------
perf_event_attr:
type 10 (armv8_pmuv3_0)
size 136
config 0x32 (ll_cache)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -19
Error:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (armv8_pmuv3_0/ll_cache/).
/bin/dmesg | grep -i perf may provide additional information.
As you can see, we're trying to open the event on CPU 0 first (which succeeds) and
then on CPU 1, where it fails. Actually we don't enter any branch that handles
evsel->cpus in __perf_evlist__propagate_maps(), so evsel->cpus stays as is, which
should be initialized from the pmu->cpumask. You referenced [1], but in this case
perf_evsel->system_wide == false as I checked. perf_evsel->system_wide is set to
true in perf_evlist__go_system_wide() and it is maybe only set for dummy events.
Please correct me if I misread here.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evlist.c?h=perf-tools-next#n40
Thanks.
On 2024/6/4 0:42, Ian Rogers wrote:
> On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
>>
>> From: Yicong Yang <[email protected]>
>>
>> If user doesn't specify the CPUs, perf will try to open events on CPUs
>> of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs
>> attributes if provided. But we doesn't check whether the CPUs provided
>> by the PMU are all online. So we may open events on offline CPUs if PMU
>> driver provide offline CPUs and then we'll be rejected by the kernel:
>>
>> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
>
> Generally Linux won't let you take CPU0 off line, I'm not able to
> follow this step on x86 Linux. Fwiw, I routinely run perf with the
> core hyperthread siblings offline.
>
It doesn't matter whether it's CPU0 or another CPU that goes offline. There's no
restriction on CPU0 going offline on arm64; I just used it as an example.
I cannot reproduce it on x86. I think it may be because we initialize the
pmu->cpus through different routines in pmu_cpumask(). There's no "cpus"
attribute for x86's core PMU on my x86 machine:
root@ubuntu204:~# ls /sys/bus/event_source/devices/cpu/
allow_tsx_force_abort caps events format freeze_on_smi perf_event_mux_interval_ms power rdpmc subsystem type uevent
So pmu_cpumask() will infer it is a core PMU and initialize the cpus
with the online CPUs [1]. On arm64 there is a "cpus" sysfs attribute,
so pmu->cpus is initialized from "cpus" without checking whether each
CPU is online. That's what is proposed in PATCH 1/3.
There's a "cpus" sysfs attribute for x86 hybrid machines, reading from the code [2],
and it seems to always reflect the online CPUs supported by that PMU.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n779
[2] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree//arch/x86/events/intel/core.c#n5736
Thanks.
On 2024/6/4 0:52, Ian Rogers wrote:
> On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
>>
>> From: Yicong Yang <[email protected]>
>>
>> We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs
>> attributes if provided by the driver without checking the CPUs
>> are online or not. In such case that CPUs provided by the driver
>> contains the offline CPUs, we'll try to open event on the offline
>> CPUs and then rejected by the kernel:
>>
>> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
>> [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
>> Error:
>> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
>> /bin/dmesg | grep -i perf may provide additional information.
>>
>> So it's better to do a double check in the userspace and only include
>> the online CPUs from "cpumask" or "cpus" to avoid opening events on
>> offline CPUs.
>
> I see where you are coming from with this but I think it is wrong. The
> cpus for an uncore PMU are a hint of the CPU to open on rather than
> the set of valid CPUs. For example:
> ```
> $ cat /sys/devices/uncore_imc_free_running_0/cpumask
> 0
> $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1
> Using CPUID GenuineIntel-6-8D-1
> Attempt to add: uncore_imc_free_running_0/data_read/
> ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/
> Control descriptor is not initialized
> ------------------------------------------------------------
> perf_event_attr:
> type 24 (uncore_imc_free_running_0)
> size 136
> config 0x20ff (data_read)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8
> sys_perf_event_open failed, error -22
> switching off cloexec flag
> ------------------------------------------------------------
> perf_event_attr:
> type 24 (uncore_imc_free_running_0)
> size 136
> config 0x20ff (data_read)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0
> sys_perf_event_open failed, error -22
> switching off exclude_guest, exclude_host
> ------------------------------------------------------------
> perf_event_attr:
> type 24 (uncore_imc_free_running_0)
> size 136
> config 0x20ff (data_read)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 = 3
> uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957
> uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957
>
> Performance counter stats for 'system wide':
>
> 244.51 MiB uncore_imc_free_running_0/data_read/
>
> 0.102320376 seconds time elapsed
> ```
> So the CPU mask of the PMU says to open on CPU 0, but on the command
> line when I passed "-C 1" it opened it on CPU 1. If the cpumask file
> contained an offline CPU then this change would make it so the CPU map
> in the tool were empty, however, a different CPU may be programmable
> and online.
>
> Fwiw, the tool will determine whether the mask is for all valid or a
> hint by using the notion of a PMU being "core" or not. That notion
> considers whether the mask was loading from a "cpumask" or "cpus"
> file:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
>
I see, you're correct on this. Maybe I didn't state it clearly in the commit message:
this patch doesn't intend to change the case where the user specifies the CPUs explicitly.
It only makes a difference when the user doesn't specify a CPU list while the PMU's
'cpus' or 'cpumask' contains offline CPUs: with this patch we'll use the online
CPUs from the PMU's 'cpus' or 'cpumask', whereas before we might use the offline CPUs.
We still honor the user's input via the handling in __perf_evlist__propagate_maps().
Thanks.
On 2024/6/4 0:20, Ian Rogers wrote:
> On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
>>
>> From: Yicong Yang <[email protected]>
>>
>> When there're CPUs offline after system booting, perf will failed:
>> [root@localhost ~]# /home/yang/perf stat -a -e armv8_pmuv3_0/cycles/
>> Error:
>> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
>> /bin/dmesg | grep -i perf may provide additional information.
>
> Thanks for debugging this Yicong! The fact cycles is falling back on
> cpu-clock I'm confused by, on ARM the PMU type generally isn't
> PERF_TYPE_HARDWARE and so this fallback shouldn't happen:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n2900
It should be caused by config_term_pmu() in [1]. If it's a hardware event term
and not found in the PMU's sysfs, we'll set attr->type to PERF_TYPE_HARDWARE.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n1052
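As a hedged sketch of the resulting encoding (assumed shape using the uapi
extended-type bits; see the linked parse-events.c for the authoritative logic,
and note encode_legacy_hw_event() is a made-up helper name):
```
#include <linux/perf_event.h>

/*
 * A legacy hardware term with no sysfs/json alias ends up as
 * PERF_TYPE_HARDWARE, with the PMU's type number carried in the
 * extended-type bits of attr.config.
 */
static void encode_legacy_hw_event(struct perf_event_attr *attr,
                                   __u32 pmu_type, __u64 hw_config)
{
        attr->type = PERF_TYPE_HARDWARE;
        attr->config = hw_config | ((__u64)pmu_type << PERF_PMU_TYPE_SHIFT);
}
```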
Thanks.
On Tue, Jun 4, 2024 at 4:12 AM Yicong Yang <[email protected]> wrote:
>
> On 2024/6/4 0:52, Ian Rogers wrote:
> > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
> > Fwiw, the tool will determine whether the mask is for all valid or a
> > hint by using the notion of a PMU being "core" or not. That notion
> > considers whether the mask was loading from a "cpumask" or "cpus"
> > file:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
> >
>
> I see, you're correct on this. Maybe I didn't mention it clearly in the commit,
> this patch doesn't intend to change the case that user specify the CPUs explicitly.
> It'll have difference in case user doesn't specify the CPU list while the PMU's
> 'cpus' or 'cpumask' contains offline CPUs: with this patch we'll use the online
> CPUs in the PMU's 'cpus' or 'cpumask' but before this we may use the offline CPUs.
> We still honor the user's input by the handling in __perf_evlist__propagate_maps().
Thanks Yicong,
I'm having a hard time believing this bug has been there all along
(forever), and I also wonder what's happening with big.LITTLE/hybrid
systems that advertise CPUs in similar ways. I do see that on
armv8_pmuv3_0 the "cpus" file will have offline CPUs, while on x86 the
lack of such a file causes a fallback to using the set of online CPUs.
So there is a divergence in the PMUs' behavior that may well manifest
itself in the perf tool like this. If the "cpus" file were empty we
might fix the behavior, but I don't think that's what's wanted. I also
worry that a PMU driver change will mean the perf tool is still not
working correctly for people who don't rebuild their kernel.
I'm still thinking about this but thanks for the updates,
Ian
On Tue, Jun 4, 2024 at 1:04 AM Yicong Yang <[email protected]> wrote:
>
> On 2024/6/4 0:42, Ian Rogers wrote:
> > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
> >>
> >> From: Yicong Yang <[email protected]>
> >>
> >> If user doesn't specify the CPUs, perf will try to open events on CPUs
> >> of the PMU which is initialized from the PMU's "cpumask" or "cpus" sysfs
> >> attributes if provided. But we doesn't check whether the CPUs provided
> >> by the PMU are all online. So we may open events on offline CPUs if PMU
> >> driver provide offline CPUs and then we'll be rejected by the kernel:
> >>
> >> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> >
> > Generally Linux won't let you take CPU0 off line, I'm not able to
> > follow this step on x86 Linux. Fwiw, I routinely run perf with the
> > core hyperthread siblings offline.
> >
>
> It doesn't matter if it's the CPU0 offline or other CPUs. There's no restriction
> for CPU0 can go offline or not on arm64 and I just use this for example.
>
> I cannot reproduce it on x86. I think it may because we're initializing the
> pmu->cpus in different routines in pmu_cpumask(). There's no "cpus"
> for x86's core pmu on my x86 machine:
> root@ubuntu204:~# ls /sys/bus/event_source/devices/cpu/
> allow_tsx_force_abort caps events format freeze_on_smi perf_event_mux_interval_ms power rdpmc subsystem type uevent
>
> So pmu_cpumask() will infer it as an core pmu and initialize the cpus
> with online CPUs [1]. For arm64 there lies a "cpus" sysfs attributes
> so pmu->cpus are initialized from the "cpus" without checking each
> CPUs is online or not. That's what proposed in Patch 1/3.
>
> There's a "cpus" sysfs for x86's hybrid machine, reading from the code [2].
> And it seems always reflect the online CPUs supported by that PMU.
Thanks Yicong, looking at a hybrid machine and taking cpu1 offline I
see the PMU's "cpus" does not contain the offline CPU. I think this
supports the view that the PMU driver should be fixed for ARM, but
we'll also need a workaround in the perf tool for older kernels.
Thanks,
Ian
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n779
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree//arch/x86/events/intel/core.c#n5736
>
> Thanks.
Hello,
On Mon, Jun 03, 2024 at 09:52:15AM -0700, Ian Rogers wrote:
> On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
> >
> > From: Yicong Yang <[email protected]>
> >
> > We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs
> > attributes if provided by the driver without checking the CPUs
> > are online or not. In such case that CPUs provided by the driver
> > contains the offline CPUs, we'll try to open event on the offline
> > CPUs and then rejected by the kernel:
> >
> > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> > Error:
> > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> > /bin/dmesg | grep -i perf may provide additional information.
> >
> > So it's better to do a double check in the userspace and only include
> > the online CPUs from "cpumask" or "cpus" to avoid opening events on
> > offline CPUs.
>
> I see where you are coming from with this but I think it is wrong. The
> cpus for an uncore PMU are a hint of the CPU to open on rather than
> the set of valid CPUs. For example:
> ```
> $ cat /sys/devices/uncore_imc_free_running_0/cpumask
> 0
> $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1
> Using CPUID GenuineIntel-6-8D-1
> Attempt to add: uncore_imc_free_running_0/data_read/
> ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/
> Control descriptor is not initialized
> ------------------------------------------------------------
> perf_event_attr:
> type 24 (uncore_imc_free_running_0)
> size 136
> config 0x20ff (data_read)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8
> sys_perf_event_open failed, error -22
> switching off cloexec flag
> ------------------------------------------------------------
> perf_event_attr:
> type 24 (uncore_imc_free_running_0)
> size 136
> config 0x20ff (data_read)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0
> sys_perf_event_open failed, error -22
> switching off exclude_guest, exclude_host
> ------------------------------------------------------------
> perf_event_attr:
> type 24 (uncore_imc_free_running_0)
> size 136
> config 0x20ff (data_read)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 = 3
> uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957
> uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957
>
> Performance counter stats for 'system wide':
>
> 244.51 MiB uncore_imc_free_running_0/data_read/
>
> 0.102320376 seconds time elapsed
> ```
> So the CPU mask of the PMU says to open on CPU 0, but on the command
> line when I passed "-C 1" it opened it on CPU 1. If the cpumask file
> contained an offline CPU then this change would make it so the CPU map
> in the tool were empty, however, a different CPU may be programmable
> and online.
I think the Intel uncore PMU driver ignores the CPU parameter and sets it to
CPU 0 in this case internally. See uncore_pmu_event_init() at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/uncore.c#n761
>
> Fwiw, the tool will determine whether the mask is for all valid or a
> hint by using the notion of a PMU being "core" or not. That notion
> considers whether the mask was loading from a "cpumask" or "cpus"
> file:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
>
> Thanks,
> Ian
>
> > Signed-off-by: Yicong Yang <[email protected]>
> > ---
> > tools/perf/util/pmu.c | 13 +++++++++++--
> > 1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> > index 888ce9912275..51e8d10ee28b 100644
> > --- a/tools/perf/util/pmu.c
> > +++ b/tools/perf/util/pmu.c
> > @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor
> > continue;
> > cpus = perf_cpu_map__read(file);
> > fclose(file);
> > - if (cpus)
> > - return cpus;
> > + if (cpus) {
> > + struct perf_cpu_map *intersect __maybe_unused;
> > +
> > + if (perf_cpu_map__is_subset(cpu_map__online(), cpus))
> > + return cpus;
> > +
> > + intersect = perf_cpu_map__intersect(cpus, cpu_map__online());
So IIUC this is for core PMUs with a "cpus" file, right? I guess uncore
drivers already handle "cpumask" properly...
Thanks,
Namhyung
> > + perf_cpu_map__put(cpus);
> > + if (intersect)
> > + return intersect;
> > + }
> > }
> >
> > /* Nothing found, for core PMUs assume this means all CPUs. */
> > --
> > 2.24.0
> >
On Tue, Jun 04, 2024 at 03:43:35PM +0800, Yicong Yang wrote:
> Hi Ian,
>
> On 2024/6/4 0:20, Ian Rogers wrote:
> > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
> >>
> >> From: Yicong Yang <[email protected]>
> >>
> >> When there're CPUs offline after system booting, perf will failed:
> >> [root@localhost ~]# /home/yang/perf stat -a -e armv8_pmuv3_0/cycles/
> >> Error:
> >> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> >> /bin/dmesg | grep -i perf may provide additional information.
> >
> > Thanks for debugging this Yicong! The fact cycles is falling back on
> > cpu-clock I'm confused by, on ARM the PMU type generally isn't
> > PERF_TYPE_HARDWARE and so this fallback shouldn't happen:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n2900
> > I note your command line is for system wide event opening rather than
> > for a process. It is more strange the fallback is giving "No such
> > device".
> >
> >> This is due to PMU's "cpus" is not updated and still contains offline
> >> CPUs and perf will try to open perf event on the offlined CPUs.
> >
> > The perf tool will try to only open online CPUs. Unfortunately the
> > naming around this is confusing:
> >
> > - any - an event may follow a task and schedule on "any" CPU. We
> > encode this with -1.
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n24
> > - online - the set of online CPU.
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n33
> > - all - I try to avoid this but it may still be in the code, "all"
> > usually is another name for online. Hopefully when we use this name it
> > is clearly defined:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/internal/evlist.h?h=perf-tools-next#n23
> >
> >> Make "cpus" attribute only shows online CPUs and introduced a new
> >> "supported_cpus" where users can get the range of the CPUs this
> >> PMU supported monitoring.
> >
> > I don't think this should be necessary as by default the CPUs the perf
> > tool will use will be the "online" CPUs:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evlist.c?h=perf-tools-next#n40
> >
> > Could you create a reproduction of the problem you are encountering?
> > My expectation is for a core PMU to advertise the online and offline
> > CPUs it is valid for, and for perf to intersect:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n44
> > those CPUs with the online CPUs so the perf_event_open doesn't fail.
> >
>
> Thanks for the comments and detailed illustration!
>
> Actually it can be reproduced easily using the armv8_pmuv3's events. Tested on 6.10-rc1 with
> perf version 6.10.rc1.g1613e604df0c:
> # offline any CPU
> [root@localhost tmp]# echo 0 > /sys//devices/system/cpu/cpu1/online
> # try any event of armv8_pmuv3, with -a or without
> [root@localhost tmp]# ./perf stat -e armv8_pmuv3_0/ll_cache/ -vvv
> Control descriptor is not initialized
> Opening: armv8_pmuv3_0/ll_cache/
> ------------------------------------------------------------
> perf_event_attr:
> type 10 (armv8_pmuv3_0)
> size 136
> config 0x32 (ll_cache)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3
> Opening: armv8_pmuv3_0/ll_cache/
> ------------------------------------------------------------
> perf_event_attr:
> type 10 (armv8_pmuv3_0)
> size 136
> config 0x32 (ll_cache)
> sample_type IDENTIFIER
> read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> disabled 1
> inherit 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8
> sys_perf_event_open failed, error -19
> Error:
> The sys_perf_event_open() syscall returned with 19 (No such device) for event (armv8_pmuv3_0/ll_cache/).
> /bin/dmesg | grep -i perf may provide additional information.
>
> As you can see, we're trying to open event on CPU 0 first (succeed) and then CPU 1 but
> failed on CPU1. Actually we don't enter any branch which handle the evsel->cpus in
> __perf_evlist__propagate_maps() so the evsel->cpus keeps as is which should be initialized
> from the pmu->cpumask. You referenced to [1] but in this case perf_evsel->system_wide == false
> as I checked. perf_evsel->system_wide will set to true in perf_evlist__go_system_wide() and it
> maybe only set for dummy events. Please correct me if I misread here.
Yes, this is confusing. IIRC evsel->system_wide is for tracking (dummy)
events to collect side band events (like for Intel-PT) regardless of
command line options (like -a or -C).
Thanks,
Namhyung
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evlist.c?h=perf-tools-next#n40
>
> Thanks.
>
On Thu, Jun 6, 2024 at 5:09 PM Namhyung Kim <[email protected]> wrote:
>
> Hello,
>
> On Mon, Jun 03, 2024 at 09:52:15AM -0700, Ian Rogers wrote:
> > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
> > >
> > > From: Yicong Yang <[email protected]>
> > >
> > > We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs
> > > attributes if provided by the driver without checking the CPUs
> > > are online or not. In such case that CPUs provided by the driver
> > > contains the offline CPUs, we'll try to open event on the offline
> > > CPUs and then rejected by the kernel:
> > >
> > > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> > > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> > > Error:
> > > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> > > /bin/dmesg | grep -i perf may provide additional information.
> > >
> > > So it's better to do a double check in the userspace and only include
> > > the online CPUs from "cpumask" or "cpus" to avoid opening events on
> > > offline CPUs.
> >
> > I see where you are coming from with this but I think it is wrong. The
> > cpus for an uncore PMU are a hint of the CPU to open on rather than
> > the set of valid CPUs. For example:
> > ```
> > $ cat /sys/devices/uncore_imc_free_running_0/cpumask
> > 0
> > $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1
> > Using CPUID GenuineIntel-6-8D-1
> > Attempt to add: uncore_imc_free_running_0/data_read/
> > ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/
> > Control descriptor is not initialized
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 24 (uncore_imc_free_running_0)
> > size 136
> > config 0x20ff (data_read)
> > sample_type IDENTIFIER
> > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > disabled 1
> > inherit 1
> > exclude_guest 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8
> > sys_perf_event_open failed, error -22
> > switching off cloexec flag
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 24 (uncore_imc_free_running_0)
> > size 136
> > config 0x20ff (data_read)
> > sample_type IDENTIFIER
> > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > disabled 1
> > inherit 1
> > exclude_guest 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0
> > sys_perf_event_open failed, error -22
> > switching off exclude_guest, exclude_host
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 24 (uncore_imc_free_running_0)
> > size 136
> > config 0x20ff (data_read)
> > sample_type IDENTIFIER
> > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > disabled 1
> > inherit 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 = 3
> > uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957
> > uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957
> >
> > Performance counter stats for 'system wide':
> >
> > 244.51 MiB uncore_imc_free_running_0/data_read/
> >
> > 0.102320376 seconds time elapsed
> > ```
> > So the CPU mask of the PMU says to open on CPU 0, but on the command
> > line when I passed "-C 1" it opened it on CPU 1. If the cpumask file
> > contained an offline CPU then this change would make it so the CPU map
> > in the tool were empty, however, a different CPU may be programmable
> > and online.
>
> I think Intel uncore PMU driver ignores the CPU parameter and set it to
> CPU 0 in this case internally. See uncore_pmu_event_init() at
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/uncore.c#n761
Hmm.. maybe that's just the option if not set. Wrt hot plugging, on a
2 socket skylake:
```
$ cat /sys/devices/uncore_imc_0/cpumask
0,18
$ echo 0 > /sys/devices/system/cpu/cpu18/online
$ cat /sys/devices/uncore_imc_0/cpumask
0,19
```
So the cpumask should be reflecting the online/offline nature of CPUs.
> >
> > Fwiw, the tool will determine whether the mask is for all valid or a
> > hint by using the notion of a PMU being "core" or not. That notion
> > considers whether the mask was loading from a "cpumask" or "cpus"
> > file:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
> >
> > Thanks,
> > Ian
> >
> > > Signed-off-by: Yicong Yang <[email protected]>
> > > ---
> > > tools/perf/util/pmu.c | 13 +++++++++++--
> > > 1 file changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> > > index 888ce9912275..51e8d10ee28b 100644
> > > --- a/tools/perf/util/pmu.c
> > > +++ b/tools/perf/util/pmu.c
> > > @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor
> > > continue;
> > > cpus = perf_cpu_map__read(file);
> > > fclose(file);
> > > - if (cpus)
> > > - return cpus;
> > > + if (cpus) {
> > > + struct perf_cpu_map *intersect __maybe_unused;
> > > +
> > > + if (perf_cpu_map__is_subset(cpu_map__online(), cpus))
> > > + return cpus;
> > > +
> > > + intersect = perf_cpu_map__intersect(cpus, cpu_map__online());
>
> So IIUC this is for core PMUs with "cpus" file, right? I guess uncore
> drivers already handles "cpumask" properly..
So I think this is an ARM-specific bug:
Core PMUs:
x86 uses /sys/devices/cpu and s390 uses cpum_cf; these lack a cpus or
cpumask file, so we default to opening events on all online processors.
The fact these are core PMUs is hardcoded in the tool:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1747
x86 hybrid has /sys/devices/cpu_(core|atom)/cpus - the set of CPUs is
updated to reflect online and offline CPUs.
On ARM, /sys/devices/armv8_pmuv3_0 isn't a hardcoded core PMU, so we
expect its cpus file to contain online CPUs, but it currently also
erroneously contains offline ones.
Uncore PMUs:
x86 has /sys/devices/<uncore...>/cpumask where the cpumask reflects
online and offline CPUs.
ARM has things like the dmc620 PMU; it appears to be missing a cpumask
in certain cases, leading to the perf tool treating it like the x86
core PMU and opening events for it on every CPU:
```
# ls /sys/devices/arm_dmc620_10008c000/
events format perf_event_mux_interval_ms power subsystem type uevent
```
I think we need a PMU test so that bugs like this can be reported, but
we may also need to add tool workarounds for PMUs that are broken. I
can imagine it will be tricky to test uncore PMUs that default to CPU0
as often we can't offline that.
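As a rough idea of what such a test could check (a standalone illustration,
not an existing perf test), something like the following walks
/sys/bus/event_source/devices and flags any "cpus"/"cpumask" entry that names
an offline CPU:
```
#include <dirent.h>
#include <stdio.h>

/* A CPU with no "online" file (e.g. cpu0 on many systems) can't be offlined. */
static int cpu_is_online(int cpu)
{
        char path[256], c = '1';
        FILE *f;

        snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%d/online", cpu);
        f = fopen(path, "r");
        if (!f)
                return 1;
        if (fscanf(f, "%c", &c) != 1)
                c = '1';
        fclose(f);
        return c == '1';
}

int main(void)
{
        DIR *dir = opendir("/sys/bus/event_source/devices");
        struct dirent *d;

        while (dir && (d = readdir(dir)) != NULL) {
                static const char *const files[] = { "cpus", "cpumask" };

                for (int i = 0; i < 2; i++) {
                        char path[512];
                        FILE *f;
                        int cpu;

                        snprintf(path, sizeof(path),
                                 "/sys/bus/event_source/devices/%s/%s",
                                 d->d_name, files[i]);
                        f = fopen(path, "r");
                        if (!f)
                                continue;
                        /* Naive parse: handles comma lists; for ranges like
                         * "0-79" only the first CPU is checked. */
                        while (fscanf(f, "%d", &cpu) == 1) {
                                if (!cpu_is_online(cpu))
                                        printf("%s/%s lists offline CPU %d\n",
                                               d->d_name, files[i], cpu);
                                if (fgetc(f) != ',')
                                        break;
                        }
                        fclose(f);
                }
        }
        if (dir)
                closedir(dir);
        return 0;
}
```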
Thanks,
Ian
> Thanks,
> Namhyung
>
>
> > > + perf_cpu_map__put(cpus);
> > > + if (intersect)
> > > + return intersect;
> > > + }
> > > }
> > >
> > > /* Nothing found, for core PMUs assume this means all CPUs. */
> > > --
> > > 2.24.0
> > >
On Thu, Jun 6, 2024 at 5:36 PM Ian Rogers <[email protected]> wrote:
>
> On Thu, Jun 6, 2024 at 5:09 PM Namhyung Kim <[email protected]> wrote:
> >
> > Hello,
> >
> > On Mon, Jun 03, 2024 at 09:52:15AM -0700, Ian Rogers wrote:
> > > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <[email protected]> wrote:
> > > >
> > > > From: Yicong Yang <[email protected]>
> > > >
> > > > We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs
> > > > attributes if provided by the driver without checking the CPUs
> > > > are online or not. In such case that CPUs provided by the driver
> > > > contains the offline CPUs, we'll try to open event on the offline
> > > > CPUs and then rejected by the kernel:
> > > >
> > > > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> > > > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> > > > Error:
> > > > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> > > > /bin/dmesg | grep -i perf may provide additional information.
> > > >
> > > > So it's better to do a double check in the userspace and only include
> > > > the online CPUs from "cpumask" or "cpus" to avoid opening events on
> > > > offline CPUs.
> > >
> > > I see where you are coming from with this but I think it is wrong. The
> > > cpus for an uncore PMU are a hint of the CPU to open on rather than
> > > the set of valid CPUs. For example:
> > > ```
> > > $ cat /sys/devices/uncore_imc_free_running_0/cpumask
> > > 0
> > > $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1
> > > Using CPUID GenuineIntel-6-8D-1
> > > Attempt to add: uncore_imc_free_running_0/data_read/
> > > ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/
> > > Control descriptor is not initialized
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > > type 24 (uncore_imc_free_running_0)
> > > size 136
> > > config 0x20ff (data_read)
> > > sample_type IDENTIFIER
> > > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > disabled 1
> > > inherit 1
> > > exclude_guest 1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8
> > > sys_perf_event_open failed, error -22
> > > switching off cloexec flag
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > > type 24 (uncore_imc_free_running_0)
> > > size 136
> > > config 0x20ff (data_read)
> > > sample_type IDENTIFIER
> > > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > disabled 1
> > > inherit 1
> > > exclude_guest 1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0
> > > sys_perf_event_open failed, error -22
> > > switching off exclude_guest, exclude_host
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > > type 24 (uncore_imc_free_running_0)
> > > size 136
> > > config 0x20ff (data_read)
> > > sample_type IDENTIFIER
> > > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > disabled 1
> > > inherit 1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 = 3
> > > uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957
> > > uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 244.51 MiB uncore_imc_free_running_0/data_read/
> > >
> > > 0.102320376 seconds time elapsed
> > > ```
> > > So the CPU mask of the PMU says to open on CPU 0, but on the command
> > > line when I passed "-C 1" it opened it on CPU 1. If the cpumask file
> > > contained an offline CPU then this change would make it so the CPU map
> > > in the tool were empty, however, a different CPU may be programmable
> > > and online.
> >
> > I think Intel uncore PMU driver ignores the CPU parameter and set it to
> > CPU 0 in this case internally. See uncore_pmu_event_init() at
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/uncore.c#n761
>
> Hmm.. maybe that's just the option if not set. Wrt hot plugging, on a
> 2 socket skylake:
> ```
> $ cat /sys/devices/uncore_imc_0/cpumask
> 0,18
> $ echo 0 > /sys/devices/system/cpu/cpu18/online
> $ cat /sys/devices/uncore_imc_0/cpumask
> 0,19
> ```
> So the cpumask should be reflecting the online/offline nature of CPUs.
>
> > >
> > > Fwiw, the tool will determine whether the mask is for all valid or a
> > > hint by using the notion of a PMU being "core" or not. That notion
> > > considers whether the mask was loading from a "cpumask" or "cpus"
> > > file:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
> > >
> > > Thanks,
> > > Ian
> > >
> > > > Signed-off-by: Yicong Yang <[email protected]>
> > > > ---
> > > > tools/perf/util/pmu.c | 13 +++++++++++--
> > > > 1 file changed, 11 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> > > > index 888ce9912275..51e8d10ee28b 100644
> > > > --- a/tools/perf/util/pmu.c
> > > > +++ b/tools/perf/util/pmu.c
> > > > @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor
> > > > continue;
> > > > cpus = perf_cpu_map__read(file);
> > > > fclose(file);
> > > > - if (cpus)
> > > > - return cpus;
> > > > + if (cpus) {
> > > > + struct perf_cpu_map *intersect __maybe_unused;
> > > > +
> > > > + if (perf_cpu_map__is_subset(cpu_map__online(), cpus))
> > > > + return cpus;
> > > > +
> > > > + intersect = perf_cpu_map__intersect(cpus, cpu_map__online());
> >
> > So IIUC this is for core PMUs with "cpus" file, right? I guess uncore
> > drivers already handles "cpumask" properly..
>
> So I think this is an ARM specific bug:
>
> Core PMUs:
> x86 uses /sys/devices/cpu, s390 uses cpum_cf, these lack a cpus or
> cpumask and so we default to opening events on all online processors.
> The fact these are core PMUs is hardcoded in the tool:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1747
> x86 hybrid /sys/devices/cpu_(core|atom)/cpus - the set of CPUs is
> updated to reflect online and offline
> ARM the /sys/devices/armv8_pmuv3_0 isn't a hardcoded core PMU and so
> we expect the cpus to contain online CPUs, but it currently also
> erroneously contains offline ones.
>
> Uncore PMUs:
> x86 has /sys/devices/<uncore...>/cpumask where the cpumask reflects
> online and offline CPUs
> ARM has things like the dmc620 PMU, it appears to be missing cpumask
> in certain cases leading to the perf tool treating it like the x86
> core PMU and opening events for it on every CPU:
> ```
> # ls /sys/devices/arm_dmc620_10008c000/
> events format perf_event_mux_interval_ms power subsystem type uevent
> ```
>
> I think we need a PMU test so that bugs like this can be reported, but
> we may also need to add tool workarounds for PMUs that are broken. I
> can imagine it will be tricky to test uncore PMUs that default to CPU0
> as often we can't offline that.
I think we should make this an ARM-specific fixup like:
https://lore.kernel.org/lkml/[email protected]/
The PMU needs fixing up like in the rest of the change, but as perf can
be run on older kernels I think this workaround will remain necessary.
The arm_cmn PMU appears to handle CPU hotplugging, while the arm_dmc620
lacks a cpumask altogether on the test machine I can access. I suspect
we may want a better uncore fix, as we may change a CPU map of 1 CPU
into an empty CPU map; for example, if the cpumask is "0" and CPU0 is
offline, then "1" would be a better alternative than the empty CPU map.
I couldn't find a PMU to test this with.
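A minimal sketch of that fallback, written in the style of the patch above and
reusing its perf_cpu_map helpers (the single-online-CPU substitution is an
assumed behaviour, not something the patch implements):
```
#include <stdio.h>
#include <perf/cpumap.h>
#include "cpumap.h"     /* perf's util header providing cpu_map__online(), as in the patch */

static struct perf_cpu_map *uncore_hint_online(struct perf_cpu_map *cpus)
{
        struct perf_cpu_map *online = cpu_map__online();
        struct perf_cpu_map *intersect = perf_cpu_map__intersect(cpus, online);
        char buf[16];

        if (intersect)
                return intersect;

        /*
         * Every advertised CPU is offline (e.g. cpumask "0" with CPU0
         * offline): fall back to the first online CPU instead of an
         * empty map.
         */
        snprintf(buf, sizeof(buf), "%d", perf_cpu_map__cpu(online, 0).cpu);
        return perf_cpu_map__new(buf);
}
```
That at least avoids handing the tool an empty map for an uncore PMU that is
still usable from another online CPU.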
Thanks,
Ian
> Thanks,
> Ian
>
> > Thanks,
> > Namhyung
> >
> >
> > > > + perf_cpu_map__put(cpus);
> > > > + if (intersect)
> > > > + return intersect;
> > > > + }
> > > > }
> > > >
> > > > /* Nothing found, for core PMUs assume this means all CPUs. */
> > > > --
> > > > 2.24.0
> > > >