2023-07-10 12:29:39

by James Clark

Subject: [PATCH 1/4] arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability

This capability gives us the ability to open PERF_TYPE_HARDWARE and
PERF_TYPE_HW_CACHE events on a specific PMU for free. All the
implementation is contained in the Perf core and tool code, so no change
to the Arm PMU driver is needed beyond setting the capability flag.

The following basic use case now results in Perf opening the event on
all PMUs rather than picking only one in an unpredictable way:

  $ perf stat -e cycles -- taskset --cpu-list 0,1 stress -c 2

  Performance counter stats for 'taskset --cpu-list 0,1 stress -c 2':

         963279620      armv8_cortex_a57/cycles/                (99.19%)
         752745657      armv8_cortex_a53/cycles/                (94.80%)

Fixes: 55bcf6ef314a ("perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE")
Suggested-by: Ian Rogers <[email protected]>
Signed-off-by: James Clark <[email protected]>
---
drivers/perf/arm_pmu.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 277e29fbd504..d8844a9461a2 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -875,8 +875,13 @@ struct arm_pmu *armpmu_alloc(void)
 		 * configuration (e.g. big.LITTLE). This is not an uncore PMU,
 		 * and we have taken ctx sharing into account (e.g. with our
 		 * pmu::filter callback and pmu::event_init group validation).
+		 *
+		 * PERF_PMU_CAP_EXTENDED_HW_TYPE is required to open the legacy
+		 * PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE events on a
+		 * specific PMU.
 		 */
-		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS,
+		.capabilities	= PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS |
+				  PERF_PMU_CAP_EXTENDED_HW_TYPE,
 	};

 	pmu->attr_groups[ARMPMU_ATTR_GROUP_COMMON] =
--
2.34.1
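
For readers unfamiliar with the extended type encoding that this capability enables: as far as I understand the UAPI added by 55bcf6ef314a, user space selects a specific PMU for a PERF_TYPE_HARDWARE event by placing that PMU's type ID in the upper 32 bits of attr.config and the generic event ID in the lower 32 bits. The following is only a minimal sketch of that packing, not code from this patch. The SKETCH_* constants and sketch_* helpers are hypothetical names; the constants are assumed to mirror PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK from include/uapi/linux/perf_event.h.

```c
#include <stdint.h>

/*
 * Assumption: these two constants mirror PERF_PMU_TYPE_SHIFT and
 * PERF_HW_EVENT_MASK from include/uapi/linux/perf_event.h. They are
 * redefined under hypothetical names so the sketch compiles on its own.
 */
#define SKETCH_PMU_TYPE_SHIFT 32
#define SKETCH_HW_EVENT_MASK  0xffffffffULL

/* Pack a PMU type ID and a generic hardware event ID into attr.config. */
static inline uint64_t sketch_extended_hw_config(uint32_t pmu_type, uint32_t hw_id)
{
	return ((uint64_t)pmu_type << SKETCH_PMU_TYPE_SHIFT) |
	       ((uint64_t)hw_id & SKETCH_HW_EVENT_MASK);
}

/* Recover the PMU type ID; 0 means no specific PMU was requested. */
static inline uint32_t sketch_config_pmu_type(uint64_t config)
{
	return (uint32_t)(config >> SKETCH_PMU_TYPE_SHIFT);
}

/* Recover the generic hardware event ID from the low 32 bits. */
static inline uint32_t sketch_config_hw_id(uint64_t config)
{
	return (uint32_t)(config & SKETCH_HW_EVENT_MASK);
}
```

In practice the PMU type ID would be read from /sys/bus/event_source/devices/<pmu>/type, and the resulting value would go into perf_event_attr.config with attr.type left as PERF_TYPE_HARDWARE. A PMU that does not advertise PERF_PMU_CAP_EXTENDED_HW_TYPE is not expected to accept such an event, which is the gap this patch closes for arm_pmu.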



2023-07-10 17:13:10

by Ian Rogers

Subject: Re: [PATCH 1/4] arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability

On Mon, Jul 10, 2023 at 5:22 AM James Clark <[email protected]> wrote:
>
> This capability gives us the ability to open PERF_TYPE_HARDWARE and
> PERF_TYPE_HW_CACHE events on a specific PMU for free. All the
> implementation is contained in the Perf core and tool code so no change
> to the Arm PMU driver is needed.
>
> The following basic use case now results in Perf opening the event on
> all PMUs rather than picking only one in an unpredictable way:
>
> $ perf stat -e cycles -- taskset --cpu-list 0,1 stress -c 2
>
> Performance counter stats for 'taskset --cpu-list 0,1 stress -c 2':
>
> 963279620 armv8_cortex_a57/cycles/ (99.19%)
> 752745657 armv8_cortex_a53/cycles/ (94.80%)
>
> Fixes: 55bcf6ef314a ("perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE")
> Suggested-by: Ian Rogers <[email protected]>
> Signed-off-by: James Clark <[email protected]>

Acked-by: Ian Rogers <[email protected]>

Thanks,
Ian

> ---
> drivers/perf/arm_pmu.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 277e29fbd504..d8844a9461a2 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -875,8 +875,13 @@ struct arm_pmu *armpmu_alloc(void)
> * configuration (e.g. big.LITTLE). This is not an uncore PMU,
> * and we have taken ctx sharing into account (e.g. with our
> * pmu::filter callback and pmu::event_init group validation).
> + *
> + * PERF_PMU_CAP_EXTENDED_HW_TYPE is required to open the legacy
> + * PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE events on a
> + * specific PMU.
> */
> - .capabilities = PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS,
> + .capabilities = PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS |
> + PERF_PMU_CAP_EXTENDED_HW_TYPE,
> };
>
> pmu->attr_groups[ARMPMU_ATTR_GROUP_COMMON] =
> --
> 2.34.1
>

2023-07-11 12:19:14

by Anshuman Khandual

Subject: Re: [PATCH 1/4] arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability



On 7/10/23 17:51, James Clark wrote:
> This capability gives us the ability to open PERF_TYPE_HARDWARE and
> PERF_TYPE_HW_CACHE events on a specific PMU for free. All the
> implementation is contained in the Perf core and tool code so no change
> to the Arm PMU driver is needed.
>
> The following basic use case now results in Perf opening the event on
> all PMUs rather than picking only one in an unpredictable way:
>
> $ perf stat -e cycles -- taskset --cpu-list 0,1 stress -c 2
>
> Performance counter stats for 'taskset --cpu-list 0,1 stress -c 2':
>
> 963279620 armv8_cortex_a57/cycles/ (99.19%)
> 752745657 armv8_cortex_a53/cycles/ (94.80%)
>
> Fixes: 55bcf6ef314a ("perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE")
> Suggested-by: Ian Rogers <[email protected]>
> Signed-off-by: James Clark <[email protected]>
> ---
> drivers/perf/arm_pmu.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 277e29fbd504..d8844a9461a2 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -875,8 +875,13 @@ struct arm_pmu *armpmu_alloc(void)
> * configuration (e.g. big.LITTLE). This is not an uncore PMU,
> * and we have taken ctx sharing into account (e.g. with our
> * pmu::filter callback and pmu::event_init group validation).
> + *
> + * PERF_PMU_CAP_EXTENDED_HW_TYPE is required to open the legacy

s/legacy/generic ? These hardware events are still around.

> + * PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE events on a
> + * specific PMU.
> */
> - .capabilities = PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS,
> + .capabilities = PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS |
> + PERF_PMU_CAP_EXTENDED_HW_TYPE,
> };
>
> pmu->attr_groups[ARMPMU_ATTR_GROUP_COMMON] =

2023-07-11 14:24:17

by James Clark

Subject: Re: [PATCH 1/4] arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability



On 11/07/2023 13:01, Anshuman Khandual wrote:
>
>
> On 7/10/23 17:51, James Clark wrote:
>> This capability gives us the ability to open PERF_TYPE_HARDWARE and
>> PERF_TYPE_HW_CACHE events on a specific PMU for free. All the
>> implementation is contained in the Perf core and tool code so no change
>> to the Arm PMU driver is needed.
>>
>> The following basic use case now results in Perf opening the event on
>> all PMUs rather than picking only one in an unpredictable way:
>>
>> $ perf stat -e cycles -- taskset --cpu-list 0,1 stress -c 2
>>
>> Performance counter stats for 'taskset --cpu-list 0,1 stress -c 2':
>>
>> 963279620 armv8_cortex_a57/cycles/ (99.19%)
>> 752745657 armv8_cortex_a53/cycles/ (94.80%)
>>
>> Fixes: 55bcf6ef314a ("perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE")
>> Suggested-by: Ian Rogers <[email protected]>
>> Signed-off-by: James Clark <[email protected]>
>> ---
>> drivers/perf/arm_pmu.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
>> index 277e29fbd504..d8844a9461a2 100644
>> --- a/drivers/perf/arm_pmu.c
>> +++ b/drivers/perf/arm_pmu.c
>> @@ -875,8 +875,13 @@ struct arm_pmu *armpmu_alloc(void)
>> * configuration (e.g. big.LITTLE). This is not an uncore PMU,
>> * and we have taken ctx sharing into account (e.g. with our
>> * pmu::filter callback and pmu::event_init group validation).
>> + *
>> + * PERF_PMU_CAP_EXTENDED_HW_TYPE is required to open the legacy
>
> s/legacy/generic ? These hardware events are still around.

True, I thought I saw it mentioned that way somewhere, but I can
probably just remove it altogether. PERF_TYPE_HARDWARE and
PERF_TYPE_HW_CACHE is enough.


>
>> + * PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE events on a
>> + * specific PMU.
>> */
>> - .capabilities = PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS,
>> + .capabilities = PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS |
>> + PERF_PMU_CAP_EXTENDED_HW_TYPE,
>> };
>>
>> pmu->attr_groups[ARMPMU_ATTR_GROUP_COMMON] =

2023-07-20 17:30:48

by Ian Rogers

Subject: Re: [PATCH 1/4] arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability

On Tue, Jul 11, 2023 at 7:12 AM James Clark <[email protected]> wrote:
>
>
>
> On 11/07/2023 13:01, Anshuman Khandual wrote:
> >
> >
> > On 7/10/23 17:51, James Clark wrote:
> >> This capability gives us the ability to open PERF_TYPE_HARDWARE and
> >> PERF_TYPE_HW_CACHE events on a specific PMU for free. All the
> >> implementation is contained in the Perf core and tool code so no change
> >> to the Arm PMU driver is needed.
> >>
> >> The following basic use case now results in Perf opening the event on
> >> all PMUs rather than picking only one in an unpredictable way:
> >>
> >> $ perf stat -e cycles -- taskset --cpu-list 0,1 stress -c 2
> >>
> >> Performance counter stats for 'taskset --cpu-list 0,1 stress -c 2':
> >>
> >> 963279620 armv8_cortex_a57/cycles/ (99.19%)
> >> 752745657 armv8_cortex_a53/cycles/ (94.80%)
> >>
> >> Fixes: 55bcf6ef314a ("perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE")
> >> Suggested-by: Ian Rogers <[email protected]>
> >> Signed-off-by: James Clark <[email protected]>

Hi ARM Linux and ARM Linux PMU people,

Could this patch be picked up for Linux 6.5? I don't see it in the
tree and it seems a shame to have to wait for it. The other patches do
cleanup and so waiting for 6.6 seems okay.

Thanks,
Ian

> >> ---
> >> drivers/perf/arm_pmu.c | 7 ++++++-
> >> 1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> >> index 277e29fbd504..d8844a9461a2 100644
> >> --- a/drivers/perf/arm_pmu.c
> >> +++ b/drivers/perf/arm_pmu.c
> >> @@ -875,8 +875,13 @@ struct arm_pmu *armpmu_alloc(void)
> >> * configuration (e.g. big.LITTLE). This is not an uncore PMU,
> >> * and we have taken ctx sharing into account (e.g. with our
> >> * pmu::filter callback and pmu::event_init group validation).
> >> + *
> >> + * PERF_PMU_CAP_EXTENDED_HW_TYPE is required to open the legacy
> >
> > s/legacy/generic ? These hardware events are still around.
>
> True, I thought I saw it mentioned that way somewhere, but I can
> probably just remove it altogether. PERF_TYPE_HARDWARE and
> PERF_TYPE_HW_CACHE is enough.
>
>
> >
> >> + * PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE events on a
> >> + * specific PMU.
> >> */
> >> - .capabilities = PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS,
> >> + .capabilities = PERF_PMU_CAP_HETEROGENEOUS_CPUS | PERF_PMU_CAP_EXTENDED_REGS |
> >> + PERF_PMU_CAP_EXTENDED_HW_TYPE,
> >> };
> >>
> >> pmu->attr_groups[ARMPMU_ATTR_GROUP_COMMON] =

2023-07-21 10:37:13

by Will Deacon

Subject: Re: [PATCH 1/4] arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability

On Thu, Jul 20, 2023 at 10:12:21AM -0700, Ian Rogers wrote:
> On Tue, Jul 11, 2023 at 7:12 AM James Clark <[email protected]> wrote:
> >
> >
> >
> > On 11/07/2023 13:01, Anshuman Khandual wrote:
> > >
> > >
> > > On 7/10/23 17:51, James Clark wrote:
> > >> This capability gives us the ability to open PERF_TYPE_HARDWARE and
> > >> PERF_TYPE_HW_CACHE events on a specific PMU for free. All the
> > >> implementation is contained in the Perf core and tool code so no change
> > >> to the Arm PMU driver is needed.
> > >>
> > >> The following basic use case now results in Perf opening the event on
> > >> all PMUs rather than picking only one in an unpredictable way:
> > >>
> > >> $ perf stat -e cycles -- taskset --cpu-list 0,1 stress -c 2
> > >>
> > >> Performance counter stats for 'taskset --cpu-list 0,1 stress -c 2':
> > >>
> > >> 963279620 armv8_cortex_a57/cycles/ (99.19%)
> > >> 752745657 armv8_cortex_a53/cycles/ (94.80%)
> > >>
> > >> Fixes: 55bcf6ef314a ("perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE")
> > >> Suggested-by: Ian Rogers <[email protected]>
> > >> Signed-off-by: James Clark <[email protected]>
>
> Hi ARM Linux and ARM Linux PMU people,
>
> Could this patch be picked up for Linux 6.5? I don't see it in the
> tree and it seems a shame to have to wait for it. The other patches do
> cleanup and so waiting for 6.6 seems okay.

I'm only taking fixes for 6.5 and I don't think this qualifies.

If it was an oversight introduced during the recent merge window, then
I'd be happier fixing it up, but 55bcf6ef314a was merged ages ago (v5.12?),
so I think we can wait.

I'll be queuing perf changes for 6.6 next week, so I'll look at this
then.

Cheers,

Will

2023-07-24 14:23:02

by James Clark

Subject: Re: [PATCH 1/4] arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability



On 21/07/2023 11:21, Will Deacon wrote:
> On Thu, Jul 20, 2023 at 10:12:21AM -0700, Ian Rogers wrote:
>> On Tue, Jul 11, 2023 at 7:12 AM James Clark <[email protected]> wrote:
>>>
>>>
>>>
>>> On 11/07/2023 13:01, Anshuman Khandual wrote:
>>>>
>>>>
>>>> On 7/10/23 17:51, James Clark wrote:
>>>>> This capability gives us the ability to open PERF_TYPE_HARDWARE and
>>>>> PERF_TYPE_HW_CACHE events on a specific PMU for free. All the
>>>>> implementation is contained in the Perf core and tool code so no change
>>>>> to the Arm PMU driver is needed.
>>>>>
>>>>> The following basic use case now results in Perf opening the event on
>>>>> all PMUs rather than picking only one in an unpredictable way:
>>>>>
>>>>> $ perf stat -e cycles -- taskset --cpu-list 0,1 stress -c 2
>>>>>
>>>>> Performance counter stats for 'taskset --cpu-list 0,1 stress -c 2':
>>>>>
>>>>> 963279620 armv8_cortex_a57/cycles/ (99.19%)
>>>>> 752745657 armv8_cortex_a53/cycles/ (94.80%)
>>>>>
>>>>> Fixes: 55bcf6ef314a ("perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE")
>>>>> Suggested-by: Ian Rogers <[email protected]>
>>>>> Signed-off-by: James Clark <[email protected]>
>>
>> Hi ARM Linux and ARM Linux PMU people,
>>
>> Could this patch be picked up for Linux 6.5? I don't see it in the
>> tree and it seems a shame to have to wait for it. The other patches do
>> cleanup and so waiting for 6.6 seems okay.
>
> I'm only taking fixes for 6.5 and I don't think this qualifies.
>
> If it was an oversight introduced during the recent merge window, then
> I'd be happier fixing it up, but 55bcf6ef314a was merged ages ago (v5.12?),
> so I think we can wait.
>
> I'll be queuing perf changes for 6.6 next week, so I'll look at this
> then.
>
> Cheers,
>
> Will

Hi Will,

Thanks for looking at this. I've sent a v2 with Anshuman's fixes.

James