2024-06-10 10:08:52

by Dhananjay Ugwekar

Subject: [PATCH 0/6] Add per-core RAPL energy counter support for AMD CPUs

Currently the energy-cores event in the power PMU aggregates energy
consumption data at the package level. On the other hand, the core energy
RAPL counter in AMD CPUs has core scope (which means the energy
consumption is recorded separately for each core). Earlier efforts to add
the core event to the power PMU failed [1] due to the difference in the
scope of these two events. Hence, there is a need for a new core-scope
PMU.

This patchset adds a new "power_per_core" PMU alongside the existing
"power" PMU. The new PMU is responsible for collecting the new
"energy-per-core" event.

Tested the package level and core level PMU counters with workloads
pinned to different CPUs.

Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
machine:

$ perf stat -a --per-core -e power_per_core/energy-per-core/ sleep 1

Performance counter stats for 'system wide':

S0-D0-C0 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C1 1 5.72 Joules power_per_core/energy-per-core/
S0-D0-C2 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C3 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C4 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C5 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C6 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C7 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C8 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C9 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C10 1 0.02 Joules power_per_core/energy-per-core/

[1]: https://lore.kernel.org/lkml/[email protected]/

This patchset applies cleanly on top of v6.10-rc3 as well as latest
tip/master.

Dhananjay Ugwekar (6):
perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
perf/x86/rapl: Rename rapl_pmu variables
perf/x86/rapl: Make rapl_model struct global
perf/x86/rapl: Move cpumask variable to rapl_pmus struct
perf/x86/rapl: Add wrapper for online/offline functions
perf/x86/rapl: Add per-core energy counter support for AMD CPUs

arch/x86/events/rapl.c | 311 ++++++++++++++++++++++++++++++-----------
1 file changed, 233 insertions(+), 78 deletions(-)

--
2.34.1



2024-06-10 10:09:15

by Dhananjay Ugwekar

Subject: [PATCH 1/6] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs

After commit 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD
0x80000026 leaf"), on AMD processors that support extended CPUID leaf
0x80000026, the topology_die_cpumask() and topology_logical_die_id()
macros no longer return the package cpumask and package id; instead they
return the CCD (Core Complex Die) mask and id respectively. This changes
the scope of the energy-pkg event to CCD instead of package.

Replacing these macros with their package counterparts fixes the
energy-pkg event for AMD CPUs.

However, due to the difference in the scope of the energy-pkg event
between Intel and AMD CPUs, these macros have to be replaced
conditionally, only for AMD CPUs.

On a 12 CCD 1 Package AMD Zen4 Genoa machine:

Before:
$ cat /sys/devices/power/cpumask
0,8,16,24,32,40,48,56,64,72,80,88

The expected cpumask here is just "0": as this is a package-scope event,
only one CPU will be collecting the event for all the CPUs in the
package.

After:
$ cat /sys/devices/power/cpumask
0

Signed-off-by: Dhananjay Ugwekar <[email protected]>
Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
---
PS: This patch was sent separately earlier (link below). It has not been
merged yet; it is required for this patchset to work properly, and it also
fixes the pre-existing energy-pkg event.
https://lore.kernel.org/linux-perf-users/[email protected]/
---
arch/x86/events/rapl.c | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b985ca79cf97..73be25e1f4b4 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -103,6 +103,10 @@ static struct perf_pmu_events_attr event_attr_##v = { \
.event_str = str, \
};

+#define rapl_pmu_is_pkg_scope() \
+ (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || \
+ boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+
struct rapl_pmu {
raw_spinlock_t lock;
int n_active;
@@ -140,9 +144,21 @@ static unsigned int rapl_cntr_mask;
static u64 rapl_timer_ms;
static struct perf_msr *rapl_msrs;

+static inline unsigned int get_rapl_pmu_idx(int cpu)
+{
+ return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
+ topology_logical_die_id(cpu);
+}
+
+static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
+{
+ return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
+ topology_die_cpumask(cpu);
+}
+
static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
{
- unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
+ unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);

/*
* The unsigned check also catches the '-1' return value for non
@@ -543,6 +559,7 @@ static struct perf_msr amd_rapl_msrs[] = {

static int rapl_cpu_offline(unsigned int cpu)
{
+ const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
int target;

@@ -552,7 +569,7 @@ static int rapl_cpu_offline(unsigned int cpu)

pmu->cpu = -1;
/* Find a new cpu to collect rapl events */
- target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
+ target = cpumask_any_but(rapl_pmu_cpumask, cpu);

/* Migrate rapl events to the new target */
if (target < nr_cpu_ids) {
@@ -565,6 +582,8 @@ static int rapl_cpu_offline(unsigned int cpu)

static int rapl_cpu_online(unsigned int cpu)
{
+ unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
+ const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
int target;

@@ -579,14 +598,14 @@ static int rapl_cpu_online(unsigned int cpu)
pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
rapl_hrtimer_init(pmu);

- rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
+ rapl_pmus->pmus[rapl_pmu_idx] = pmu;
}

/*
* Check if there is an online cpu in the package which collects rapl
* events already.
*/
- target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
+ target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
if (target < nr_cpu_ids)
return 0;

@@ -677,6 +696,9 @@ static int __init init_rapl_pmus(void)
{
int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();

+ if (rapl_pmu_is_pkg_scope())
+ nr_rapl_pmu = topology_max_packages();
+
rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
if (!rapl_pmus)
return -ENOMEM;
--
2.34.1


2024-06-10 10:09:37

by Dhananjay Ugwekar

Subject: [PATCH 2/6] perf/x86/rapl: Rename rapl_pmu variables

Rename struct rapl_pmu variables from "pmu" to "rapl_pmu" to avoid
confusion between variables of the two different structs, pmu and
rapl_pmu. struct rapl_pmu also contains a pointer to struct pmu, which
leads to code like pmu->pmu, which is needlessly confusing. With this
change, that becomes the much more readable rapl_pmu->pmu.

Also rename the "pmus" member of the rapl_pmus struct, for the same reason.

No functional change.

Signed-off-by: Dhananjay Ugwekar <[email protected]>
---
arch/x86/events/rapl.c | 104 ++++++++++++++++++++---------------------
1 file changed, 52 insertions(+), 52 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 73be25e1f4b4..b4e2073a178e 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -120,7 +120,7 @@ struct rapl_pmu {
struct rapl_pmus {
struct pmu pmu;
unsigned int nr_rapl_pmu;
- struct rapl_pmu *pmus[] __counted_by(nr_rapl_pmu);
+ struct rapl_pmu *rapl_pmu[] __counted_by(nr_rapl_pmu);
};

enum rapl_unit_quirk {
@@ -164,7 +164,7 @@ static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
* The unsigned check also catches the '-1' return value for non
* existent mappings in the topology map.
*/
- return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->pmus[rapl_pmu_idx] : NULL;
+ return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus->rapl_pmu[rapl_pmu_idx] : NULL;
}

static inline u64 rapl_read_counter(struct perf_event *event)
@@ -228,34 +228,34 @@ static void rapl_start_hrtimer(struct rapl_pmu *pmu)

static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer)
{
- struct rapl_pmu *pmu = container_of(hrtimer, struct rapl_pmu, hrtimer);
+ struct rapl_pmu *rapl_pmu = container_of(hrtimer, struct rapl_pmu, hrtimer);
struct perf_event *event;
unsigned long flags;

- if (!pmu->n_active)
+ if (!rapl_pmu->n_active)
return HRTIMER_NORESTART;

- raw_spin_lock_irqsave(&pmu->lock, flags);
+ raw_spin_lock_irqsave(&rapl_pmu->lock, flags);

- list_for_each_entry(event, &pmu->active_list, active_entry)
+ list_for_each_entry(event, &rapl_pmu->active_list, active_entry)
rapl_event_update(event);

- raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);

- hrtimer_forward_now(hrtimer, pmu->timer_interval);
+ hrtimer_forward_now(hrtimer, rapl_pmu->timer_interval);

return HRTIMER_RESTART;
}

-static void rapl_hrtimer_init(struct rapl_pmu *pmu)
+static void rapl_hrtimer_init(struct rapl_pmu *rapl_pmu)
{
- struct hrtimer *hr = &pmu->hrtimer;
+ struct hrtimer *hr = &rapl_pmu->hrtimer;

hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
hr->function = rapl_hrtimer_handle;
}

-static void __rapl_pmu_event_start(struct rapl_pmu *pmu,
+static void __rapl_pmu_event_start(struct rapl_pmu *rapl_pmu,
struct perf_event *event)
{
if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
@@ -263,39 +263,39 @@ static void __rapl_pmu_event_start(struct rapl_pmu *pmu,

event->hw.state = 0;

- list_add_tail(&event->active_entry, &pmu->active_list);
+ list_add_tail(&event->active_entry, &rapl_pmu->active_list);

local64_set(&event->hw.prev_count, rapl_read_counter(event));

- pmu->n_active++;
- if (pmu->n_active == 1)
- rapl_start_hrtimer(pmu);
+ rapl_pmu->n_active++;
+ if (rapl_pmu->n_active == 1)
+ rapl_start_hrtimer(rapl_pmu);
}

static void rapl_pmu_event_start(struct perf_event *event, int mode)
{
- struct rapl_pmu *pmu = event->pmu_private;
+ struct rapl_pmu *rapl_pmu = event->pmu_private;
unsigned long flags;

- raw_spin_lock_irqsave(&pmu->lock, flags);
- __rapl_pmu_event_start(pmu, event);
- raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
+ __rapl_pmu_event_start(rapl_pmu, event);
+ raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
}

static void rapl_pmu_event_stop(struct perf_event *event, int mode)
{
- struct rapl_pmu *pmu = event->pmu_private;
+ struct rapl_pmu *rapl_pmu = event->pmu_private;
struct hw_perf_event *hwc = &event->hw;
unsigned long flags;

- raw_spin_lock_irqsave(&pmu->lock, flags);
+ raw_spin_lock_irqsave(&rapl_pmu->lock, flags);

/* mark event as deactivated and stopped */
if (!(hwc->state & PERF_HES_STOPPED)) {
- WARN_ON_ONCE(pmu->n_active <= 0);
- pmu->n_active--;
- if (pmu->n_active == 0)
- hrtimer_cancel(&pmu->hrtimer);
+ WARN_ON_ONCE(rapl_pmu->n_active <= 0);
+ rapl_pmu->n_active--;
+ if (rapl_pmu->n_active == 0)
+ hrtimer_cancel(&rapl_pmu->hrtimer);

list_del(&event->active_entry);

@@ -313,23 +313,23 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
hwc->state |= PERF_HES_UPTODATE;
}

- raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
}

static int rapl_pmu_event_add(struct perf_event *event, int mode)
{
- struct rapl_pmu *pmu = event->pmu_private;
+ struct rapl_pmu *rapl_pmu = event->pmu_private;
struct hw_perf_event *hwc = &event->hw;
unsigned long flags;

- raw_spin_lock_irqsave(&pmu->lock, flags);
+ raw_spin_lock_irqsave(&rapl_pmu->lock, flags);

hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;

if (mode & PERF_EF_START)
- __rapl_pmu_event_start(pmu, event);
+ __rapl_pmu_event_start(rapl_pmu, event);

- raw_spin_unlock_irqrestore(&pmu->lock, flags);
+ raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);

return 0;
}
@@ -343,7 +343,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
{
u64 cfg = event->attr.config & RAPL_EVENT_MASK;
int bit, ret = 0;
- struct rapl_pmu *pmu;
+ struct rapl_pmu *rapl_pmu;

/* only look at RAPL events */
if (event->attr.type != rapl_pmus->pmu.type)
@@ -373,11 +373,11 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;

/* must be done before validate_group */
- pmu = cpu_to_rapl_pmu(event->cpu);
- if (!pmu)
+ rapl_pmu = cpu_to_rapl_pmu(event->cpu);
+ if (!rapl_pmu)
return -EINVAL;
- event->cpu = pmu->cpu;
- event->pmu_private = pmu;
+ event->cpu = rapl_pmu->cpu;
+ event->pmu_private = rapl_pmu;
event->hw.event_base = rapl_msrs[bit].msr;
event->hw.config = cfg;
event->hw.idx = bit;
@@ -560,22 +560,22 @@ static struct perf_msr amd_rapl_msrs[] = {
static int rapl_cpu_offline(unsigned int cpu)
{
const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
- struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
+ struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
int target;

/* Check if exiting cpu is used for collecting rapl events */
if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask))
return 0;

- pmu->cpu = -1;
+ rapl_pmu->cpu = -1;
/* Find a new cpu to collect rapl events */
target = cpumask_any_but(rapl_pmu_cpumask, cpu);

/* Migrate rapl events to the new target */
if (target < nr_cpu_ids) {
cpumask_set_cpu(target, &rapl_cpu_mask);
- pmu->cpu = target;
- perf_pmu_migrate_context(pmu->pmu, cpu, target);
+ rapl_pmu->cpu = target;
+ perf_pmu_migrate_context(rapl_pmu->pmu, cpu, target);
}
return 0;
}
@@ -584,21 +584,21 @@ static int rapl_cpu_online(unsigned int cpu)
{
unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
- struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
+ struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
int target;

- if (!pmu) {
- pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
- if (!pmu)
+ if (!rapl_pmu) {
+ rapl_pmu = kzalloc_node(sizeof(*rapl_pmu), GFP_KERNEL, cpu_to_node(cpu));
+ if (!rapl_pmu)
return -ENOMEM;

- raw_spin_lock_init(&pmu->lock);
- INIT_LIST_HEAD(&pmu->active_list);
- pmu->pmu = &rapl_pmus->pmu;
- pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
- rapl_hrtimer_init(pmu);
+ raw_spin_lock_init(&rapl_pmu->lock);
+ INIT_LIST_HEAD(&rapl_pmu->active_list);
+ rapl_pmu->pmu = &rapl_pmus->pmu;
+ rapl_pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
+ rapl_hrtimer_init(rapl_pmu);

- rapl_pmus->pmus[rapl_pmu_idx] = pmu;
+ rapl_pmus->rapl_pmu[rapl_pmu_idx] = rapl_pmu;
}

/*
@@ -610,7 +610,7 @@ static int rapl_cpu_online(unsigned int cpu)
return 0;

cpumask_set_cpu(cpu, &rapl_cpu_mask);
- pmu->cpu = cpu;
+ rapl_pmu->cpu = cpu;
return 0;
}

@@ -679,7 +679,7 @@ static void cleanup_rapl_pmus(void)
int i;

for (i = 0; i < rapl_pmus->nr_rapl_pmu; i++)
- kfree(rapl_pmus->pmus[i]);
+ kfree(rapl_pmus->rapl_pmu[i]);
kfree(rapl_pmus);
}

@@ -699,7 +699,7 @@ static int __init init_rapl_pmus(void)
if (rapl_pmu_is_pkg_scope())
nr_rapl_pmu = topology_max_packages();

- rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
+ rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
if (!rapl_pmus)
return -ENOMEM;

--
2.34.1


2024-06-10 10:10:00

by Dhananjay Ugwekar

Subject: [PATCH 3/6] perf/x86/rapl: Make rapl_model struct global

To support AMD's per-core RAPL counter, the per_core capability of the
current rapl_model will need to be checked multiple times in the
rapl_cpu_online/offline and init_rapl_pmus functions. Cache the matched
rapl_model in a global variable to avoid calling x86_match_cpu() multiple
times.

No functional change.

Signed-off-by: Dhananjay Ugwekar <[email protected]>
---
arch/x86/events/rapl.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b4e2073a178e..e5e878146542 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -143,6 +143,7 @@ static cpumask_t rapl_cpu_mask;
static unsigned int rapl_cntr_mask;
static u64 rapl_timer_ms;
static struct perf_msr *rapl_msrs;
+static struct rapl_model *rapl_model;

static inline unsigned int get_rapl_pmu_idx(int cpu)
{
@@ -614,18 +615,18 @@ static int rapl_cpu_online(unsigned int cpu)
return 0;
}

-static int rapl_check_hw_unit(struct rapl_model *rm)
+static int rapl_check_hw_unit(void)
{
u64 msr_rapl_power_unit_bits;
int i;

/* protect rdmsrl() to handle virtualization */
- if (rdmsrl_safe(rm->msr_power_unit, &msr_rapl_power_unit_bits))
+ if (rdmsrl_safe(rapl_model->msr_power_unit, &msr_rapl_power_unit_bits))
return -1;
for (i = 0; i < NR_RAPL_DOMAINS; i++)
rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;

- switch (rm->unit_quirk) {
+ switch (rapl_model->unit_quirk) {
/*
* DRAM domain on HSW server and KNL has fixed energy unit which can be
* different than the unit from power unit MSR. See
@@ -839,21 +840,20 @@ MODULE_DEVICE_TABLE(x86cpu, rapl_model_match);
static int __init rapl_pmu_init(void)
{
const struct x86_cpu_id *id;
- struct rapl_model *rm;
int ret;

id = x86_match_cpu(rapl_model_match);
if (!id)
return -ENODEV;

- rm = (struct rapl_model *) id->driver_data;
+ rapl_model = (struct rapl_model *) id->driver_data;

- rapl_msrs = rm->rapl_msrs;
+ rapl_msrs = rapl_model->rapl_msrs;

rapl_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_MAX,
- false, (void *) &rm->events);
+ false, (void *) &rapl_model->events);

- ret = rapl_check_hw_unit(rm);
+ ret = rapl_check_hw_unit();
if (ret)
return ret;

--
2.34.1


2024-06-10 10:10:23

by Dhananjay Ugwekar

Subject: [PATCH 4/6] perf/x86/rapl: Move cpumask variable to rapl_pmus struct

This patch is in preparation for the addition of per-core energy counter
support for AMD CPUs.

The per-core energy counter PMU will need a separate cpumask. Adding the
cpumask inside the rapl_pmus struct is a better approach than creating
another global cpumask variable for the per-core PMU. This way, if a new
PMU with a different scope (e.g. CCD) is needed in the future, another
global cpumask variable won't be necessary.

No functional change.

Signed-off-by: Dhananjay Ugwekar <[email protected]>
---
arch/x86/events/rapl.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index e5e878146542..be139e9f9ee0 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -119,6 +119,7 @@ struct rapl_pmu {

struct rapl_pmus {
struct pmu pmu;
+ cpumask_t cpumask;
unsigned int nr_rapl_pmu;
struct rapl_pmu *rapl_pmu[] __counted_by(nr_rapl_pmu);
};
@@ -139,7 +140,6 @@ struct rapl_model {
/* 1/2^hw_unit Joule */
static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
static struct rapl_pmus *rapl_pmus;
-static cpumask_t rapl_cpu_mask;
static unsigned int rapl_cntr_mask;
static u64 rapl_timer_ms;
static struct perf_msr *rapl_msrs;
@@ -394,7 +394,7 @@ static void rapl_pmu_event_read(struct perf_event *event)
static ssize_t rapl_get_attr_cpumask(struct device *dev,
struct device_attribute *attr, char *buf)
{
- return cpumap_print_to_pagebuf(true, buf, &rapl_cpu_mask);
+ return cpumap_print_to_pagebuf(true, buf, &rapl_pmus->cpumask);
}

static DEVICE_ATTR(cpumask, S_IRUGO, rapl_get_attr_cpumask, NULL);
@@ -565,7 +565,7 @@ static int rapl_cpu_offline(unsigned int cpu)
int target;

/* Check if exiting cpu is used for collecting rapl events */
- if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask))
+ if (!cpumask_test_and_clear_cpu(cpu, &rapl_pmus->cpumask))
return 0;

rapl_pmu->cpu = -1;
@@ -574,7 +574,7 @@ static int rapl_cpu_offline(unsigned int cpu)

/* Migrate rapl events to the new target */
if (target < nr_cpu_ids) {
- cpumask_set_cpu(target, &rapl_cpu_mask);
+ cpumask_set_cpu(target, &rapl_pmus->cpumask);
rapl_pmu->cpu = target;
perf_pmu_migrate_context(rapl_pmu->pmu, cpu, target);
}
@@ -606,11 +606,11 @@ static int rapl_cpu_online(unsigned int cpu)
* Check if there is an online cpu in the package which collects rapl
* events already.
*/
- target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
+ target = cpumask_any_and(&rapl_pmus->cpumask, rapl_pmu_cpumask);
if (target < nr_cpu_ids)
return 0;

- cpumask_set_cpu(cpu, &rapl_cpu_mask);
+ cpumask_set_cpu(cpu, &rapl_pmus->cpumask);
rapl_pmu->cpu = cpu;
return 0;
}
--
2.34.1


2024-06-10 10:10:45

by Dhananjay Ugwekar

Subject: [PATCH 5/6] perf/x86/rapl: Add wrapper for online/offline functions

This is in preparation for the addition of per-core RAPL counter support
for AMD CPUs.

The CPU online and offline functions will need to handle setting up and
migrating the new per-core PMU as well. The wrapper functions added below
make it easier to pass the corresponding arguments for both PMUs.

No functional change.

Signed-off-by: Dhananjay Ugwekar <[email protected]>
---
arch/x86/events/rapl.c | 30 +++++++++++++++++++++---------
1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index be139e9f9ee0..70c7b35fb4d2 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -558,10 +558,10 @@ static struct perf_msr amd_rapl_msrs[] = {
[PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, NULL, false, 0 },
};

-static int rapl_cpu_offline(unsigned int cpu)
+static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
+ const struct cpumask *event_cpumask, unsigned int cpu)
{
- const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
- struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
+ struct rapl_pmu *rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
int target;

/* Check if exiting cpu is used for collecting rapl events */
@@ -570,7 +570,7 @@ static int rapl_cpu_offline(unsigned int cpu)

rapl_pmu->cpu = -1;
/* Find a new cpu to collect rapl events */
- target = cpumask_any_but(rapl_pmu_cpumask, cpu);
+ target = cpumask_any_but(event_cpumask, cpu);

/* Migrate rapl events to the new target */
if (target < nr_cpu_ids) {
@@ -581,11 +581,16 @@ static int rapl_cpu_offline(unsigned int cpu)
return 0;
}

-static int rapl_cpu_online(unsigned int cpu)
+static int rapl_cpu_offline(unsigned int cpu)
{
- unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
- const struct cpumask *rapl_pmu_cpumask = get_rapl_pmu_cpumask(cpu);
- struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
+ return __rapl_cpu_offline(rapl_pmus, get_rapl_pmu_idx(cpu),
+ get_rapl_pmu_cpumask(cpu), cpu);
+}
+
+static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
+ const struct cpumask *event_cpumask, unsigned int cpu)
+{
+ struct rapl_pmu *rapl_pmu = rapl_pmus->rapl_pmu[rapl_pmu_idx];
int target;

if (!rapl_pmu) {
@@ -606,7 +611,7 @@ static int rapl_cpu_online(unsigned int cpu)
* Check if there is an online cpu in the package which collects rapl
* events already.
*/
- target = cpumask_any_and(&rapl_pmus->cpumask, rapl_pmu_cpumask);
+ target = cpumask_any_and(&rapl_pmus->cpumask, event_cpumask);
if (target < nr_cpu_ids)
return 0;

@@ -615,6 +620,13 @@ static int rapl_cpu_online(unsigned int cpu)
return 0;
}

+static int rapl_cpu_online(unsigned int cpu)
+{
+ return __rapl_cpu_online(rapl_pmus, get_rapl_pmu_idx(cpu),
+ get_rapl_pmu_cpumask(cpu), cpu);
+}
+
+
static int rapl_check_hw_unit(void)
{
u64 msr_rapl_power_unit_bits;
--
2.34.1


2024-06-10 10:11:05

by Dhananjay Ugwekar

Subject: [PATCH 6/6] perf/x86/rapl: Add per-core energy counter support for AMD CPUs

Add a new "power_per_core" PMU and an "energy-per-core" event for
monitoring energy consumption by each core. The existing energy-cores
event aggregates the energy consumption at the package level.
This new event aligns with AMD's per-core energy counters.
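
As a rough cross-check (an expectation, not something enforced by this
patch), the new event can be collected alongside the existing
package-scope event; since the package counter also includes non-core
consumption, it should read at least as high as the sum of the per-core
values:

$ perf stat -a -e power/energy-pkg/,power_per_core/energy-per-core/ sleep 1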

Tested the package level and core level PMU counters with workloads
pinned to different CPUs.

Results with workload pinned to CPU 1 in core 1 on an AMD Zen4 Genoa
machine:

$ perf stat -a --per-core -e power_per_core/energy-per-core/ sleep 1

Performance counter stats for 'system wide':

S0-D0-C0 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C1 1 5.72 Joules power_per_core/energy-per-core/
S0-D0-C2 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C3 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C4 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C5 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C6 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C7 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C8 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C9 1 0.02 Joules power_per_core/energy-per-core/
S0-D0-C10 1 0.02 Joules power_per_core/energy-per-core/

Signed-off-by: Dhananjay Ugwekar <[email protected]>
---
arch/x86/events/rapl.c | 155 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 138 insertions(+), 17 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 70c7b35fb4d2..967ecb98748a 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -39,6 +39,10 @@
* event: rapl_energy_psys
* perf code: 0x5
*
+ * per_core counter: consumption of a single physical core
+ * event: rapl_energy_per_core
+ * perf code: 0x6
+ *
* We manage those counters as free running (read-only). They may be
* use simultaneously by other tools, such as turbostat.
*
@@ -76,6 +80,7 @@ enum perf_rapl_events {
PERF_RAPL_RAM, /* DRAM */
PERF_RAPL_PP1, /* gpu */
PERF_RAPL_PSYS, /* psys */
+ PERF_RAPL_PERCORE, /* per-core */

PERF_RAPL_MAX,
NR_RAPL_DOMAINS = PERF_RAPL_MAX,
@@ -87,6 +92,7 @@ static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
"dram",
"pp1-gpu",
"psys",
+ "per-core",
};

/*
@@ -135,11 +141,13 @@ struct rapl_model {
unsigned long events;
unsigned int msr_power_unit;
enum rapl_unit_quirk unit_quirk;
+ bool per_core;
};

/* 1/2^hw_unit Joule */
static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
static struct rapl_pmus *rapl_pmus;
+static struct rapl_pmus *rapl_pmus_per_core;
static unsigned int rapl_cntr_mask;
static u64 rapl_timer_ms;
static struct perf_msr *rapl_msrs;
@@ -345,9 +353,14 @@ static int rapl_pmu_event_init(struct perf_event *event)
u64 cfg = event->attr.config & RAPL_EVENT_MASK;
int bit, ret = 0;
struct rapl_pmu *rapl_pmu;
+ struct rapl_pmus *curr_rapl_pmus;

/* only look at RAPL events */
- if (event->attr.type != rapl_pmus->pmu.type)
+ if (event->attr.type == rapl_pmus->pmu.type)
+ curr_rapl_pmus = rapl_pmus;
+ else if (rapl_pmus_per_core && event->attr.type == rapl_pmus_per_core->pmu.type)
+ curr_rapl_pmus = rapl_pmus_per_core;
+ else
return -ENOENT;

/* check only supported bits are set */
@@ -374,9 +387,14 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;

/* must be done before validate_group */
- rapl_pmu = cpu_to_rapl_pmu(event->cpu);
+ if (curr_rapl_pmus == rapl_pmus_per_core)
+ rapl_pmu = curr_rapl_pmus->rapl_pmu[topology_core_id(event->cpu)];
+ else
+ rapl_pmu = curr_rapl_pmus->rapl_pmu[get_rapl_pmu_idx(event->cpu)];
+
if (!rapl_pmu)
return -EINVAL;
+
event->cpu = rapl_pmu->cpu;
event->pmu_private = rapl_pmu;
event->hw.event_base = rapl_msrs[bit].msr;
@@ -408,17 +426,38 @@ static struct attribute_group rapl_pmu_attr_group = {
.attrs = rapl_pmu_attrs,
};

+static ssize_t rapl_get_attr_per_core_cpumask(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return cpumap_print_to_pagebuf(true, buf, &rapl_pmus_per_core->cpumask);
+}
+
+static struct device_attribute dev_attr_per_core_cpumask = __ATTR(cpumask, 0444,
+ rapl_get_attr_per_core_cpumask,
+ NULL);
+
+static struct attribute *rapl_pmu_per_core_attrs[] = {
+ &dev_attr_per_core_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group rapl_pmu_per_core_attr_group = {
+ .attrs = rapl_pmu_per_core_attrs,
+};
+
RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01");
RAPL_EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02");
RAPL_EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
RAPL_EVENT_ATTR_STR(energy-gpu , rapl_gpu, "event=0x04");
RAPL_EVENT_ATTR_STR(energy-psys, rapl_psys, "event=0x05");
+RAPL_EVENT_ATTR_STR(energy-per-core, rapl_per_core, "event=0x06");

RAPL_EVENT_ATTR_STR(energy-cores.unit, rapl_cores_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-pkg.unit , rapl_pkg_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-ram.unit , rapl_ram_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-gpu.unit , rapl_gpu_unit, "Joules");
RAPL_EVENT_ATTR_STR(energy-psys.unit, rapl_psys_unit, "Joules");
+RAPL_EVENT_ATTR_STR(energy-per-core.unit, rapl_per_core_unit, "Joules");

/*
* we compute in 0.23 nJ increments regardless of MSR
@@ -428,6 +467,7 @@ RAPL_EVENT_ATTR_STR(energy-pkg.scale, rapl_pkg_scale, "2.3283064365386962890
RAPL_EVENT_ATTR_STR(energy-ram.scale, rapl_ram_scale, "2.3283064365386962890625e-10");
RAPL_EVENT_ATTR_STR(energy-gpu.scale, rapl_gpu_scale, "2.3283064365386962890625e-10");
RAPL_EVENT_ATTR_STR(energy-psys.scale, rapl_psys_scale, "2.3283064365386962890625e-10");
+RAPL_EVENT_ATTR_STR(energy-per-core.scale, rapl_per_core_scale, "2.3283064365386962890625e-10");

/*
* There are no default events, but we need to create
@@ -461,6 +501,13 @@ static const struct attribute_group *rapl_attr_groups[] = {
NULL,
};

+static const struct attribute_group *rapl_per_core_attr_groups[] = {
+ &rapl_pmu_per_core_attr_group,
+ &rapl_pmu_format_group,
+ &rapl_pmu_events_group,
+ NULL,
+};
+
static struct attribute *rapl_events_cores[] = {
EVENT_PTR(rapl_cores),
EVENT_PTR(rapl_cores_unit),
@@ -521,6 +568,18 @@ static struct attribute_group rapl_events_psys_group = {
.attrs = rapl_events_psys,
};

+static struct attribute *rapl_events_per_core[] = {
+ EVENT_PTR(rapl_per_core),
+ EVENT_PTR(rapl_per_core_unit),
+ EVENT_PTR(rapl_per_core_scale),
+ NULL,
+};
+
+static struct attribute_group rapl_events_per_core_group = {
+ .name = "events",
+ .attrs = rapl_events_per_core,
+};
+
static bool test_msr(int idx, void *data)
{
return test_bit(idx, (unsigned long *) data);
@@ -535,6 +594,7 @@ static struct perf_msr intel_rapl_msrs[] = {
[PERF_RAPL_RAM] = { MSR_DRAM_ENERGY_STATUS, &rapl_events_ram_group, test_msr, false, RAPL_MSR_MASK },
[PERF_RAPL_PP1] = { MSR_PP1_ENERGY_STATUS, &rapl_events_gpu_group, test_msr, false, RAPL_MSR_MASK },
[PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group, test_msr, false, RAPL_MSR_MASK },
+ [PERF_RAPL_PERCORE] = { 0, &rapl_events_per_core_group, NULL, false, 0 },
};

static struct perf_msr intel_rapl_spr_msrs[] = {
@@ -543,6 +603,7 @@ static struct perf_msr intel_rapl_spr_msrs[] = {
[PERF_RAPL_RAM] = { MSR_DRAM_ENERGY_STATUS, &rapl_events_ram_group, test_msr, false, RAPL_MSR_MASK },
[PERF_RAPL_PP1] = { MSR_PP1_ENERGY_STATUS, &rapl_events_gpu_group, test_msr, false, RAPL_MSR_MASK },
[PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group, test_msr, true, RAPL_MSR_MASK },
+ [PERF_RAPL_PERCORE] = { 0, &rapl_events_per_core_group, NULL, false, 0 },
};

/*
@@ -556,6 +617,7 @@ static struct perf_msr amd_rapl_msrs[] = {
[PERF_RAPL_RAM] = { 0, &rapl_events_ram_group, NULL, false, 0 },
[PERF_RAPL_PP1] = { 0, &rapl_events_gpu_group, NULL, false, 0 },
[PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, NULL, false, 0 },
+ [PERF_RAPL_PERCORE] = { MSR_AMD_CORE_ENERGY_STATUS, &rapl_events_per_core_group, test_msr, false, RAPL_MSR_MASK },
};

static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
@@ -583,8 +645,16 @@ static int __rapl_cpu_offline(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu

static int rapl_cpu_offline(unsigned int cpu)
{
- return __rapl_cpu_offline(rapl_pmus, get_rapl_pmu_idx(cpu),
- get_rapl_pmu_cpumask(cpu), cpu);
+ int ret;
+
+ ret = __rapl_cpu_offline(rapl_pmus, get_rapl_pmu_idx(cpu),
+ get_rapl_pmu_cpumask(cpu), cpu);
+
+ if (ret == 0 && rapl_model->per_core)
+ ret = __rapl_cpu_offline(rapl_pmus_per_core, topology_core_id(cpu),
+ topology_sibling_cpumask(cpu), cpu);
+
+ return ret;
}

static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_idx,
@@ -622,10 +692,17 @@ static int __rapl_cpu_online(struct rapl_pmus *rapl_pmus, unsigned int rapl_pmu_

static int rapl_cpu_online(unsigned int cpu)
{
- return __rapl_cpu_online(rapl_pmus, get_rapl_pmu_idx(cpu),
- get_rapl_pmu_cpumask(cpu), cpu);
-}
+ int ret;
+
+ ret = __rapl_cpu_online(rapl_pmus, get_rapl_pmu_idx(cpu),
+ get_rapl_pmu_cpumask(cpu), cpu);

+ if (ret == 0 && rapl_model->per_core)
+ ret = __rapl_cpu_online(rapl_pmus_per_core, topology_core_id(cpu),
+ topology_sibling_cpumask(cpu), cpu);
+
+ return ret;
+}

static int rapl_check_hw_unit(void)
{
@@ -687,7 +764,7 @@ static void __init rapl_advertise(void)
}
}

-static void cleanup_rapl_pmus(void)
+static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
{
int i;

@@ -705,12 +782,15 @@ static const struct attribute_group *rapl_attr_update[] = {
NULL,
};

-static int __init init_rapl_pmus(void)
-{
- int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
+static const struct attribute_group *rapl_per_core_attr_update[] = {
+ &rapl_events_per_core_group,
+};

- if (rapl_pmu_is_pkg_scope())
- nr_rapl_pmu = topology_max_packages();
+static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int nr_rapl_pmu,
+ const struct attribute_group **rapl_attr_groups,
+ const struct attribute_group **rapl_attr_update)
+{
+ struct rapl_pmus *rapl_pmus;

rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu, nr_rapl_pmu), GFP_KERNEL);
if (!rapl_pmus)
@@ -728,6 +808,9 @@ static int __init init_rapl_pmus(void)
rapl_pmus->pmu.read = rapl_pmu_event_read;
rapl_pmus->pmu.module = THIS_MODULE;
rapl_pmus->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
+
+ *rapl_pmus_ptr = rapl_pmus;
+
return 0;
}

@@ -794,9 +877,11 @@ static struct rapl_model model_spr = {
};

static struct rapl_model model_amd_hygon = {
- .events = BIT(PERF_RAPL_PKG),
+ .events = BIT(PERF_RAPL_PKG) |
+ BIT(PERF_RAPL_PERCORE),
.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
.rapl_msrs = amd_rapl_msrs,
+ .per_core = true,
};

static const struct x86_cpu_id rapl_model_match[] __initconst = {
@@ -853,6 +938,11 @@ static int __init rapl_pmu_init(void)
{
const struct x86_cpu_id *id;
int ret;
+ int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
+ int nr_cores = topology_max_packages() * topology_num_cores_per_package();
+
+ if (rapl_pmu_is_pkg_scope())
+ nr_rapl_pmu = topology_max_packages();

id = x86_match_cpu(rapl_model_match);
if (!id)
@@ -869,10 +959,23 @@ static int __init rapl_pmu_init(void)
if (ret)
return ret;

- ret = init_rapl_pmus();
+ ret = init_rapl_pmus(&rapl_pmus, nr_rapl_pmu, rapl_attr_groups, rapl_attr_update);
if (ret)
return ret;

+ if (rapl_model->per_core) {
+ ret = init_rapl_pmus(&rapl_pmus_per_core, nr_cores,
+ rapl_per_core_attr_groups, rapl_per_core_attr_update);
+ if (ret) {
+ /*
+ * If initialization of per_core PMU fails, reset per_core
+ * flag, and continue with power PMU initialization.
+ */
+ pr_warn("Per-core PMU initialization failed (%d)\n", ret);
+ rapl_model->per_core = false;
+ }
+ }
+
/*
* Install callbacks. Core will call them for each online cpu.
*/
@@ -886,14 +989,28 @@ static int __init rapl_pmu_init(void)
if (ret)
goto out1;

+ if (rapl_model->per_core) {
+ ret = perf_pmu_register(&rapl_pmus_per_core->pmu, "power_per_core", -1);
+ if (ret) {
+ /*
+ * If registration of per_core PMU fails, cleanup per_core PMU
+ * variables, reset the per_core flag and keep the
+ * power PMU untouched.
+ */
+ pr_warn("Per-core PMU registration failed (%d)\n", ret);
+ cleanup_rapl_pmus(rapl_pmus_per_core);
+ rapl_model->per_core = false;
+ }
+ }
rapl_advertise();
+
return 0;

out1:
cpuhp_remove_state(CPUHP_AP_PERF_X86_RAPL_ONLINE);
out:
pr_warn("Initialization failed (%d), disabled\n", ret);
- cleanup_rapl_pmus();
+ cleanup_rapl_pmus(rapl_pmus);
return ret;
}
module_init(rapl_pmu_init);
@@ -902,6 +1019,10 @@ static void __exit intel_rapl_exit(void)
{
cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
perf_pmu_unregister(&rapl_pmus->pmu);
- cleanup_rapl_pmus();
+ cleanup_rapl_pmus(rapl_pmus);
+ if (rapl_model->per_core) {
+ perf_pmu_unregister(&rapl_pmus_per_core->pmu);
+ cleanup_rapl_pmus(rapl_pmus_per_core);
+ }
}
module_exit(intel_rapl_exit);
--
2.34.1


2024-06-10 14:35:36

by Oleksandr Natalenko

Subject: Re: [PATCH 0/6] Add per-core RAPL energy counter support for AMD CPUs

Hello.

On Monday, June 10, 2024 12:07:45 CEST, Dhananjay Ugwekar wrote:
> Currently the energy-cores event in the power PMU aggregates energy
> consumption data at a package level. On the other hand the core energy
> RAPL counter in AMD CPUs has a core scope (which means the energy
> consumption is recorded separately for each core). Earlier efforts to add
> the core event in the power PMU had failed [1], due to the difference in
> the scope of these two events. Hence, there is a need for a new core scope
> PMU.
>
> This patchset adds a new "power_per_core" PMU alongside the existing
> "power" PMU, which will be responsible for collecting the new
> "energy-per-core" event.
>
> Tested the package level and core level PMU counters with workloads
> pinned to different CPUs.
>
> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
> machine:
>
> $ perf stat -a --per-core -e power_per_core/energy-per-core/ sleep 1
>
> Performance counter stats for 'system wide':
>
> S0-D0-C0 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C1 1 5.72 Joules power_per_core/energy-per-core/
> S0-D0-C2 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C3 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C4 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C5 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C6 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C7 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C8 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C9 1 0.02 Joules power_per_core/energy-per-core/
> S0-D0-C10 1 0.02 Joules power_per_core/energy-per-core/
>
> [1]: https://lore.kernel.org/lkml/[email protected]/
>
> This patchset applies cleanly on top of v6.10-rc3 as well as latest
> tip/master.
>
> Dhananjay Ugwekar (6):
> perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
> perf/x86/rapl: Rename rapl_pmu variables
> perf/x86/rapl: Make rapl_model struct global
> perf/x86/rapl: Move cpumask variable to rapl_pmus struct
> perf/x86/rapl: Add wrapper for online/offline functions
> perf/x86/rapl: Add per-core energy counter support for AMD CPUs
>
> arch/x86/events/rapl.c | 311 ++++++++++++++++++++++++++++++-----------
> 1 file changed, 233 insertions(+), 78 deletions(-)
>
>

With my CPU:

Model name: AMD Ryzen 9 5950X 16-Core Processor

and this workload:

$ taskset -c 1 dd if=/dev/zero of=/dev/null

the following result is got:

$ sudo perf stat -a --per-core -e power_per_core/energy-per-core/ sleep 1

Performance counter stats for 'system wide':

S0-D0-C0 1 1,70 Joules power_per_core/energy-per-core/
S0-D0-C1 1 8,83 Joules power_per_core/energy-per-core/
S0-D0-C2 1 0,17 Joules power_per_core/energy-per-core/
S0-D0-C3 1 0,33 Joules power_per_core/energy-per-core/
S0-D0-C4 1 0,14 Joules power_per_core/energy-per-core/
S0-D0-C5 1 0,33 Joules power_per_core/energy-per-core/
S0-D0-C6 1 0,25 Joules power_per_core/energy-per-core/
S0-D0-C7 1 0,19 Joules power_per_core/energy-per-core/
S0-D0-C8 1 0,66 Joules power_per_core/energy-per-core/
S0-D0-C9 1 1,71 Joules power_per_core/energy-per-core/
S0-D0-C10 1 0,38 Joules power_per_core/energy-per-core/
S0-D0-C11 1 1,69 Joules power_per_core/energy-per-core/
S0-D0-C12 1 0,22 Joules power_per_core/energy-per-core/
S0-D0-C13 1 0,11 Joules power_per_core/energy-per-core/
S0-D0-C14 1 0,49 Joules power_per_core/energy-per-core/
S0-D0-C15 1 0,37 Joules power_per_core/energy-per-core/

1,002409590 seconds time elapsed

If it is as expected, please add my:

Tested-by: Oleksandr Natalenko <[email protected]>

Thank you.

--
Oleksandr Natalenko (post-factum)



2024-06-10 15:27:03

by Dhananjay Ugwekar

Subject: Re: [PATCH 0/6] Add per-core RAPL energy counter support for AMD CPUs

Hello Oleksandr,

On 6/10/2024 7:58 PM, Oleksandr Natalenko wrote:
> Hello.
>
> On Monday, June 10, 2024 12:07:45 CEST, Dhananjay Ugwekar wrote:
>> Currently the energy-cores event in the power PMU aggregates energy
>> consumption data at a package level. On the other hand the core energy
>> RAPL counter in AMD CPUs has a core scope (which means the energy
>> consumption is recorded separately for each core). Earlier efforts to add
>> the core event in the power PMU had failed [1], due to the difference in
>> the scope of these two events. Hence, there is a need for a new core scope
>> PMU.
>>
>> This patchset adds a new "power_per_core" PMU alongside the existing
>> "power" PMU, which will be responsible for collecting the new
>> "energy-per-core" event.
>>
>> Tested the package level and core level PMU counters with workloads
>> pinned to different CPUs.
>>
>> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
>> machine:
>>
>> $ perf stat -a --per-core -e power_per_core/energy-per-core/ sleep 1
>>
>> Performance counter stats for 'system wide':
>>
>> S0-D0-C0 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C1 1 5.72 Joules power_per_core/energy-per-core/
>> S0-D0-C2 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C3 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C4 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C5 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C6 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C7 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C8 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C9 1 0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C10 1 0.02 Joules power_per_core/energy-per-core/
>>
>> [1]: https://lore.kernel.org/lkml/[email protected]/
>>
>> This patchset applies cleanly on top of v6.10-rc3 as well as latest
>> tip/master.
>>
>> Dhananjay Ugwekar (6):
>> perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
>> perf/x86/rapl: Rename rapl_pmu variables
>> perf/x86/rapl: Make rapl_model struct global
>> perf/x86/rapl: Move cpumask variable to rapl_pmus struct
>> perf/x86/rapl: Add wrapper for online/offline functions
>> perf/x86/rapl: Add per-core energy counter support for AMD CPUs
>>
>> arch/x86/events/rapl.c | 311 ++++++++++++++++++++++++++++++-----------
>> 1 file changed, 233 insertions(+), 78 deletions(-)
>>
>>
>
> With my CPU:
>
> Model name: AMD Ryzen 9 5950X 16-Core Processor
>
> and this workload:
>
> $ taskset -c 1 dd if=/dev/zero of=/dev/null
>
> the following result is got:
>
> $ sudo perf stat -a --per-core -e power_per_core/energy-per-core/ sleep 1
>
> Performance counter stats for 'system wide':
>
> S0-D0-C0 1 1,70 Joules power_per_core/energy-per-core/
> S0-D0-C1 1 8,83 Joules power_per_core/energy-per-core/
> S0-D0-C2 1 0,17 Joules power_per_core/energy-per-core/
> S0-D0-C3 1 0,33 Joules power_per_core/energy-per-core/
> S0-D0-C4 1 0,14 Joules power_per_core/energy-per-core/
> S0-D0-C5 1 0,33 Joules power_per_core/energy-per-core/
> S0-D0-C6 1 0,25 Joules power_per_core/energy-per-core/
> S0-D0-C7 1 0,19 Joules power_per_core/energy-per-core/
> S0-D0-C8 1 0,66 Joules power_per_core/energy-per-core/
> S0-D0-C9 1 1,71 Joules power_per_core/energy-per-core/
> S0-D0-C10 1 0,38 Joules power_per_core/energy-per-core/
> S0-D0-C11 1 1,69 Joules power_per_core/energy-per-core/
> S0-D0-C12 1 0,22 Joules power_per_core/energy-per-core/
> S0-D0-C13 1 0,11 Joules power_per_core/energy-per-core/
> S0-D0-C14 1 0,49 Joules power_per_core/energy-per-core/
> S0-D0-C15 1 0,37 Joules power_per_core/energy-per-core/
>
> 1,002409590 seconds time elapsed
>
> If it is as expected, please add my:
>
> Tested-by: Oleksandr Natalenko <[email protected]>

We can see that after you affined the workload to CPU 1, the energy
consumption of core 1 is considerably higher than that of the other cores,
which is as expected. I will add your Tested-by in the next version.

P.S.: I'm assuming here that CPU 1 is part of core 1 on your system;
please let me know if that assumption is wrong.

Thanks for testing the patch!

Regards,
Dhananjay

>
> Thank you.
>

2024-06-10 18:10:11

by Oleksandr Natalenko

Subject: Re: [PATCH 0/6] Add per-core RAPL energy counter support for AMD CPUs

On Monday, June 10, 2024 17:17:42 CEST, Dhananjay Ugwekar wrote:
> Hello Oleksandr,
>
> On 6/10/2024 7:58 PM, Oleksandr Natalenko wrote:
> > Hello.
> >
> > On Monday, June 10, 2024 12:07:45 CEST, Dhananjay Ugwekar wrote:
> >> Currently the energy-cores event in the power PMU aggregates energy
> >> consumption data at a package level. On the other hand the core energy
> >> RAPL counter in AMD CPUs has a core scope (which means the energy
> >> consumption is recorded separately for each core). Earlier efforts to add
> >> the core event in the power PMU had failed [1], due to the difference in
> >> the scope of these two events. Hence, there is a need for a new core scope
> >> PMU.
> >>
> >> This patchset adds a new "power_per_core" PMU alongside the existing
> >> "power" PMU, which will be responsible for collecting the new
> >> "energy-per-core" event.
> >>
> >> Tested the package level and core level PMU counters with workloads
> >> pinned to different CPUs.
> >>
> >> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
> >> machine:
> >>
> >> $ perf stat -a --per-core -e power_per_core/energy-per-core/ sleep 1
> >>
> >> Performance counter stats for 'system wide':
> >>
> >> S0-D0-C0 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C1 1 5.72 Joules power_per_core/energy-per-core/
> >> S0-D0-C2 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C3 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C4 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C5 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C6 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C7 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C8 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C9 1 0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C10 1 0.02 Joules power_per_core/energy-per-core/
> >>
> >> [1]: https://lore.kernel.org/lkml/[email protected]/
> >>
> >> This patchset applies cleanly on top of v6.10-rc3 as well as latest
> >> tip/master.
> >>
> >> Dhananjay Ugwekar (6):
> >> perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
> >> perf/x86/rapl: Rename rapl_pmu variables
> >> perf/x86/rapl: Make rapl_model struct global
> >> perf/x86/rapl: Move cpumask variable to rapl_pmus struct
> >> perf/x86/rapl: Add wrapper for online/offline functions
> >> perf/x86/rapl: Add per-core energy counter support for AMD CPUs
> >>
> >> arch/x86/events/rapl.c | 311 ++++++++++++++++++++++++++++++-----------
> >> 1 file changed, 233 insertions(+), 78 deletions(-)
> >>
> >>
> >
> > With my CPU:
> >
> > Model name: AMD Ryzen 9 5950X 16-Core Processor
> >
> > and this workload:
> >
> > $ taskset -c 1 dd if=/dev/zero of=/dev/null
> >
> > the following result is got:
> >
> > $ sudo perf stat -a --per-core -e power_per_core/energy-per-core/ sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > S0-D0-C0 1 1,70 Joules power_per_core/energy-per-core/
> > S0-D0-C1 1 8,83 Joules power_per_core/energy-per-core/
> > S0-D0-C2 1 0,17 Joules power_per_core/energy-per-core/
> > S0-D0-C3 1 0,33 Joules power_per_core/energy-per-core/
> > S0-D0-C4 1 0,14 Joules power_per_core/energy-per-core/
> > S0-D0-C5 1 0,33 Joules power_per_core/energy-per-core/
> > S0-D0-C6 1 0,25 Joules power_per_core/energy-per-core/
> > S0-D0-C7 1 0,19 Joules power_per_core/energy-per-core/
> > S0-D0-C8 1 0,66 Joules power_per_core/energy-per-core/
> > S0-D0-C9 1 1,71 Joules power_per_core/energy-per-core/
> > S0-D0-C10 1 0,38 Joules power_per_core/energy-per-core/
> > S0-D0-C11 1 1,69 Joules power_per_core/energy-per-core/
> > S0-D0-C12 1 0,22 Joules power_per_core/energy-per-core/
> > S0-D0-C13 1 0,11 Joules power_per_core/energy-per-core/
> > S0-D0-C14 1 0,49 Joules power_per_core/energy-per-core/
> > S0-D0-C15 1 0,37 Joules power_per_core/energy-per-core/
> >
> > 1,002409590 seconds time elapsed
> >
> > If it is as expected, please add my:
> >
> > Tested-by: Oleksandr Natalenko <[email protected]>
>
> We can see that after you affined the workload to cpu 1, energy
> consumption of core 1 is considerably higher than the other cores,
> which is as expected, will add your tested-by in next version.
>
> P.S: I'm assuming here that cpu 1 is part of core 1 in your system,
> please let me know if that assumption is wrong.

Your assumption should be correct:

$ cat /sys/devices/system/cpu/cpu1/topology/core_id
1

> Thanks for testing the patch!
>
> Regards,
> Dhananjay
>
> >
> > Thank you.
> >
>


--
Oleksandr Natalenko (post-factum)



2024-06-11 05:38:10

by Zhang, Rui

Subject: Re: [PATCH 1/6] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs

On Mon, 2024-06-10 at 10:07 +0000, Dhananjay Ugwekar wrote:
> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026
> leaf"),
> on AMD processors that support extended CPUID leaf 0x80000026, the
> topology_die_cpumask() and topology_logical_die_id() macros, no
> longer
> return the package cpumask and package id, instead they return the
> CCD
> (Core Complex Die) mask and id respectively. This leads to the
> energy-pkg
> event scope to be modified to CCD instead of package.
>
> Replacing these macros with their package counterparts fixes the
> energy-pkg event for AMD CPUs.
>
> However due to the difference between the scope of energy-pkg event
> for
> Intel and AMD CPUs,

Actually, on Intel platforms the energy-pkg event is also package scope,
except on one platform, and that one is the only multi-die system with
RAPL MSRs available. That is why the die/pkg logic was introduced in the
first place.

I like the macro/helpers in this patch. The logic inside them may need
to be optimized for Intel platforms, but that is a separate task.

> we have to replace these macros conditionally only for
> AMD CPUs.
>
> On a 12 CCD 1 Package AMD Zen4 Genoa machine:
>
> Before:
> $ cat /sys/devices/power/cpumask
> 0,8,16,24,32,40,48,56,64,72,80,88.
>
> The expected cpumask here is supposed to be just "0", as it is a
> package
> scope event, only one CPU will be collecting the event for all the
> CPUs in
> the package.
>
> After:
> $ cat /sys/devices/power/cpumask
> 0
>
> Signed-off-by: Dhananjay Ugwekar <[email protected]>
> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD
> 0x80000026 leaf")

Reviewed-by: Zhang Rui <[email protected]>

thanks,
rui
> ---
> PS: This patch was earlier sent separately(link below), it has not
> been
> merged yet, it is necessary for this patchset to work properly, also
> it
> fixes the pre-existing energy-pkg event.
> https://lore.kernel.org/linux-perf-users/[email protected]/
> ---
>  arch/x86/events/rapl.c | 30 ++++++++++++++++++++++++++----
>  1 file changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index b985ca79cf97..73be25e1f4b4 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -103,6 +103,10 @@ static struct perf_pmu_events_attr
> event_attr_##v = {                              \
>         .event_str      =
> str,                                                  \
>  };
>  
> +#define rapl_pmu_is_pkg_scope()                                \
> +       (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||  \
> +        boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> +
>  struct rapl_pmu {
>         raw_spinlock_t          lock;
>         int                     n_active;
> @@ -140,9 +144,21 @@ static unsigned int rapl_cntr_mask;
>  static u64 rapl_timer_ms;
>  static struct perf_msr *rapl_msrs;
>  
> +static inline unsigned int get_rapl_pmu_idx(int cpu)
> +{
> +       return rapl_pmu_is_pkg_scope() ?
> topology_logical_package_id(cpu) :
> +                                       
> topology_logical_die_id(cpu);
> +}
> +
> +static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
> +{
> +       return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
> +                                        topology_die_cpumask(cpu);
> +}
> +
>  static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
>  {
> -       unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
> +       unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
>  
>         /*
>          * The unsigned check also catches the '-1' return value for
> non
> @@ -543,6 +559,7 @@ static struct perf_msr amd_rapl_msrs[] = {
>  
>  static int rapl_cpu_offline(unsigned int cpu)
>  {
> +       const struct cpumask *rapl_pmu_cpumask =
> get_rapl_pmu_cpumask(cpu);
>         struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
>         int target;
>  
> @@ -552,7 +569,7 @@ static int rapl_cpu_offline(unsigned int cpu)
>  
>         pmu->cpu = -1;
>         /* Find a new cpu to collect rapl events */
> -       target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
> +       target = cpumask_any_but(rapl_pmu_cpumask, cpu);
>  
>         /* Migrate rapl events to the new target */
>         if (target < nr_cpu_ids) {
> @@ -565,6 +582,8 @@ static int rapl_cpu_offline(unsigned int cpu)
>  
>  static int rapl_cpu_online(unsigned int cpu)
>  {
> +       unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
> +       const struct cpumask *rapl_pmu_cpumask =
> get_rapl_pmu_cpumask(cpu);
>         struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
>         int target;
>  
> @@ -579,14 +598,14 @@ static int rapl_cpu_online(unsigned int cpu)
>                 pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
>                 rapl_hrtimer_init(pmu);
>  
> -               rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
> +               rapl_pmus->pmus[rapl_pmu_idx] = pmu;
>         }
>  
>         /*
>          * Check if there is an online cpu in the package which
> collects rapl
>          * events already.
>          */
> -       target = cpumask_any_and(&rapl_cpu_mask,
> topology_die_cpumask(cpu));
> +       target = cpumask_any_and(&rapl_cpu_mask, rapl_pmu_cpumask);
>         if (target < nr_cpu_ids)
>                 return 0;
>  
> @@ -677,6 +696,9 @@ static int __init init_rapl_pmus(void)
>  {
>         int nr_rapl_pmu = topology_max_packages() *
> topology_max_dies_per_package();
>  
> +       if (rapl_pmu_is_pkg_scope())
> +               nr_rapl_pmu = topology_max_packages();
> +
>         rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus,
> nr_rapl_pmu), GFP_KERNEL);
>         if (!rapl_pmus)
>                 return -ENOMEM;

2024-06-11 05:44:59

by Zhang, Rui

Subject: Re: [PATCH 2/6] perf/x86/rapl: Rename rapl_pmu variables

On Mon, 2024-06-10 at 10:07 +0000, Dhananjay Ugwekar wrote:
> Rename struct rapl_pmu variables from "pmu" to "rapl_pmu", to
> avoid any confusion between the variables of two different
> structs pmu and rapl_pmu.
> As rapl_pmu also contains a pointer to
> struct pmu, which leads to situations in code like pmu->pmu,
> which is needlessly confusing. Above scenario is replaced with
> much more readable rapl_pmu->pmu with this change.
>
> Also rename "pmus" member in rapl_pmus struct, for same reason.
>

As you are adding a new per_core pmu, can we just rename the current
rapl_pmu to something like rapl_pkg_pmus (as I mentioned in the
previous email, we can consider the current RAPL MSRs as package scope
on Intel platforms as well), and name the new one as rapl_core_pmus?

IMO, "rapl_pmus" + "rapl_pmus_per_core" is still confusing.

thanks,
rui

> No functional change.
>
> Signed-off-by: Dhananjay Ugwekar <[email protected]>
> ---
>  arch/x86/events/rapl.c | 104 ++++++++++++++++++++-------------------
> --
>  1 file changed, 52 insertions(+), 52 deletions(-)
>
> diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> index 73be25e1f4b4..b4e2073a178e 100644
> --- a/arch/x86/events/rapl.c
> +++ b/arch/x86/events/rapl.c
> @@ -120,7 +120,7 @@ struct rapl_pmu {
>  struct rapl_pmus {
>         struct pmu              pmu;
>         unsigned int            nr_rapl_pmu;
> -       struct rapl_pmu         *pmus[] __counted_by(nr_rapl_pmu);
> +       struct rapl_pmu         *rapl_pmu[]
> __counted_by(nr_rapl_pmu);
>  };
>  
>  enum rapl_unit_quirk {
> @@ -164,7 +164,7 @@ static inline struct rapl_pmu
> *cpu_to_rapl_pmu(unsigned int cpu)
>          * The unsigned check also catches the '-1' return value for
> non
>          * existent mappings in the topology map.
>          */
> -       return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus-
> >pmus[rapl_pmu_idx] : NULL;
> +       return rapl_pmu_idx < rapl_pmus->nr_rapl_pmu ? rapl_pmus-
> >rapl_pmu[rapl_pmu_idx] : NULL;
>  }
>  
>  static inline u64 rapl_read_counter(struct perf_event *event)
> @@ -228,34 +228,34 @@ static void rapl_start_hrtimer(struct rapl_pmu
> *pmu)
>  
>  static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer
> *hrtimer)
>  {
> -       struct rapl_pmu *pmu = container_of(hrtimer, struct rapl_pmu,
> hrtimer);
> +       struct rapl_pmu *rapl_pmu = container_of(hrtimer, struct
> rapl_pmu, hrtimer);
>         struct perf_event *event;
>         unsigned long flags;
>  
> -       if (!pmu->n_active)
> +       if (!rapl_pmu->n_active)
>                 return HRTIMER_NORESTART;
>  
> -       raw_spin_lock_irqsave(&pmu->lock, flags);
> +       raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
>  
> -       list_for_each_entry(event, &pmu->active_list, active_entry)
> +       list_for_each_entry(event, &rapl_pmu->active_list,
> active_entry)
>                 rapl_event_update(event);
>  
> -       raw_spin_unlock_irqrestore(&pmu->lock, flags);
> +       raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
>  
> -       hrtimer_forward_now(hrtimer, pmu->timer_interval);
> +       hrtimer_forward_now(hrtimer, rapl_pmu->timer_interval);
>  
>         return HRTIMER_RESTART;
>  }
>  
> -static void rapl_hrtimer_init(struct rapl_pmu *pmu)
> +static void rapl_hrtimer_init(struct rapl_pmu *rapl_pmu)
>  {
> -       struct hrtimer *hr = &pmu->hrtimer;
> +       struct hrtimer *hr = &rapl_pmu->hrtimer;
>  
>         hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>         hr->function = rapl_hrtimer_handle;
>  }
>  
> -static void __rapl_pmu_event_start(struct rapl_pmu *pmu,
> +static void __rapl_pmu_event_start(struct rapl_pmu *rapl_pmu,
>                                    struct perf_event *event)
>  {
>         if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
> @@ -263,39 +263,39 @@ static void __rapl_pmu_event_start(struct
> rapl_pmu *pmu,
>  
>         event->hw.state = 0;
>  
> -       list_add_tail(&event->active_entry, &pmu->active_list);
> +       list_add_tail(&event->active_entry, &rapl_pmu->active_list);
>  
>         local64_set(&event->hw.prev_count, rapl_read_counter(event));
>  
> -       pmu->n_active++;
> -       if (pmu->n_active == 1)
> -               rapl_start_hrtimer(pmu);
> +       rapl_pmu->n_active++;
> +       if (rapl_pmu->n_active == 1)
> +               rapl_start_hrtimer(rapl_pmu);
>  }
>  
>  static void rapl_pmu_event_start(struct perf_event *event, int mode)
>  {
> -       struct rapl_pmu *pmu = event->pmu_private;
> +       struct rapl_pmu *rapl_pmu = event->pmu_private;
>         unsigned long flags;
>  
> -       raw_spin_lock_irqsave(&pmu->lock, flags);
> -       __rapl_pmu_event_start(pmu, event);
> -       raw_spin_unlock_irqrestore(&pmu->lock, flags);
> +       raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
> +       __rapl_pmu_event_start(rapl_pmu, event);
> +       raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
>  }
>  
>  static void rapl_pmu_event_stop(struct perf_event *event, int mode)
>  {
> -       struct rapl_pmu *pmu = event->pmu_private;
> +       struct rapl_pmu *rapl_pmu = event->pmu_private;
>         struct hw_perf_event *hwc = &event->hw;
>         unsigned long flags;
>  
> -       raw_spin_lock_irqsave(&pmu->lock, flags);
> +       raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
>  
>         /* mark event as deactivated and stopped */
>         if (!(hwc->state & PERF_HES_STOPPED)) {
> -               WARN_ON_ONCE(pmu->n_active <= 0);
> -               pmu->n_active--;
> -               if (pmu->n_active == 0)
> -                       hrtimer_cancel(&pmu->hrtimer);
> +               WARN_ON_ONCE(rapl_pmu->n_active <= 0);
> +               rapl_pmu->n_active--;
> +               if (rapl_pmu->n_active == 0)
> +                       hrtimer_cancel(&rapl_pmu->hrtimer);
>  
>                 list_del(&event->active_entry);
>  
> @@ -313,23 +313,23 @@ static void rapl_pmu_event_stop(struct
> perf_event *event, int mode)
>                 hwc->state |= PERF_HES_UPTODATE;
>         }
>  
> -       raw_spin_unlock_irqrestore(&pmu->lock, flags);
> +       raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
>  }
>  
>  static int rapl_pmu_event_add(struct perf_event *event, int mode)
>  {
> -       struct rapl_pmu *pmu = event->pmu_private;
> +       struct rapl_pmu *rapl_pmu = event->pmu_private;
>         struct hw_perf_event *hwc = &event->hw;
>         unsigned long flags;
>  
> -       raw_spin_lock_irqsave(&pmu->lock, flags);
> +       raw_spin_lock_irqsave(&rapl_pmu->lock, flags);
>  
>         hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
>  
>         if (mode & PERF_EF_START)
> -               __rapl_pmu_event_start(pmu, event);
> +               __rapl_pmu_event_start(rapl_pmu, event);
>  
> -       raw_spin_unlock_irqrestore(&pmu->lock, flags);
> +       raw_spin_unlock_irqrestore(&rapl_pmu->lock, flags);
>  
>         return 0;
>  }
> @@ -343,7 +343,7 @@ static int rapl_pmu_event_init(struct perf_event
> *event)
>  {
>         u64 cfg = event->attr.config & RAPL_EVENT_MASK;
>         int bit, ret = 0;
> -       struct rapl_pmu *pmu;
> +       struct rapl_pmu *rapl_pmu;
>  
>         /* only look at RAPL events */
>         if (event->attr.type != rapl_pmus->pmu.type)
> @@ -373,11 +373,11 @@ static int rapl_pmu_event_init(struct
> perf_event *event)
>                 return -EINVAL;
>  
>         /* must be done before validate_group */
> -       pmu = cpu_to_rapl_pmu(event->cpu);
> -       if (!pmu)
> +       rapl_pmu = cpu_to_rapl_pmu(event->cpu);
> +       if (!rapl_pmu)
>                 return -EINVAL;
> -       event->cpu = pmu->cpu;
> -       event->pmu_private = pmu;
> +       event->cpu = rapl_pmu->cpu;
> +       event->pmu_private = rapl_pmu;
>         event->hw.event_base = rapl_msrs[bit].msr;
>         event->hw.config = cfg;
>         event->hw.idx = bit;
> @@ -560,22 +560,22 @@ static struct perf_msr amd_rapl_msrs[] = {
>  static int rapl_cpu_offline(unsigned int cpu)
>  {
>         const struct cpumask *rapl_pmu_cpumask =
> get_rapl_pmu_cpumask(cpu);
> -       struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
> +       struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
>         int target;
>  
>         /* Check if exiting cpu is used for collecting rapl events */
>         if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask))
>                 return 0;
>  
> -       pmu->cpu = -1;
> +       rapl_pmu->cpu = -1;
>         /* Find a new cpu to collect rapl events */
>         target = cpumask_any_but(rapl_pmu_cpumask, cpu);
>  
>         /* Migrate rapl events to the new target */
>         if (target < nr_cpu_ids) {
>                 cpumask_set_cpu(target, &rapl_cpu_mask);
> -               pmu->cpu = target;
> -               perf_pmu_migrate_context(pmu->pmu, cpu, target);
> +               rapl_pmu->cpu = target;
> +               perf_pmu_migrate_context(rapl_pmu->pmu, cpu, target);
>         }
>         return 0;
>  }
> @@ -584,21 +584,21 @@ static int rapl_cpu_online(unsigned int cpu)
>  {
>         unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
>         const struct cpumask *rapl_pmu_cpumask =
> get_rapl_pmu_cpumask(cpu);
> -       struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
> +       struct rapl_pmu *rapl_pmu = cpu_to_rapl_pmu(cpu);
>         int target;
>  
> -       if (!pmu) {
> -               pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL,
> cpu_to_node(cpu));
> -               if (!pmu)
> +       if (!rapl_pmu) {
> +               rapl_pmu = kzalloc_node(sizeof(*rapl_pmu),
> GFP_KERNEL, cpu_to_node(cpu));
> +               if (!rapl_pmu)
>                         return -ENOMEM;
>  
> -               raw_spin_lock_init(&pmu->lock);
> -               INIT_LIST_HEAD(&pmu->active_list);
> -               pmu->pmu = &rapl_pmus->pmu;
> -               pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
> -               rapl_hrtimer_init(pmu);
> +               raw_spin_lock_init(&rapl_pmu->lock);
> +               INIT_LIST_HEAD(&rapl_pmu->active_list);
> +               rapl_pmu->pmu = &rapl_pmus->pmu;
> +               rapl_pmu->timer_interval =
> ms_to_ktime(rapl_timer_ms);
> +               rapl_hrtimer_init(rapl_pmu);
>  
> -               rapl_pmus->pmus[rapl_pmu_idx] = pmu;
> +               rapl_pmus->rapl_pmu[rapl_pmu_idx] = rapl_pmu;
>         }
>  
>         /*
> @@ -610,7 +610,7 @@ static int rapl_cpu_online(unsigned int cpu)
>                 return 0;
>  
>         cpumask_set_cpu(cpu, &rapl_cpu_mask);
> -       pmu->cpu = cpu;
> +       rapl_pmu->cpu = cpu;
>         return 0;
>  }
>  
> @@ -679,7 +679,7 @@ static void cleanup_rapl_pmus(void)
>         int i;
>  
>         for (i = 0; i < rapl_pmus->nr_rapl_pmu; i++)
> -               kfree(rapl_pmus->pmus[i]);
> +               kfree(rapl_pmus->rapl_pmu[i]);
>         kfree(rapl_pmus);
>  }
>  
> @@ -699,7 +699,7 @@ static int __init init_rapl_pmus(void)
>         if (rapl_pmu_is_pkg_scope())
>                 nr_rapl_pmu = topology_max_packages();
>  
> -       rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus,
> nr_rapl_pmu), GFP_KERNEL);
> +       rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu,
> nr_rapl_pmu), GFP_KERNEL);
>         if (!rapl_pmus)
>                 return -ENOMEM;
>  

2024-06-11 08:30:57

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH 6/6] perf/x86/rapl: Add per-core energy counter support for AMD CPUs

> @@ -345,9 +353,14 @@ static int rapl_pmu_event_init(struct perf_event
> *event)
>         u64 cfg = event->attr.config & RAPL_EVENT_MASK;
>         int bit, ret = 0;
>         struct rapl_pmu *rapl_pmu;
> +       struct rapl_pmus *curr_rapl_pmus;
>  
>         /* only look at RAPL events */
> -       if (event->attr.type != rapl_pmus->pmu.type)
> +       if (event->attr.type == rapl_pmus->pmu.type)
> +               curr_rapl_pmus = rapl_pmus;
> +       else if (rapl_pmus_per_core && event->attr.type ==
> rapl_pmus_per_core->pmu.type)
> +               curr_rapl_pmus = rapl_pmus_per_core;
> +       else
>                 return -ENOENT;

can we use container_of(event->pmu, struct rapl_pmus, pmu)?

>  
>         /* check only supported bits are set */
> @@ -374,9 +387,14 @@ static int rapl_pmu_event_init(struct perf_event
> *event)
>                 return -EINVAL;
>  
>         /* must be done before validate_group */
> -       rapl_pmu = cpu_to_rapl_pmu(event->cpu);
> +       if (curr_rapl_pmus == rapl_pmus_per_core)
> +               rapl_pmu = curr_rapl_pmus-
> >rapl_pmu[topology_core_id(event->cpu)];
> +       else
> +               rapl_pmu = curr_rapl_pmus-
> >rapl_pmu[get_rapl_pmu_idx(event->cpu)];
> +
>         if (!rapl_pmu)
>                 return -EINVAL;

Current code has PERF_EV_CAP_READ_ACTIVE_PKG flag set.
Can you help me understand why it does not affect the new per-core pmu?

> +
>         event->cpu = rapl_pmu->cpu;
>         event->pmu_private = rapl_pmu;
>         event->hw.event_base = rapl_msrs[bit].msr;
> @@ -408,17 +426,38 @@ static struct attribute_group
> rapl_pmu_attr_group = {
>         .attrs = rapl_pmu_attrs,
>  };
>  
> +static ssize_t rapl_get_attr_per_core_cpumask(struct device *dev,
> +                                            struct device_attribute
> *attr, char *buf)
> +{
> +       return cpumap_print_to_pagebuf(true, buf,
> &rapl_pmus_per_core->cpumask);
> +}
> +
> +static struct device_attribute dev_attr_per_core_cpumask =
> __ATTR(cpumask, 0444,
> +                                                               
> rapl_get_attr_per_core_cpumask,
> +                                                               
> NULL);

DEVICE_ATTR

> +
> +static struct attribute *rapl_pmu_per_core_attrs[] = {
> +       &dev_attr_per_core_cpumask.attr,
> +       NULL,
> +};
> +
> +static struct attribute_group rapl_pmu_per_core_attr_group = {
> +       .attrs = rapl_pmu_per_core_attrs,
> +};
> +
>  RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01");
>  RAPL_EVENT_ATTR_STR(energy-pkg  ,   rapl_pkg, "event=0x02");
>  RAPL_EVENT_ATTR_STR(energy-ram  ,   rapl_ram, "event=0x03");
>  RAPL_EVENT_ATTR_STR(energy-gpu  ,   rapl_gpu, "event=0x04");
>  RAPL_EVENT_ATTR_STR(energy-psys,   rapl_psys, "event=0x05");
> +RAPL_EVENT_ATTR_STR(energy-per-core,   rapl_per_core, "event=0x06");

energy-per-core is for a separate pmu, so the event id does not need to
be 6. The same applies to PERF_RAPL_PERCORE.

>  
>  static struct rapl_model model_amd_hygon = {
> -       .events         = BIT(PERF_RAPL_PKG),
> +       .events         = BIT(PERF_RAPL_PKG) |
> +                         BIT(PERF_RAPL_PERCORE),
>         .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
>         .rapl_msrs      = amd_rapl_msrs,
> +       .per_core = true,
>  };

can we use bit PERF_RAPL_PERCORE to check per_core pmu support?

Just FYI, arch/x86/events/intel/cstate.c handles package/module/core
scope cstate pmus. It uses a different approach in the probing part,
which IMO is clearer.

thanks,
rui

2024-06-11 09:06:45

by Dhananjay Ugwekar

[permalink] [raw]
Subject: Re: [PATCH 2/6] perf/x86/rapl: Rename rapl_pmu variables

Hello Rui,

On 6/11/2024 11:13 AM, Zhang, Rui wrote:
> On Mon, 2024-06-10 at 10:07 +0000, Dhananjay Ugwekar wrote:
>> Rename struct rapl_pmu variables from "pmu" to "rapl_pmu", to
>> avoid any confusion between the variables of two different
>> structs pmu and rapl_pmu.
>> As rapl_pmu also contains a pointer to
>> struct pmu, which leads to situations in code like pmu->pmu,
>> which is needlessly confusing. Above scenario is replaced with
>> much more readable rapl_pmu->pmu with this change.
>>
>> Also rename "pmus" member in rapl_pmus struct, for same reason.
>>
>
> As you are adding a new per_core pmu, can we just rename the current
> rapl_pmu to something like rapl_pkg_pmus (as I mentioned in the
> previous email, we can consider the current RAPL MSRs as package scope
> on Intel platforms as well), and name the new one as rapl_core_pmus?

Sure, this makes sense; I will modify the original "rapl_pmus" variable
name. But please note that the renaming I refer to in this patch is limited
to the local "struct rapl_pmu" variables used inside functions, not the
global variable name you mention.
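
(At the global level that would mean something roughly like the following --
illustrative only, final names to be decided:

	static struct rapl_pmus *rapl_pkg_pmus;		/* existing pkg/die-scope PMU */
	static struct rapl_pmus *rapl_core_pmus;	/* new per-core-scope PMU */

while this patch only touches the function-local "struct rapl_pmu *"
variables.)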

Regards,
Dhananjay

>
> IMO, "rapl_pmus" + "rapl_pmus_per_core" is still confusing.
>
> thanks,
> rui
>

2024-06-11 14:17:42

by Dhananjay Ugwekar

[permalink] [raw]
Subject: Re: [PATCH 1/6] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs

Hello Rui,

On 6/11/2024 11:05 AM, Zhang, Rui wrote:
> On Mon, 2024-06-10 at 10:07 +0000, Dhananjay Ugwekar wrote:
>> After commit ("x86/cpu/topology: Add support for the AMD 0x80000026
>> leaf"),
>> on AMD processors that support extended CPUID leaf 0x80000026, the
>> topology_die_cpumask() and topology_logical_die_id() macros, no
>> longer
>> return the package cpumask and package id, instead they return the
>> CCD
>> (Core Complex Die) mask and id respectively. This leads to the
>> energy-pkg
>> event scope to be modified to CCD instead of package.
>>
>> Replacing these macros with their package counterparts fixes the
>> energy-pkg event for AMD CPUs.
>>
>> However due to the difference between the scope of energy-pkg event
>> for
>> Intel and AMD CPUs,
>
> Actually, on Intel platforms, the energy-pkg event is also package
> scope except one platform, and that one is the only multi-die system
> with RAPL MSRs available.
> So that is why the die/pkg logic was introduced in the first place.
>
> I like the macro/helpers in this patch. The logic inside them may need
> to be optimized for Intel platforms, but that is a separate task.

Yes, that is correct. However, the die macros return the package values on
a single-die system anyway, so I preferred to use them for all Intel
systems (single- or multi-die).

We can surely update the macro name or logic in the future to make it more
semantically correct.

Thanks for the review!

Regards,
Dhananjay

>
>> we have to replace these macros conditionally only for
>> AMD CPUs.
>>
>> On a 12 CCD 1 Package AMD Zen4 Genoa machine:
>>
>> Before:
>> $ cat /sys/devices/power/cpumask
>> 0,8,16,24,32,40,48,56,64,72,80,88.
>>
>> The expected cpumask here is supposed to be just "0", as it is a
>> package
>> scope event, only one CPU will be collecting the event for all the
>> CPUs in
>> the package.
>>
>> After:
>> $ cat /sys/devices/power/cpumask
>> 0
>>
>> Signed-off-by: Dhananjay Ugwekar <[email protected]>
>> Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD
>> 0x80000026 leaf")
>
> Reviewed-by: Zhang Rui <[email protected]>
>
> thanks,
> rui

2024-06-13 06:40:19

by Dhananjay Ugwekar

[permalink] [raw]
Subject: Re: [PATCH 6/6] perf/x86/rapl: Add per-core energy counter support for AMD CPUs

Hi Rui,

On 6/11/2024 2:00 PM, Zhang, Rui wrote:
>> @@ -345,9 +353,14 @@ static int rapl_pmu_event_init(struct perf_event
>> *event)
>>         u64 cfg = event->attr.config & RAPL_EVENT_MASK;
>>         int bit, ret = 0;
>>         struct rapl_pmu *rapl_pmu;
>> +       struct rapl_pmus *curr_rapl_pmus;
>>  
>>         /* only look at RAPL events */
>> -       if (event->attr.type != rapl_pmus->pmu.type)
>> +       if (event->attr.type == rapl_pmus->pmu.type)
>> +               curr_rapl_pmus = rapl_pmus;
>> +       else if (rapl_pmus_per_core && event->attr.type ==
>> rapl_pmus_per_core->pmu.type)
>> +               curr_rapl_pmus = rapl_pmus_per_core;
>> +       else
>>                 return -ENOENT;
>
> can we use container_of(event->pmu, struct rapl_pmus, pmu)?

Yes! That would be cleaner; I will add it in the next version.
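
For illustration, a rough sketch of how that could look in
rapl_pmu_event_init() (names follow this patchset; exact form to be decided
in the next version):

	struct rapl_pmus *curr_rapl_pmus;

	/* only look at RAPL events */
	if (event->attr.type != event->pmu->type)
		return -ENOENT;

	/*
	 * event->pmu points at the struct pmu embedded in struct rapl_pmus,
	 * so the wrapper can be recovered directly, without consulting the
	 * rapl_pmus / rapl_pmus_per_core globals.
	 */
	curr_rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);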

>
>>  
>>         /* check only supported bits are set */
>> @@ -374,9 +387,14 @@ static int rapl_pmu_event_init(struct perf_event
>> *event)
>>                 return -EINVAL;
>>  
>>         /* must be done before validate_group */
>> -       rapl_pmu = cpu_to_rapl_pmu(event->cpu);
>> +       if (curr_rapl_pmus == rapl_pmus_per_core)
>> +               rapl_pmu = curr_rapl_pmus-
>>> rapl_pmu[topology_core_id(event->cpu)];
>> +       else
>> +               rapl_pmu = curr_rapl_pmus-
>>> rapl_pmu[get_rapl_pmu_idx(event->cpu)];
>> +
>>         if (!rapl_pmu)
>>                 return -EINVAL;
>
> Current code has PERF_EV_CAP_READ_ACTIVE_PKG flag set.
> Can you help me understand why it does not affect the new per-core pmu?

Good question. I went back and looked through the code; it turns out that
we never go through the code path that checks this flag and decides whether
to run on the local CPU (the CPU perf is running on) or on event->cpu.

So, having or not having this flag makes no difference here. I ran a small
experiment to confirm this.

On a single-package system, any core should be able to read the energy-pkg
RAPL MSR and return the value, so there would be no need for an SMP call to
event->cpu. Yet, looking at the ftrace output below, we can see that only
core 0 executes the PMU event even though perf stat was launched for
core 1.

--------------------------------------------------------------------------

root@shatadru:/sys/kernel/tracing# perf stat -C 1 -e power/energy-pkg/ -- dd if=/dev/zero of=/dev/null bs=1M count=100000
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB, 98 GiB) copied, 2.03295 s, 51.6 GB/s

Performance counter stats for 'CPU(s) 1':

231.59 Joules power/energy-pkg/

2.033916467 seconds time elapsed

root@shatadru:/sys/kernel/tracing# echo 0 > tracing_on
root@shatadru:/sys/kernel/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 12/12 #P:192
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
perf-3309 [096] ...1. 3422.558183: rapl_get_attr_cpumask <-dev_attr_show
perf-3309 [001] ...1. 3422.559436: rapl_pmu_event_init <-perf_try_init_event
perf-3309 [001] ...1. 3422.559441: rapl_pmu_event_init <-perf_try_init_event
perf-3309 [001] ...1. 3422.559449: rapl_pmu_event_init <-perf_try_init_event
perf-3309 [001] ...1. 3422.559537: smp_call_function_single <-event_function_call <-- smp call to the event owner cpu(i.e. CPU0)
<idle>-0 [000] d.h3. 3422.559544: rapl_pmu_event_add <-event_sched_in <-- CPU# column changed to 0
<idle>-0 [000] d.h4. 3422.559545: __rapl_pmu_event_start <-rapl_pmu_event_add
perf-3309 [001] ...1. 3424.593398: smp_call_function_single <-event_function_call <-- smp call to the event owner cpu(i.e. CPU0)
<idle>-0 [000] d.h3. 3424.593403: rapl_pmu_event_del <-event_sched_out <-- CPU# column changed to 0
<idle>-0 [000] d.h3. 3424.593403: rapl_pmu_event_stop <-rapl_pmu_event_del
<idle>-0 [000] d.h4. 3424.593404: rapl_event_update.isra.0 <-rapl_pmu_event_stop
perf-3309 [001] ...1. 3424.593514: smp_call_function_single <-event_function_call

--------------------------------------------------------------------------

So, as we always use event->cpu to run the event, the per-core PMU is not
affected by this flag.

Anyway, in the next version I will enable this flag only for package-scope
events. But we will need to look into fixing this ineffective flag.
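
Roughly, the selective enabling in rapl_pmu_event_init() could look like
this (sketch only):

	/*
	 * Keep the "read on any CPU in the package" optimization for
	 * package-scope events only; per-core events must always be read
	 * on event->cpu.
	 */
	if (curr_rapl_pmus != rapl_pmus_per_core)
		event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;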

>
>> +
>>         event->cpu = rapl_pmu->cpu;
>>         event->pmu_private = rapl_pmu;
>>         event->hw.event_base = rapl_msrs[bit].msr;
>> @@ -408,17 +426,38 @@ static struct attribute_group
>> rapl_pmu_attr_group = {
>>         .attrs = rapl_pmu_attrs,
>>  };
>>  
>> +static ssize_t rapl_get_attr_per_core_cpumask(struct device *dev,
>> +                                            struct device_attribute
>> *attr, char *buf)
>> +{
>> +       return cpumap_print_to_pagebuf(true, buf,
>> &rapl_pmus_per_core->cpumask);
>> +}
>> +
>> +static struct device_attribute dev_attr_per_core_cpumask =
>> __ATTR(cpumask, 0444,
>> +                                                               
>> rapl_get_attr_per_core_cpumask,
>> +                                                               
>> NULL);
>
> DEVICE_ATTR

I was not able to use DEVICE_ATTR(), because a "struct device_attribute
dev_attr_cpumask" is already created for the package PMU cpumask using
DEVICE_ATTR(). So I had to declare the "struct device_attribute
dev_attr_per_core_cpumask" manually to avoid a variable name clash.
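
In other words, the clash is roughly this:

	/* existing package-PMU attribute is generated roughly like: */
	static DEVICE_ATTR(cpumask, 0444, rapl_get_attr_cpumask, NULL);
	/* which expands to a variable named "dev_attr_cpumask" */

A second DEVICE_ATTR(cpumask, ...) for the per-core PMU would expand to the
same "dev_attr_cpumask" symbol, hence the open-coded __ATTR() with a
distinct variable name in this patch.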

>
>> +
>> +static struct attribute *rapl_pmu_per_core_attrs[] = {
>> +       &dev_attr_per_core_cpumask.attr,
>> +       NULL,
>> +};
>> +
>> +static struct attribute_group rapl_pmu_per_core_attr_group = {
>> +       .attrs = rapl_pmu_per_core_attrs,
>> +};
>> +
>>  RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01");
>>  RAPL_EVENT_ATTR_STR(energy-pkg  ,   rapl_pkg, "event=0x02");
>>  RAPL_EVENT_ATTR_STR(energy-ram  ,   rapl_ram, "event=0x03");
>>  RAPL_EVENT_ATTR_STR(energy-gpu  ,   rapl_gpu, "event=0x04");
>>  RAPL_EVENT_ATTR_STR(energy-psys,   rapl_psys, "event=0x05");
>> +RAPL_EVENT_ATTR_STR(energy-per-core,   rapl_per_core, "event=0x06");
>
> energy-per-core is for a separate pmu, so the event id does not need to
> be 6. The same applies to PERF_RAPL_PERCORE.

Correct, will fix in next version.
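
Since power_per_core is its own PMU with its own event namespace, the
attribute can, for example, simply start over at 0x1 (illustrative only):

	RAPL_EVENT_ATTR_STR(energy-per-core, rapl_per_core, "event=0x01");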

>
>>  
>>  static struct rapl_model model_amd_hygon = {
>> -       .events         = BIT(PERF_RAPL_PKG),
>> +       .events         = BIT(PERF_RAPL_PKG) |
>> +                         BIT(PERF_RAPL_PERCORE),
>>         .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
>>         .rapl_msrs      = amd_rapl_msrs,
>> +       .per_core = true,
>>  };
>
> can we use bit PERF_RAPL_PERCORE to check per_core pmu support?

Makes sense, will modify.
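
Something along these lines (sketch only, assuming the rapl_model made
global in patch 3 is reachable here as a pointer):

	/* per-core PMU support is implied by the event bitmap */
	if (rapl_model->events & BIT(PERF_RAPL_PERCORE)) {
		/* ... set up and register the power_per_core PMU ... */
	}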

>
> Just FYI, arch/x86/events/intel/cstate.c handles package/module/core
> scope cstate pmus. It uses a different approach in the probing part,
> which IMO is clearer.

Yes, I went through it. I see that separate variables are used to mark the
valid events for the package and core scopes, and a wrapper function around
perf_msr_probe is created; I will see whether that makes sense here as well.

Thanks for the review,
Dhananjay

>
> thanks,
> rui
>