2020-10-12 13:21:41

by zhuguangqing83

[permalink] [raw]
Subject: [PATCH] PM / EM: consult something about cpumask in em_dev_register_perf_domain

From: zhuguangqing <[email protected]>

Hi, Lukasz, Quentin
I have three questions to consult about cpumask in energy_model.c.

1, The first one is about the meanings of the following two parameters,
[1] and [2].
[1]: "cpumask_t *cpus" in function em_dev_register_perf_domain(): Pointer
to cpumask_t, which in case of a CPU device is obligatory. It can be taken
from i.e. 'policy->cpus'.
[2]: "unsigned long cpus[]" in struct em_perf_domain: Cpumask covering the
CPUs of the domain. It's here for performance reasons to avoid potential
cache misses during energy calculations in the scheduler and simplifies
allocating/freeing that memory region.

From the current code, we see [2] is copied from [1]. But from comments,
accorinding to my understanding, [2] and [1] have different meanings.
[1] can be taken from i.e. 'policy->cpus', according to the comment in the
defination of struct cpufreq_policy, it means Online CPUs only. Actually,
'policy->cpus' is not always Online CPUs.
[2] means each_possible_cpus in the same domain, including phycical
hotplug cpus(just possible), logically hotplug cpus(just present) and
online cpus.

So, the first question is, what are the meanings of [1] and [2]?
I guess maybe there are the following 4 possible choices.
A), for_each_possible_cpu in the same domain, maybe phycical hotplug
B), for_each_present_cpu in the same domain, maybe logically hotplug
C), for_each_online_cpu in the same domain, online
D), others

2, The second question is about the function em_dev_register_perf_domain().
If multiple clients register the same performance domain with different
*dev or *cpus, how should we handle?

int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
struct em_data_callback *cb, cpumask_t *cpus)

For example, say cpu0 and cpu1 are in the same performance domain,
cpu0 is registered first. As part of the init process,
em_dev_register_perf_domain() is called, then *dev = cpu0_dev,
*cpus = 01b(cpu1 is initially offline). It creates a em_pd for cpu0_dev.
After a while, cpu1 is online, em_dev_register_perf_domain() is called
again as part of init process for cpu1, then *dev =cpu1_dev,
*cpus = 11b(cpu1 is online). In this case, for the current code,
cpu1_dev can not get its em_pd.

3, The third question is, how can we ensure cpu_dev as follows is not
NULL? If we can't ensure that, maybe we should add a check before using
it.
/kernel/power/energy_model.c
174) static int em_create_pd(struct device *dev, int nr_states,
175) struct em_data_callback *cb, cpumask_t *cpus)
176) {
199) if (_is_cpu_device(dev))
200) for_each_cpu(cpu, cpus) {
201) cpu_dev = get_cpu_device(cpu);
202) cpu_dev->em_pd = pd;
203) }

Signed-off-by: zhuguangqing <[email protected]>
---
kernel/power/energy_model.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index c1ff7fa030ab..addf2f400184 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -199,7 +199,13 @@ static int em_create_pd(struct device *dev, int nr_states,
if (_is_cpu_device(dev))
for_each_cpu(cpu, cpus) {
cpu_dev = get_cpu_device(cpu);
- cpu_dev->em_pd = pd;
+ if (cpu_dev)
+ cpu_dev->em_pd = pd;
+ else {
+ cpumask_clear_cpu(cpu, em_span_cpus(pd));
+ dev_dbg(dev, "EM: failed to get cpu%d device\n",
+ cpu);
+ }
}

dev->em_pd = pd;
--
2.17.1


2020-10-12 13:36:24

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH] PM / EM: consult something about cpumask in em_dev_register_perf_domain

On Monday 12 Oct 2020 at 14:05:01 (+0100), Quentin Perret wrote:
> > 3, The third question is, how can we ensure cpu_dev as follows is not
> > NULL? If we can't ensure that, maybe we should add a check before using
> > it.
> > /kernel/power/energy_model.c
> > 174) static int em_create_pd(struct device *dev, int nr_states,
> > 175) struct em_data_callback *cb, cpumask_t *cpus)
> > 176) {
> > 199) if (_is_cpu_device(dev))
> > 200) for_each_cpu(cpu, cpus) {
> > 201) cpu_dev = get_cpu_device(cpu);
> > 202) cpu_dev->em_pd = pd;
> > 203) }
>
> And that should not be necessary as we check for the !dev case at the
> top of em_dev_register_perf_domain(). Or were you thinking about
> something else?

Oh I think I read that one wrong, but the conclusion should be the same,
at least on Arm64 -- all _possible_ CPUs should be registered early
enough for that not to be an issue.

Did you observe anything wrong there for your use-case?

Thanks,
Quentin

2020-10-12 13:36:40

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH] PM / EM: consult something about cpumask in em_dev_register_perf_domain

Hi,

On Monday 12 Oct 2020 at 20:41:36 (+0800), [email protected] wrote:
> From: zhuguangqing <[email protected]>
>
> Hi, Lukasz, Quentin
> I have three questions to consult about cpumask in energy_model.c.

OK, let's see if we can help :)

> 1, The first one is about the meanings of the following two parameters,
> [1] and [2].
> [1]: "cpumask_t *cpus" in function em_dev_register_perf_domain(): Pointer
> to cpumask_t, which in case of a CPU device is obligatory. It can be taken
> from i.e. 'policy->cpus'.
> [2]: "unsigned long cpus[]" in struct em_perf_domain: Cpumask covering the
> CPUs of the domain. It's here for performance reasons to avoid potential
> cache misses during energy calculations in the scheduler and simplifies
> allocating/freeing that memory region.
>
> From the current code, we see [2] is copied from [1]. But from comments,
> accorinding to my understanding, [2] and [1] have different meanings.
> [1] can be taken from i.e. 'policy->cpus', according to the comment in the
> defination of struct cpufreq_policy, it means Online CPUs only. Actually,
> 'policy->cpus' is not always Online CPUs.
> [2] means each_possible_cpus in the same domain, including phycical
> hotplug cpus(just possible), logically hotplug cpus(just present) and
> online cpus.
>
>
> So, the first question is, what are the meanings of [1] and [2]?
> I guess maybe there are the following 4 possible choices.
> A), for_each_possible_cpu in the same domain, maybe phycical hotplug
> B), for_each_present_cpu in the same domain, maybe logically hotplug
> C), for_each_online_cpu in the same domain, online
> D), others

So, if the comments are confusing we should update them, but from the EM
framework perspective, all cpumasks must be the _possible_ CPUs in the
domain. It's up to the clients (e.g. the scheduler) to deal with hotplug
and so on, but the EM framework should cover non-online CPUs too.

> 2, The second question is about the function em_dev_register_perf_domain().
> If multiple clients register the same performance domain with different
> *dev or *cpus, how should we handle?
>
> int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
> struct em_data_callback *cb, cpumask_t *cpus)
>
> For example, say cpu0 and cpu1 are in the same performance domain,
> cpu0 is registered first. As part of the init process,
> em_dev_register_perf_domain() is called, then *dev = cpu0_dev,
> *cpus = 01b(cpu1 is initially offline). It creates a em_pd for cpu0_dev.
> After a while, cpu1 is online, em_dev_register_perf_domain() is called
> again as part of init process for cpu1, then *dev =cpu1_dev,
> *cpus = 11b(cpu1 is online). In this case, for the current code,
> cpu1_dev can not get its em_pd.

As per the above, the registration should be done once, with the mask of
all possible CPUs in the domain. If CPUs 0 and 1 share the same domain, a
single call to em_dev_register_perf_domain() should be sufficient to
register both of them at once.

> 3, The third question is, how can we ensure cpu_dev as follows is not
> NULL? If we can't ensure that, maybe we should add a check before using
> it.
> /kernel/power/energy_model.c
> 174) static int em_create_pd(struct device *dev, int nr_states,
> 175) struct em_data_callback *cb, cpumask_t *cpus)
> 176) {
> 199) if (_is_cpu_device(dev))
> 200) for_each_cpu(cpu, cpus) {
> 201) cpu_dev = get_cpu_device(cpu);
> 202) cpu_dev->em_pd = pd;
> 203) }

And that should not be necessary as we check for the !dev case at the
top of em_dev_register_perf_domain(). Or were you thinking about
something else?

Thanks,
Quentin

2020-10-13 11:27:05

by zhuguangqing83

[permalink] [raw]
Subject: Re: [PATCH] PM / EM: consult something about cpumask in em_dev_register_perf_domain


>
> Hi,
>
> On Monday 12 Oct 2020 at 20:41:36 (+0800), [email protected] wrote:
> > From: zhuguangqing <[email protected]>
> >
> > Hi, Lukasz, Quentin
> > I have three questions to consult about cpumask in energy_model.c.
>
> OK, let's see if we can help :)
>

Thanks for your help and review.

> > 1, The first one is about the meanings of the following two parameters,
> > [1] and [2].
> > [1]: "cpumask_t *cpus" in function em_dev_register_perf_domain():
Pointer
> > to cpumask_t, which in case of a CPU device is obligatory. It can be
taken
> > from i.e. 'policy->cpus'.
> > [2]: "unsigned long cpus[]" in struct em_perf_domain: Cpumask covering
the
> > CPUs of the domain. It's here for performance reasons to avoid potential
> > cache misses during energy calculations in the scheduler and simplifies
> > allocating/freeing that memory region.
> >
> > From the current code, we see [2] is copied from [1]. But from comments,
> > accorinding to my understanding, [2] and [1] have different meanings.
> > [1] can be taken from i.e. 'policy->cpus', according to the comment in
the
> > defination of struct cpufreq_policy, it means Online CPUs only.
Actually,
> > 'policy->cpus' is not always Online CPUs.
> > [2] means each_possible_cpus in the same domain, including phycical
> > hotplug cpus(just possible), logically hotplug cpus(just present) and
> > online cpus.
> >
> >
> > So, the first question is, what are the meanings of [1] and [2]?
> > I guess maybe there are the following 4 possible choices.
> > A), for_each_possible_cpu in the same domain, maybe phycical hotplug
> > B), for_each_present_cpu in the same domain, maybe logically hotplug
> > C), for_each_online_cpu in the same domain, online
> > D), others
>
> So, if the comments are confusing we should update them, but from the EM
> framework perspective, all cpumasks must be the _possible_ CPUs in the
> domain. It's up to the clients (e.g. the scheduler) to deal with hotplug
> and so on, but the EM framework should cover non-online CPUs too.
>

I see, from the EM framework perspective, all cpumasks must be the
_possible_ CPUs in the domain. But up to the clients (e.g. the scheduler),
'policy->cpus' maybe not the _possible_ CPUs. For example, in the function
scmi_cpufreq_init() which calls em_dev_register_perf_domain(),
'policy->cpus'
is got from scmi_get_sharing_cpus() and cpufreq_online(), including CPUs
satisfied the following three conditions at the same time which means
_present_ CPUs according to my understanding.
1) _possible_
2) if (get_cpu_device(cpu))
3) in the same domain

> > 2, The second question is about the function
em_dev_register_perf_domain().
> > If multiple clients register the same performance domain with different
> > *dev or *cpus, how should we handle?
> >
> > int em_dev_register_perf_domain(struct device *dev, unsigned int
nr_states,
> > struct em_data_callback *cb, cpumask_t *cpus)
> >
> > For example, say cpu0 and cpu1 are in the same performance domain,
> > cpu0 is registered first. As part of the init process,
> > em_dev_register_perf_domain() is called, then *dev = cpu0_dev,
> > *cpus = 01b(cpu1 is initially offline). It creates a em_pd for cpu0_dev.
> > After a while, cpu1 is online, em_dev_register_perf_domain() is called
> > again as part of init process for cpu1, then *dev =cpu1_dev,
> > *cpus = 11b(cpu1 is online). In this case, for the current code,
> > cpu1_dev can not get its em_pd.
>
> As per the above, the registration should be done once, with the mask of
> all possible CPUs in the domain. If CPUs 0 and 1 share the same domain, a
> single call to em_dev_register_perf_domain() should be sufficient to
> register both of them at once.
>

I just saw your discussion in https://lkml.org/lkml/2020/2/7/479, it
mentioned
"PM_EM ignores hotplug ATM. Perhaps we should document that somewhere ..."

So, if PM_EM still ignores hotplug now, then the registration should be done
once,
with the mask of all possible CPUs in the domain. If PM_EM consider hotplug
in
the future, then we should consider the case that
em_dev_register_perf_domain()
will be called more than once with different input parameters *dev or/and
*cpus.
And the CPU mask might not be all possible CPUs in the domain.

> > 3, The third question is, how can we ensure cpu_dev as follows is not
> > NULL? If we can't ensure that, maybe we should add a check before using
> > it.
> > /kernel/power/energy_model.c
> > 174) static int em_create_pd(struct device *dev, int nr_states,
> > 175) struct em_data_callback *cb, cpumask_t *cpus)
> > 176) {
> > 199) if (_is_cpu_device(dev))
> > 200) for_each_cpu(cpu, cpus) {
> > 201) cpu_dev = get_cpu_device(cpu);
> > 202) cpu_dev->em_pd = pd;
> > 203) }
>
> And that should not be necessary as we check for the !dev case at the
> top of em_dev_register_perf_domain(). Or were you thinking about
> something else?
>
> Oh I think I read that one wrong, but the conclusion should be the same,
> at least on Arm64 -- all _possible_ CPUs should be registered early
> enough for that not to be an issue.
>
>Did you observe anything wrong there for your use-case?
>

I did not observe anything wrong for my use-case. But I think it's possible
in
theory that cpu_dev maybe NULL. I observe that in the function
scmi_cpufreq_init(), before calling em_dev_register_perf_domain(),
'policy->cpus' can be ensure that all the cpu_dev in CPU mask are not NULL.
But maybe we can not ensure all the clients do the check. This could happen
if the arch did not set up cpu_dev since this CPU is not in cpu_present mask
and the driver did not send a correct CPU mask during registration.

> Thanks,
> Quentin

2020-10-13 23:58:28

by Quentin Perret

[permalink] [raw]
Subject: Re: [PATCH] PM / EM: consult something about cpumask in em_dev_register_perf_domain

On Tuesday 13 Oct 2020 at 11:40:31 (+0800), zhuguangqing83 wrote:
> I did not observe anything wrong for my use-case. But I think it's possible
> in
> theory that cpu_dev maybe NULL. I observe that in the function
> scmi_cpufreq_init(), before calling em_dev_register_perf_domain(),
> 'policy->cpus' can be ensure that all the cpu_dev in CPU mask are not NULL.
> But maybe we can not ensure all the clients do the check. This could happen
> if the arch did not set up cpu_dev since this CPU is not in cpu_present mask
> and the driver did not send a correct CPU mask during registration.

Admittedly this has only been tested on Arm64 where I think we can
safely assume that all possible CPUs have been registered at once -- see
topology_init().

And for allowing to register CPUs late in a perf domain, I'm not opposed
to it in principle but that has deep implications as the existing EM
users (e.g. EAS) currently hard rely on it being static after
registration. If you have a real need for it and a patch that adds the
feature and fixes all the users I'll be happy to look at it :)

Thanks,
Quentin