2019-09-23 17:17:09

by Yunfeng Ye

[permalink] [raw]
Subject: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()

If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
take so long in the right case. Use usleep_range() instead of msleep(),
reduce the waiting time, and give a chance to busy wait before sleep.

Signed-off-by: Yunfeng Ye <[email protected]>
---
V1->V2:
- use usleep_range() instead of udelay() after waiting for a while

arch/arm64/kernel/psci.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
index c9f72b2..99b3122 100644
--- a/arch/arm64/kernel/psci.c
+++ b/arch/arm64/kernel/psci.c
@@ -82,6 +82,7 @@ static void cpu_psci_cpu_die(unsigned int cpu)
static int cpu_psci_cpu_kill(unsigned int cpu)
{
int err, i;
+ unsigned long timeout;

if (!psci_ops.affinity_info)
return 0;
@@ -91,16 +92,24 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
* while it is dying. So, try again a few times.
*/

- for (i = 0; i < 10; i++) {
+ i = 0;
+ timeout = jiffies + msecs_to_jiffies(100);
+ do {
err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
pr_info("CPU%d killed.\n", cpu);
return 0;
}

- msleep(10);
- pr_info("Retrying again to check for CPU kill\n");
- }
+ /* busy-wait max 1ms */
+ if (i++ < 100) {
+ cond_resched();
+ udelay(10);
+ continue;
+ }
+
+ usleep_range(100, 1000);
+ } while (time_before(jiffies, timeout));

pr_warn("CPU%d may not have shut down cleanly (AFFINITY_INFO reports %d)\n",
cpu, err);
--
2.7.4.huawei.3


2019-10-09 04:46:33

by Yunfeng Ye

[permalink] [raw]
Subject: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()

If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
take so long in the right case. Use usleep_range() instead of msleep(),
reduce the waiting time, and give a chance to busy wait before sleep.

Signed-off-by: Yunfeng Ye <[email protected]>
---
V1->V2:
- use usleep_range() instead of udelay() after waiting for a while

arch/arm64/kernel/psci.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
index c9f72b2..99b3122 100644
--- a/arch/arm64/kernel/psci.c
+++ b/arch/arm64/kernel/psci.c
@@ -82,6 +82,7 @@ static void cpu_psci_cpu_die(unsigned int cpu)
static int cpu_psci_cpu_kill(unsigned int cpu)
{
int err, i;
+ unsigned long timeout;

if (!psci_ops.affinity_info)
return 0;
@@ -91,16 +92,24 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
* while it is dying. So, try again a few times.
*/

- for (i = 0; i < 10; i++) {
+ i = 0;
+ timeout = jiffies + msecs_to_jiffies(100);
+ do {
err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
pr_info("CPU%d killed.\n", cpu);
return 0;
}

- msleep(10);
- pr_info("Retrying again to check for CPU kill\n");
- }
+ /* busy-wait max 1ms */
+ if (i++ < 100) {
+ cond_resched();
+ udelay(10);
+ continue;
+ }
+
+ usleep_range(100, 1000);
+ } while (time_before(jiffies, timeout));

pr_warn("CPU%d may not have shut down cleanly (AFFINITY_INFO reports %d)\n",
cpu, err);
--
2.7.4.3

2019-10-15 20:16:23

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()

Hi,

On Sat, Sep 21, 2019 at 07:21:17PM +0800, Yunfeng Ye wrote:
> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
> take so long in the right case. Use usleep_range() instead of msleep(),
> reduce the waiting time, and give a chance to busy wait before sleep.

Can you elaborate on "the right case" please? It's not clear to me
exactly what problem you're solving here.

I've also added Sudeep to the thread, since I'd like his ack on the change.

Will

> arch/arm64/kernel/psci.c | 17 +++++++++++++----
> 1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
> index c9f72b2..99b3122 100644
> --- a/arch/arm64/kernel/psci.c
> +++ b/arch/arm64/kernel/psci.c
> @@ -82,6 +82,7 @@ static void cpu_psci_cpu_die(unsigned int cpu)
> static int cpu_psci_cpu_kill(unsigned int cpu)
> {
> int err, i;
> + unsigned long timeout;
>
> if (!psci_ops.affinity_info)
> return 0;
> @@ -91,16 +92,24 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
> * while it is dying. So, try again a few times.
> */
>
> - for (i = 0; i < 10; i++) {
> + i = 0;
> + timeout = jiffies + msecs_to_jiffies(100);
> + do {
> err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
> if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
> pr_info("CPU%d killed.\n", cpu);
> return 0;
> }
>
> - msleep(10);
> - pr_info("Retrying again to check for CPU kill\n");
> - }
> + /* busy-wait max 1ms */
> + if (i++ < 100) {
> + cond_resched();
> + udelay(10);
> + continue;
> + }
> +
> + usleep_range(100, 1000);
> + } while (time_before(jiffies, timeout));
>
> pr_warn("CPU%d may not have shut down cleanly (AFFINITY_INFO reports %d)\n",
> cpu, err);
> --
> 2.7.4.huawei.3
>

2019-10-16 11:11:13

by Yunfeng Ye

[permalink] [raw]
Subject: Re: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()



On 2019/10/16 0:23, Will Deacon wrote:
> Hi,
>
> On Sat, Sep 21, 2019 at 07:21:17PM +0800, Yunfeng Ye wrote:
>> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
>> take so long in the right case. Use usleep_range() instead of msleep(),
>> reduce the waiting time, and give a chance to busy wait before sleep.
>
> Can you elaborate on "the right case" please? It's not clear to me
> exactly what problem you're solving here.
>
The situation is that when the power is off, we have a battery to save some
information, but the battery power is limited, so we reduce the power consumption
by turning off the cores, and need fastly to complete the core shutdown. However, the
time of cpu_psci_cpu_kill() will take 10ms. We have tested the time that it does not
need 10ms, and most case is about 50us-500us. if we reduce the time of cpu_psci_cpu_kill(),
we can reduce 10% - 30% of the total time.

So change msleep (10) to usleep_range() to reduce the waiting time. In addition,
we don't want to be scheduled during the sleeping time, some threads may take a
long time and don't give up the CPU, which affects the time of core shutdown,
Therefore, we add a chance to busy-wait max 1ms.

thanks.

> I've also added Sudeep to the thread, since I'd like his ack on the change.
>
> Will
>
>> arch/arm64/kernel/psci.c | 17 +++++++++++++----
>> 1 file changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
>> index c9f72b2..99b3122 100644
>> --- a/arch/arm64/kernel/psci.c
>> +++ b/arch/arm64/kernel/psci.c
>> @@ -82,6 +82,7 @@ static void cpu_psci_cpu_die(unsigned int cpu)
>> static int cpu_psci_cpu_kill(unsigned int cpu)
>> {
>> int err, i;
>> + unsigned long timeout;
>>
>> if (!psci_ops.affinity_info)
>> return 0;
>> @@ -91,16 +92,24 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
>> * while it is dying. So, try again a few times.
>> */
>>
>> - for (i = 0; i < 10; i++) {
>> + i = 0;
>> + timeout = jiffies + msecs_to_jiffies(100);
>> + do {
>> err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
>> if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
>> pr_info("CPU%d killed.\n", cpu);
>> return 0;
>> }
>>
>> - msleep(10);
>> - pr_info("Retrying again to check for CPU kill\n");
>> - }
>> + /* busy-wait max 1ms */
>> + if (i++ < 100) {
>> + cond_resched();
>> + udelay(10);
>> + continue;
>> + }
>> +
>> + usleep_range(100, 1000);
>> + } while (time_before(jiffies, timeout));
>>
>> pr_warn("CPU%d may not have shut down cleanly (AFFINITY_INFO reports %d)\n",
>> cpu, err);
>> --
>> 2.7.4.huawei.3
>>
>
> .
>

2019-10-16 14:22:59

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()

On Wed, Oct 16, 2019 at 11:22:23AM +0800, Yunfeng Ye wrote:
>
>
> On 2019/10/16 0:23, Will Deacon wrote:
> > Hi,
> >
> > On Sat, Sep 21, 2019 at 07:21:17PM +0800, Yunfeng Ye wrote:
> >> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
> >> take so long in the right case. Use usleep_range() instead of msleep(),
> >> reduce the waiting time, and give a chance to busy wait before sleep.
> >
> > Can you elaborate on "the right case" please? It's not clear to me
> > exactly what problem you're solving here.
> >
> The situation is that when the power is off, we have a battery to save some
> information, but the battery power is limited, so we reduce the power consumption
> by turning off the cores, and need fastly to complete the core shutdown. However, the
> time of cpu_psci_cpu_kill() will take 10ms. We have tested the time that it does not
> need 10ms, and most case is about 50us-500us. if we reduce the time of cpu_psci_cpu_kill(),
> we can reduce 10% - 30% of the total time.
>

Have you checked why PSCI AFFINITY_INFO not returning LEVEL_OFF quickly
then ? We wait for upto 5s in cpu_wait_death(worst case) before cpu_kill
is called from __cpu_die.

Moreover I don't understand the argument here. The cpu being killed
will be OFF, as soon as it can and firmware controls that and this
change is not related to CPU_OFF. And this CPU calling cpu_kill can
sleep and 10ms is good to enter idle states if it's idle saving power,
so I fail to map the power saving you mention above.

> So change msleep (10) to usleep_range() to reduce the waiting time. In addition,
> we don't want to be scheduled during the sleeping time, some threads may take a
> long time and don't give up the CPU, which affects the time of core shutdown,
> Therefore, we add a chance to busy-wait max 1ms.
>

On the other hand, usleep_range reduces the timer interval and hence
increases the chance of the callee CPU not to enter deeper idle states.

What am I missing here ? What's the use case or power off situation
you are talking about above ?

>
> > I've also added Sudeep to the thread, since I'd like his ack on the change.
> >

Thanks Will.

--
Regards,
Sudeep

2019-10-16 15:02:29

by Yunfeng Ye

[permalink] [raw]
Subject: Re: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()



On 2019/10/16 18:25, Sudeep Holla wrote:
> On Wed, Oct 16, 2019 at 11:22:23AM +0800, Yunfeng Ye wrote:
>>
>>
>> On 2019/10/16 0:23, Will Deacon wrote:
>>> Hi,
>>>
>>> On Sat, Sep 21, 2019 at 07:21:17PM +0800, Yunfeng Ye wrote:
>>>> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
>>>> take so long in the right case. Use usleep_range() instead of msleep(),
>>>> reduce the waiting time, and give a chance to busy wait before sleep.
>>>
>>> Can you elaborate on "the right case" please? It's not clear to me
>>> exactly what problem you're solving here.
>>>
>> The situation is that when the power is off, we have a battery to save some
>> information, but the battery power is limited, so we reduce the power consumption
>> by turning off the cores, and need fastly to complete the core shutdown. However, the
>> time of cpu_psci_cpu_kill() will take 10ms. We have tested the time that it does not
>> need 10ms, and most case is about 50us-500us. if we reduce the time of cpu_psci_cpu_kill(),
>> we can reduce 10% - 30% of the total time.
>>
>
> Have you checked why PSCI AFFINITY_INFO not returning LEVEL_OFF quickly
> then ? We wait for upto 5s in cpu_wait_death(worst case) before cpu_kill
> is called from __cpu_die.
>
When cpu_wait_death() is done, it means that the cpu core's hardware prepare to
die. I think not returning LEVEL_OFF quickly is that hardware need time to handle.
I don't know how much time it need is reasonable, but I test that it need about
50us - 500us.

In addition I have not meat the worst case that cpu_wait_death() need upto 5s, and
we only take normal case into account.

thanks.

> Moreover I don't understand the argument here. The cpu being killed
> will be OFF, as soon as it can and firmware controls that and this
> change is not related to CPU_OFF. And this CPU calling cpu_kill can
> sleep and 10ms is good to enter idle states if it's idle saving power,
> so I fail to map the power saving you mention above.
>
We have hundreds of CPU cores that need to be shut down. For example,
a CPU has 200 cores, and the thread to shut down the core is in CPU 0.
and the thread need to shut down from core 1 to core 200. However, the
implementation of the kernel can only shut down cpu cores one by one, so we
need to wait for cpu_kill() to finish before shutting down the next
CPU core. If it wait for 10ms each time in cpu_kill, it will takes up
about 2 seconds in cpu_kill() total.

It is not to save power through msleep to idle state, but to quickly
turn off other CPU core's hardware to reduce power consumption.

thanks.

>> So change msleep (10) to usleep_range() to reduce the waiting time. In addition,
>> we don't want to be scheduled during the sleeping time, some threads may take a
>> long time and don't give up the CPU, which affects the time of core shutdown,
>> Therefore, we add a chance to busy-wait max 1ms.
>>
>
> On the other hand, usleep_range reduces the timer interval and hence
> increases the chance of the callee CPU not to enter deeper idle states.
>
> What am I missing here ? What's the use case or power off situation
> you are talking about above ?
>
As mentioned above, we are not to save power through msleep to idle state, but to quickly
turn off other CPU core's hardware to reduce power consumption.

>>
>>> I've also added Sudeep to the thread, since I'd like his ack on the change.
>>>
>
> Thanks Will.
>
> --
> Regards,
> Sudeep
>
> .
>

2019-10-16 16:51:26

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()

On Wed, Oct 16, 2019 at 07:29:59PM +0800, Yunfeng Ye wrote:
>
>
> On 2019/10/16 18:25, Sudeep Holla wrote:
> > On Wed, Oct 16, 2019 at 11:22:23AM +0800, Yunfeng Ye wrote:
> >>
> >>
> >> On 2019/10/16 0:23, Will Deacon wrote:
> >>> Hi,
> >>>
> >>> On Sat, Sep 21, 2019 at 07:21:17PM +0800, Yunfeng Ye wrote:
> >>>> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
> >>>> take so long in the right case. Use usleep_range() instead of msleep(),
> >>>> reduce the waiting time, and give a chance to busy wait before sleep.
> >>>
> >>> Can you elaborate on "the right case" please? It's not clear to me
> >>> exactly what problem you're solving here.
> >>>
> >> The situation is that when the power is off, we have a battery to save some
> >> information, but the battery power is limited, so we reduce the power consumption
> >> by turning off the cores, and need fastly to complete the core shutdown. However, the
> >> time of cpu_psci_cpu_kill() will take 10ms. We have tested the time that it does not
> >> need 10ms, and most case is about 50us-500us. if we reduce the time of cpu_psci_cpu_kill(),
> >> we can reduce 10% - 30% of the total time.
> >>
> >
> > Have you checked why PSCI AFFINITY_INFO not returning LEVEL_OFF quickly
> > then ? We wait for upto 5s in cpu_wait_death(worst case) before cpu_kill
> > is called from __cpu_die.
> >
> When cpu_wait_death() is done, it means that the cpu core's hardware prepare to
> die. I think not returning LEVEL_OFF quickly is that hardware need time to handle.
> I don't know how much time it need is reasonable, but I test that it need about
> 50us - 500us.
>

Fair enough.

> In addition I have not meat the worst case that cpu_wait_death() need upto
> 5s, and we only take normal case into account.
>

Good

>
> > Moreover I don't understand the argument here. The cpu being killed
> > will be OFF, as soon as it can and firmware controls that and this
> > change is not related to CPU_OFF. And this CPU calling cpu_kill can
> > sleep and 10ms is good to enter idle states if it's idle saving power,
> > so I fail to map the power saving you mention above.
> >
> We have hundreds of CPU cores that need to be shut down. For example,
> a CPU has 200 cores, and the thread to shut down the core is in CPU 0.
> and the thread need to shut down from core 1 to core 200. However, the
> implementation of the kernel can only shut down cpu cores one by one, so we
> need to wait for cpu_kill() to finish before shutting down the next
> CPU core. If it wait for 10ms each time in cpu_kill, it will takes up
> about 2 seconds in cpu_kill() total.
>

OK, thanks for the illustrative example. This make sense to me now. But
you comparing with battery powered devices confused me and I assumed
it as some hack to optimise mobile workload.

> >> So change msleep (10) to usleep_range() to reduce the waiting time. In addition,
> >> we don't want to be scheduled during the sleeping time, some threads may take a
> >> long time and don't give up the CPU, which affects the time of core shutdown,
> >> Therefore, we add a chance to busy-wait max 1ms.
> >>
> >
> > On the other hand, usleep_range reduces the timer interval and hence
> > increases the chance of the callee CPU not to enter deeper idle states.
> >
> > What am I missing here ? What's the use case or power off situation
> > you are talking about above ?
> >
> As mentioned above, we are not to save power through msleep to idle state,
> but to quickly turn off other CPU core's hardware to reduce power consumption.

You still don't provide your use-case in which this is required. I know
this will be useful for suspend-to-ram. Do you have any other use-case
that you need to power-off large number of CPUs like this ? Also you
mentioned battery powered, and I don't think any battery powered device
has 200 thread like in your example :)

You need to mention few things clearly in the commit log:
1. How the CPU hotplug operation is serialised in some use-case like
suspend-to-ram
2. How that may impact systems with large number of CPUs
3. How your change helps to improve that

It may it easy for anyone to understand the motivation for this change.
The commit message you have doesn't give any clue on all the above and
hence we have lot of questions.

I will respond to the original patch separately.

--
Regards,
Sudeep

2019-10-16 22:21:32

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()

On Wed, Oct 09, 2019 at 12:45:16PM +0800, Yunfeng Ye wrote:
> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
> take so long in the right case. Use usleep_range() instead of msleep(),
> reduce the waiting time, and give a chance to busy wait before sleep.
>
> Signed-off-by: Yunfeng Ye <[email protected]>
> ---
> V1->V2:
> - use usleep_range() instead of udelay() after waiting for a while
>
> arch/arm64/kernel/psci.c | 17 +++++++++++++----
> 1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
> index c9f72b2..99b3122 100644
> --- a/arch/arm64/kernel/psci.c
> +++ b/arch/arm64/kernel/psci.c
> @@ -82,6 +82,7 @@ static void cpu_psci_cpu_die(unsigned int cpu)
> static int cpu_psci_cpu_kill(unsigned int cpu)
> {
> int err, i;
> + unsigned long timeout;
>
> if (!psci_ops.affinity_info)
> return 0;
> @@ -91,16 +92,24 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
> * while it is dying. So, try again a few times.
> */
>
> - for (i = 0; i < 10; i++) {
> + i = 0;
> + timeout = jiffies + msecs_to_jiffies(100);
> + do {
> err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
> if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
> pr_info("CPU%d killed.\n", cpu);
> return 0;
> }
>
> - msleep(10);
> - pr_info("Retrying again to check for CPU kill\n");

You dropped this message, any particular reason ?

> - }
> + /* busy-wait max 1ms */
> + if (i++ < 100) {
> + cond_resched();
> + udelay(10);
> + continue;

Why can't it be simple like loop of 100 * msleep(1) instead of loop of
10 * msleep(10). The above initial busy wait for 1 ms looks too much
optimised for your setup where it takes 50-500us, what if it take just
over 1 ms ?

We need more generic solution.

--
Regards,
Sudeep

2019-10-18 16:06:01

by Yunfeng Ye

[permalink] [raw]
Subject: Re: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()



On 2019/10/16 23:32, Sudeep Holla wrote:
> On Wed, Oct 09, 2019 at 12:45:16PM +0800, Yunfeng Ye wrote:
>> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
>> take so long in the right case. Use usleep_range() instead of msleep(),
>> reduce the waiting time, and give a chance to busy wait before sleep.
>>
>> Signed-off-by: Yunfeng Ye <[email protected]>
>> ---
>> V1->V2:
>> - use usleep_range() instead of udelay() after waiting for a while
>>
>> arch/arm64/kernel/psci.c | 17 +++++++++++++----
>> 1 file changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
>> index c9f72b2..99b3122 100644
>> --- a/arch/arm64/kernel/psci.c
>> +++ b/arch/arm64/kernel/psci.c
>> @@ -82,6 +82,7 @@ static void cpu_psci_cpu_die(unsigned int cpu)
>> static int cpu_psci_cpu_kill(unsigned int cpu)
>> {
>> int err, i;
>> + unsigned long timeout;
>>
>> if (!psci_ops.affinity_info)
>> return 0;
>> @@ -91,16 +92,24 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
>> * while it is dying. So, try again a few times.
>> */
>>
>> - for (i = 0; i < 10; i++) {
>> + i = 0;
>> + timeout = jiffies + msecs_to_jiffies(100);
>> + do {
>> err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
>> if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
>> pr_info("CPU%d killed.\n", cpu);
>> return 0;
>> }
>>
>> - msleep(10);
>> - pr_info("Retrying again to check for CPU kill\n");
>
> You dropped this message, any particular reason ?
>
When reduce the time interval to 1ms, the print message maybe increase 10 times.
on the other hand, cpu_psci_cpu_kill() will print message on success or failure, which
this retry log is not very necessary. of cource, I think use pr_info_once() instead of
pr_info() is better.

thanks.

>> - }
>> + /* busy-wait max 1ms */
>> + if (i++ < 100) {
>> + cond_resched();
>> + udelay(10);
>> + continue;
>
> Why can't it be simple like loop of 100 * msleep(1) instead of loop of
> 10 * msleep(10). The above initial busy wait for 1 ms looks too much
> optimised for your setup where it takes 50-500us, what if it take just
> over 1 ms ?
>
msleep() is implemented by jiffies. when HZ=100 or HZ=250, msleep(1) is not
accurate. so I think usleep_range() is better. 1 ms looks simple and good, but how
about 100us is better? I refer a function sunxi_mc_smp_cpu_kill(), it use
usleep_range(50, 100).

thanks.

> We need more generic solution.
>
> --
> Regards,
> Sudeep
>
> .
>

2019-10-18 16:12:50

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()

On Thu, Oct 17, 2019 at 09:26:15PM +0800, Yunfeng Ye wrote:
>
>
> On 2019/10/16 23:32, Sudeep Holla wrote:
> > On Wed, Oct 09, 2019 at 12:45:16PM +0800, Yunfeng Ye wrote:
> >> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
> >> take so long in the right case. Use usleep_range() instead of msleep(),
> >> reduce the waiting time, and give a chance to busy wait before sleep.
> >>
> >> Signed-off-by: Yunfeng Ye <[email protected]>
> >> ---
> >> V1->V2:
> >> - use usleep_range() instead of udelay() after waiting for a while
> >>
> >> arch/arm64/kernel/psci.c | 17 +++++++++++++----
> >> 1 file changed, 13 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
> >> index c9f72b2..99b3122 100644
> >> --- a/arch/arm64/kernel/psci.c
> >> +++ b/arch/arm64/kernel/psci.c
> >> @@ -82,6 +82,7 @@ static void cpu_psci_cpu_die(unsigned int cpu)
> >> static int cpu_psci_cpu_kill(unsigned int cpu)
> >> {
> >> int err, i;
> >> + unsigned long timeout;
> >>
> >> if (!psci_ops.affinity_info)
> >> return 0;
> >> @@ -91,16 +92,24 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
> >> * while it is dying. So, try again a few times.
> >> */
> >>
> >> - for (i = 0; i < 10; i++) {
> >> + i = 0;
> >> + timeout = jiffies + msecs_to_jiffies(100);
> >> + do {
> >> err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
> >> if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
> >> pr_info("CPU%d killed.\n", cpu);
> >> return 0;
> >> }
> >>
> >> - msleep(10);
> >> - pr_info("Retrying again to check for CPU kill\n");
> >
> > You dropped this message, any particular reason ?
> >
> When reduce the time interval to 1ms, the print message maybe increase 10
> times. on the other hand, cpu_psci_cpu_kill() will print message on success
> or failure, which this retry log is not very necessary. of cource, I think
> use pr_info_once() instead of pr_info() is better.
>

Yes changing it to pr_info_once is better than dropping it as it gives
some indication to the firmware if there's scope for improvement.

> >> - }
> >> + /* busy-wait max 1ms */
> >> + if (i++ < 100) {
> >> + cond_resched();
> >> + udelay(10);
> >> + continue;
> >
> > Why can't it be simple like loop of 100 * msleep(1) instead of loop of
> > 10 * msleep(10). The above initial busy wait for 1 ms looks too much
> > optimised for your setup where it takes 50-500us, what if it take just
> > over 1 ms ?
> >
> msleep() is implemented by jiffies. when HZ=100 or HZ=250, msleep(1) is not
> accurate. so I think usleep_range() is better. 1 ms looks simple and good, but how
> about 100us is better? I refer a function sunxi_mc_smp_cpu_kill(), it use
> usleep_range(50, 100).
>

Again that's specific to sunxi platforms and may work well. While I agree
msleep(1) may not be accurate, I am still inclined to have a max value
of 1000(i.e. 1ms) for usleep_range.

--
Regards,
Sudeep

2019-10-18 16:14:21

by Yunfeng Ye

[permalink] [raw]
Subject: Re: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()



On 2019/10/16 23:05, Sudeep Holla wrote:
> On Wed, Oct 16, 2019 at 07:29:59PM +0800, Yunfeng Ye wrote:
>>
>>
>> On 2019/10/16 18:25, Sudeep Holla wrote:
>>> On Wed, Oct 16, 2019 at 11:22:23AM +0800, Yunfeng Ye wrote:
>>>>
>>>>
>>>> On 2019/10/16 0:23, Will Deacon wrote:
>>>>> Hi,
>>>>>
>>>>> On Sat, Sep 21, 2019 at 07:21:17PM +0800, Yunfeng Ye wrote:
>>>>>> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
>>>>>> take so long in the right case. Use usleep_range() instead of msleep(),
>>>>>> reduce the waiting time, and give a chance to busy wait before sleep.
>>>>>
>>>>> Can you elaborate on "the right case" please? It's not clear to me
>>>>> exactly what problem you're solving here.
>>>>>
>>>> The situation is that when the power is off, we have a battery to save some
>>>> information, but the battery power is limited, so we reduce the power consumption
>>>> by turning off the cores, and need fastly to complete the core shutdown. However, the
>>>> time of cpu_psci_cpu_kill() will take 10ms. We have tested the time that it does not
>>>> need 10ms, and most case is about 50us-500us. if we reduce the time of cpu_psci_cpu_kill(),
>>>> we can reduce 10% - 30% of the total time.
>>>>
>>>
>>> Have you checked why PSCI AFFINITY_INFO not returning LEVEL_OFF quickly
>>> then ? We wait for upto 5s in cpu_wait_death(worst case) before cpu_kill
>>> is called from __cpu_die.
>>>
>> When cpu_wait_death() is done, it means that the cpu core's hardware prepare to
>> die. I think not returning LEVEL_OFF quickly is that hardware need time to handle.
>> I don't know how much time it need is reasonable, but I test that it need about
>> 50us - 500us.
>>
>
> Fair enough.
>
>> In addition I have not meat the worst case that cpu_wait_death() need upto
>> 5s, and we only take normal case into account.
>>
>
> Good
>
>>
>>> Moreover I don't understand the argument here. The cpu being killed
>>> will be OFF, as soon as it can and firmware controls that and this
>>> change is not related to CPU_OFF. And this CPU calling cpu_kill can
>>> sleep and 10ms is good to enter idle states if it's idle saving power,
>>> so I fail to map the power saving you mention above.
>>>
>> We have hundreds of CPU cores that need to be shut down. For example,
>> a CPU has 200 cores, and the thread to shut down the core is in CPU 0.
>> and the thread need to shut down from core 1 to core 200. However, the
>> implementation of the kernel can only shut down cpu cores one by one, so we
>> need to wait for cpu_kill() to finish before shutting down the next
>> CPU core. If it wait for 10ms each time in cpu_kill, it will takes up
>> about 2 seconds in cpu_kill() total.
>>
>
> OK, thanks for the illustrative example. This make sense to me now. But
> you comparing with battery powered devices confused me and I assumed
> it as some hack to optimise mobile workload.
>
It is not mobile workload, but a arm64 server with hundreds of cpu cores.
Battery powered is a backup battery for reliability and to prevent
accidental power failure.

>>>> So change msleep (10) to usleep_range() to reduce the waiting time. In addition,
>>>> we don't want to be scheduled during the sleeping time, some threads may take a
>>>> long time and don't give up the CPU, which affects the time of core shutdown,
>>>> Therefore, we add a chance to busy-wait max 1ms.
>>>>
>>>
>>> On the other hand, usleep_range reduces the timer interval and hence
>>> increases the chance of the callee CPU not to enter deeper idle states.
>>>
>>> What am I missing here ? What's the use case or power off situation
>>> you are talking about above ?
>>>
>> As mentioned above, we are not to save power through msleep to idle state,
>> but to quickly turn off other CPU core's hardware to reduce power consumption.
>
> You still don't provide your use-case in which this is required. I know
> this will be useful for suspend-to-ram. Do you have any other use-case
> that you need to power-off large number of CPUs like this ? Also you
> mentioned battery powered, and I don't think any battery powered device
> has 200 thread like in your example :)
>
The use-case is like suspend-to-disk, but a little different:
In the abnormal power failure of server equipment, in order to increase
reliability, there is a backup battery for power supply. Before the battery runs out,
we need to save the key datas to the disk. In order to maintain the battery power
supply, a series of power reduction processing is needed, include which all the cores
need to be shut down quickly, we have max near 200 cores need to shutdown.

Also this modify will be useful for suspend-to-ram too.
thanks.

> You need to mention few things clearly in the commit log:
> 1. How the CPU hotplug operation is serialised in some use-case like
> suspend-to-ram
> 2. How that may impact systems with large number of CPUs
> 3. How your change helps to improve that
>
> It may it easy for anyone to understand the motivation for this change.
> The commit message you have doesn't give any clue on all the above and
> hence we have lot of questions.
>
ok, thanks.

> I will respond to the original patch separately.
>
> --
> Regards,
> Sudeep
>
> .
>

2019-10-18 16:15:47

by David Laight

[permalink] [raw]
Subject: RE: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()

From: Yunfeng Ye
> Sent: 17 October 2019 14:26
...
> >> - for (i = 0; i < 10; i++) {
> >> + i = 0;
> >> + timeout = jiffies + msecs_to_jiffies(100);
> >> + do {
> >> err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
> >> if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
> >> pr_info("CPU%d killed.\n", cpu);
> >> return 0;
> >> }
> >>
> >> - msleep(10);
> >> - pr_info("Retrying again to check for CPU kill\n");
> >
> > You dropped this message, any particular reason ?
> >
> When reduce the time interval to 1ms, the print message maybe increase 10 times.
> on the other hand, cpu_psci_cpu_kill() will print message on success or failure, which
> this retry log is not very necessary. of cource, I think use pr_info_once() instead of
> pr_info() is better.

Maybe you should print in on (say) the 10th time around the loop.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2019-10-18 16:22:04

by Yunfeng Ye

[permalink] [raw]
Subject: Re: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()



On 2019/10/17 22:00, David Laight wrote:
> From: Yunfeng Ye
>> Sent: 17 October 2019 14:26
> ...
>>>> - for (i = 0; i < 10; i++) {
>>>> + i = 0;
>>>> + timeout = jiffies + msecs_to_jiffies(100);
>>>> + do {
>>>> err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
>>>> if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
>>>> pr_info("CPU%d killed.\n", cpu);
>>>> return 0;
>>>> }
>>>>
>>>> - msleep(10);
>>>> - pr_info("Retrying again to check for CPU kill\n");
>>>
>>> You dropped this message, any particular reason ?
>>>
>> When reduce the time interval to 1ms, the print message maybe increase 10 times.
>> on the other hand, cpu_psci_cpu_kill() will print message on success or failure, which
>> this retry log is not very necessary. of cource, I think use pr_info_once() instead of
>> pr_info() is better.
>
> Maybe you should print in on (say) the 10th time around the loop.
>
Can it like this:
pr_info("CPU%d killed with %d loops.\n", cpu, loops);

If put the number of waiting times in the successful printing message, it is
not necessary to print the "Retrying ..." message.

thanks.

> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>

2019-10-18 19:43:22

by Yunfeng Ye

[permalink] [raw]
Subject: Re: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()



On 2019/10/17 21:54, Sudeep Holla wrote:
> On Thu, Oct 17, 2019 at 09:26:15PM +0800, Yunfeng Ye wrote:
>>
>>
>> On 2019/10/16 23:32, Sudeep Holla wrote:
>>> On Wed, Oct 09, 2019 at 12:45:16PM +0800, Yunfeng Ye wrote:
>>>> If psci_ops.affinity_info() fails, it will sleep 10ms, which will not
>>>> take so long in the right case. Use usleep_range() instead of msleep(),
>>>> reduce the waiting time, and give a chance to busy wait before sleep.
>>>>
>>>> Signed-off-by: Yunfeng Ye <[email protected]>
>>>> ---
>>>> V1->V2:
>>>> - use usleep_range() instead of udelay() after waiting for a while
>>>>
>>>> arch/arm64/kernel/psci.c | 17 +++++++++++++----
>>>> 1 file changed, 13 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
>>>> index c9f72b2..99b3122 100644
>>>> --- a/arch/arm64/kernel/psci.c
>>>> +++ b/arch/arm64/kernel/psci.c
>>>> @@ -82,6 +82,7 @@ static void cpu_psci_cpu_die(unsigned int cpu)
>>>> static int cpu_psci_cpu_kill(unsigned int cpu)
>>>> {
>>>> int err, i;
>>>> + unsigned long timeout;
>>>>
>>>> if (!psci_ops.affinity_info)
>>>> return 0;
>>>> @@ -91,16 +92,24 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
>>>> * while it is dying. So, try again a few times.
>>>> */
>>>>
>>>> - for (i = 0; i < 10; i++) {
>>>> + i = 0;
>>>> + timeout = jiffies + msecs_to_jiffies(100);
>>>> + do {
>>>> err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
>>>> if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
>>>> pr_info("CPU%d killed.\n", cpu);
>>>> return 0;
>>>> }
>>>>
>>>> - msleep(10);
>>>> - pr_info("Retrying again to check for CPU kill\n");
>>>
>>> You dropped this message, any particular reason ?
>>>
>> When reduce the time interval to 1ms, the print message maybe increase 10
>> times. on the other hand, cpu_psci_cpu_kill() will print message on success
>> or failure, which this retry log is not very necessary. of cource, I think
>> use pr_info_once() instead of pr_info() is better.
>>
>
> Yes changing it to pr_info_once is better than dropping it as it gives
> some indication to the firmware if there's scope for improvement.
>
>>>> - }
>>>> + /* busy-wait max 1ms */
>>>> + if (i++ < 100) {
>>>> + cond_resched();
>>>> + udelay(10);
>>>> + continue;
>>>
>>> Why can't it be simple like loop of 100 * msleep(1) instead of loop of
>>> 10 * msleep(10). The above initial busy wait for 1 ms looks too much
>>> optimised for your setup where it takes 50-500us, what if it take just
>>> over 1 ms ?
>>>
>> msleep() is implemented by jiffies. when HZ=100 or HZ=250, msleep(1) is not
>> accurate. so I think usleep_range() is better. 1 ms looks simple and good, but how
>> about 100us is better? I refer a function sunxi_mc_smp_cpu_kill(), it use
>> usleep_range(50, 100).
>>
>
> Again that's specific to sunxi platforms and may work well. While I agree
> msleep(1) may not be accurate, I am still inclined to have a max value
> of 1000(i.e. 1ms) for usleep_range.
>
ok, I will send a new version patch that waiting max 1ms.
thanks.

> --
> Regards,
> Sudeep
>
> .
>

2019-10-18 19:43:27

by David Laight

[permalink] [raw]
Subject: RE: [PATCH V2] arm64: psci: Reduce waiting time of cpu_psci_cpu_kill()

From: Yunfeng Ye
> Sent: 17 October 2019 15:20
> On 2019/10/17 22:00, David Laight wrote:
> > From: Yunfeng Ye
> >> Sent: 17 October 2019 14:26
> > ...
> >>>> - for (i = 0; i < 10; i++) {
> >>>> + i = 0;
> >>>> + timeout = jiffies + msecs_to_jiffies(100);
> >>>> + do {
> >>>> err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
> >>>> if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
> >>>> pr_info("CPU%d killed.\n", cpu);
> >>>> return 0;
> >>>> }
> >>>>
> >>>> - msleep(10);
> >>>> - pr_info("Retrying again to check for CPU kill\n");
> >>>
> >>> You dropped this message, any particular reason ?
> >>>
> >> When reduce the time interval to 1ms, the print message maybe increase 10 times.
> >> on the other hand, cpu_psci_cpu_kill() will print message on success or failure, which
> >> this retry log is not very necessary. of cource, I think use pr_info_once() instead of
> >> pr_info() is better.
> >
> > Maybe you should print in on (say) the 10th time around the loop.
> >
> Can it like this:
> pr_info("CPU%d killed with %d loops.\n", cpu, loops);
>
> If put the number of waiting times in the successful printing message, it is
> not necessary to print the "Retrying ..." message.

That depends on whether you want to know how long it took or why the system
is 'stuck'.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)