This series is to improve the arm_cpuidle_suspend() a bit by removing/moving
out checks from this hot path.
Jisheng Zhang (2):
arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend()
arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient
arch/arm64/kernel/cpuidle.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
--
2.8.0.rc3
If cpu_ops has not been registered, arm_cpuidle_init() will return
-EOPNOTSUPP, so arm_cpuidle_suspend() will never have chance to
run. In other word, the cpu_ops check can be avoid.
Signed-off-by: Jisheng Zhang <[email protected]>
---
arch/arm64/kernel/cpuidle.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c
index 9047cab6..bd57c59 100644
--- a/arch/arm64/kernel/cpuidle.c
+++ b/arch/arm64/kernel/cpuidle.c
@@ -37,10 +37,9 @@ int arm_cpuidle_suspend(int index)
int cpu = smp_processor_id();
/*
- * If cpu_ops have not been registered or suspend
- * has not been initialized, cpu_suspend call fails early.
+ * If suspend has not been initialized, cpu_suspend call fails early.
*/
- if (!cpu_ops[cpu] || !cpu_ops[cpu]->cpu_suspend)
+ if (!cpu_ops[cpu]->cpu_suspend)
return -EOPNOTSUPP;
return cpu_ops[cpu]->cpu_suspend(index);
}
--
2.8.0.rc3
Currently, we check cpu_ops->cpu_suspend every time when entering a
low-power idle state. But this check could be avoided in this hot path
by moving it into arm_cpuidle_init() to reduce arm_cpuidle_suspend()
overhead a bit.
Signed-off-by: Jisheng Zhang <[email protected]>
---
arch/arm64/kernel/cpuidle.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/cpuidle.c b/arch/arm64/kernel/cpuidle.c
index bd57c59..e11857f 100644
--- a/arch/arm64/kernel/cpuidle.c
+++ b/arch/arm64/kernel/cpuidle.c
@@ -19,7 +19,8 @@ int __init arm_cpuidle_init(unsigned int cpu)
{
int ret = -EOPNOTSUPP;
- if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_init_idle)
+ if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_suspend &&
+ cpu_ops[cpu]->cpu_init_idle)
ret = cpu_ops[cpu]->cpu_init_idle(cpu);
return ret;
@@ -36,10 +37,5 @@ int arm_cpuidle_suspend(int index)
{
int cpu = smp_processor_id();
- /*
- * If suspend has not been initialized, cpu_suspend call fails early.
- */
- if (!cpu_ops[cpu]->cpu_suspend)
- return -EOPNOTSUPP;
return cpu_ops[cpu]->cpu_suspend(index);
}
--
2.8.0.rc3
On Thu, Mar 24, 2016 at 01:08:48PM +0800, Jisheng Zhang wrote:
> This series is to improve the arm_cpuidle_suspend() a bit by removing/moving
> out checks from this hot path.
>
> Jisheng Zhang (2):
> arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend()
> arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient
>
> arch/arm64/kernel/cpuidle.c | 9 ++-------
> 1 file changed, 2 insertions(+), 7 deletions(-)
These look fine to me, but do you have any rough numbers showing what
sort of improvement we get from this change?
Will
Hi Will,
On Thu, 24 Mar 2016 11:15:07 +0000 Will Deacon wrote:
> On Thu, Mar 24, 2016 at 01:08:48PM +0800, Jisheng Zhang wrote:
> > This series is to improve the arm_cpuidle_suspend() a bit by removing/moving
> > out checks from this hot path.
> >
> > Jisheng Zhang (2):
> > arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend()
> > arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient
> >
> > arch/arm64/kernel/cpuidle.c | 9 ++-------
> > 1 file changed, 2 insertions(+), 7 deletions(-)
>
> These look fine to me, but do you have any rough numbers showing what
> sort of improvement we get from this change?
Good question. Here it is:
I measured the 4096 * time from arm_cpuidle_suspend entry point to the
cpu_psci_cpu_suspend entry point. HW platform is Marvell BG4CT STB board.
1. only one shell, no other process, hot-unplug secondary cpus, execute the
following cmd
while true
do
sleep 0.2
done
before the patch: 1581220ns
after the patch: 1579630ns
reduced by 0.1%
2. only one shell, no other process, hot-unplug secondary cpus, execute the
following cmd
while true
do
md5sum /tmp/testfile
sleep 0.2
done
NOTE the testfile size should be larger than L1+L2 cache size
before the patch: 1961960ns
after the patch: 1912500ns
reduced by 2.5%
So the more complex the system load, the bigger the improvement.
Thanks,
Jisheng
On Thu, Mar 24, 2016 at 09:18:53PM +0800, Jisheng Zhang wrote:
> Hi Will,
>
> On Thu, 24 Mar 2016 11:15:07 +0000 Will Deacon wrote:
>
> > On Thu, Mar 24, 2016 at 01:08:48PM +0800, Jisheng Zhang wrote:
> > > This series is to improve the arm_cpuidle_suspend() a bit by removing/moving
> > > out checks from this hot path.
> > >
> > > Jisheng Zhang (2):
> > > arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend()
> > > arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient
> > >
> > > arch/arm64/kernel/cpuidle.c | 9 ++-------
> > > 1 file changed, 2 insertions(+), 7 deletions(-)
> >
> > These look fine to me, but do you have any rough numbers showing what
> > sort of improvement we get from this change?
>
> Good question. Here it is:
>
> I measured the 4096 * time from arm_cpuidle_suspend entry point to the
> cpu_psci_cpu_suspend entry point. HW platform is Marvell BG4CT STB board.
>
> 1. only one shell, no other process, hot-unplug secondary cpus, execute the
> following cmd
>
> while true
> do
> sleep 0.2
> done
>
> before the patch: 1581220ns
>
> after the patch: 1579630ns
>
> reduced by 0.1%
>
> 2. only one shell, no other process, hot-unplug secondary cpus, execute the
> following cmd
>
> while true
> do
> md5sum /tmp/testfile
> sleep 0.2
> done
>
> NOTE the testfile size should be larger than L1+L2 cache size
>
> before the patch: 1961960ns
> after the patch: 1912500ns
>
> reduced by 2.5%
>
> So the more complex the system load, the bigger the improvement.
So between arm_cpuidle_suspend() and psci_cpu_suspend_enter() the
checks that you are removing are almost the *only* code that is
currently executed and this patch saves us best case 12ns per idle state
entry (which is noise compared to CPU PM notifiers/FW execution time)
if I am not mistaken, I can't wait to use that energy for something more
useful :)
Anyway, as a clean-up your patches are fine it is sloppy to check those
pointers on every idle state entry (do you really need two patches ?), so:
Acked-by: Lorenzo Pieralisi <[email protected]>
Hi Lorenzo,
On Thu, 24 Mar 2016 16:44:19 +0000
Lorenzo Pieralisi <[email protected]> wrote:
> On Thu, Mar 24, 2016 at 09:18:53PM +0800, Jisheng Zhang wrote:
> > Hi Will,
> >
> > On Thu, 24 Mar 2016 11:15:07 +0000 Will Deacon wrote:
> >
> > > On Thu, Mar 24, 2016 at 01:08:48PM +0800, Jisheng Zhang wrote:
> > > > This series is to improve the arm_cpuidle_suspend() a bit by removing/moving
> > > > out checks from this hot path.
> > > >
> > > > Jisheng Zhang (2):
> > > > arm64: cpuidle: remove cpu_ops check from arm_cpuidle_suspend()
> > > > arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient
> > > >
> > > > arch/arm64/kernel/cpuidle.c | 9 ++-------
> > > > 1 file changed, 2 insertions(+), 7 deletions(-)
> > >
> > > These look fine to me, but do you have any rough numbers showing what
> > > sort of improvement we get from this change?
> >
> > Good question. Here it is:
> >
> > I measured the 4096 * time from arm_cpuidle_suspend entry point to the
> > cpu_psci_cpu_suspend entry point. HW platform is Marvell BG4CT STB board.
> >
> > 1. only one shell, no other process, hot-unplug secondary cpus, execute the
> > following cmd
> >
> > while true
> > do
> > sleep 0.2
> > done
> >
> > before the patch: 1581220ns
> >
> > after the patch: 1579630ns
> >
> > reduced by 0.1%
> >
> > 2. only one shell, no other process, hot-unplug secondary cpus, execute the
> > following cmd
> >
> > while true
> > do
> > md5sum /tmp/testfile
> > sleep 0.2
> > done
> >
> > NOTE the testfile size should be larger than L1+L2 cache size
> >
> > before the patch: 1961960ns
> > after the patch: 1912500ns
> >
> > reduced by 2.5%
> >
> > So the more complex the system load, the bigger the improvement.
>
> So between arm_cpuidle_suspend() and psci_cpu_suspend_enter() the
> checks that you are removing are almost the *only* code that is
> currently executed and this patch saves us best case 12ns per idle state
> entry (which is noise compared to CPU PM notifiers/FW execution time)
> if I am not mistaken, I can't wait to use that energy for something more
> useful :)
>
> Anyway, as a clean-up your patches are fine it is sloppy to check those
> pointers on every idle state entry (do you really need two patches ?), so:
hmm, yes, it makes more sense to combined them into one patch.
>
> Acked-by: Lorenzo Pieralisi <[email protected]>
Thanks for reviewing,
Jisheng