The PowerVM hypervisor dispatches on a whole-core basis. In a shared LPAR, a
CPU belonging to a preempted core may see larger latencies. In such a
scenario, it is preferable to choose a different CPU to run on.

If one of the CPUs in the core is active, i.e. neither CEDED nor
preempted, then consider this CPU as not preempted.

Also, if any of the CPUs in the core has yielded but the OS has not requested
CEDE or CONFER, then consider this CPU to be preempted.
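The decision logic described above can be sketched as a small user-space model
(the struct, array, and function names here are made up for illustration; the
real code in vcpu_is_preempted() below reads hypervisor-maintained state such
as the lppaca idle byte and the yield count, whose low bit is set while a vCPU
is not dispatched):

```c
#include <assert.h>
#include <stdbool.h>

#define THREADS_PER_CORE 8

/*
 * Hypothetical per-CPU state standing in for hypervisor-maintained data:
 * yield_count is odd while the vCPU is not dispatched (yielded or
 * preempted); idle is set by the OS before it requests CEDE.
 */
struct vcpu_state {
	unsigned int yield_count;
	bool idle;
};

static struct vcpu_state vcpu[THREADS_PER_CORE * 2];

/* Simplified model of the patched vcpu_is_preempted() for one core. */
static bool model_vcpu_is_preempted(int cpu)
{
	int first = cpu - (cpu % THREADS_PER_CORE);
	int i;

	/* Even yield count: the vCPU is currently dispatched. */
	if (!(vcpu[cpu].yield_count & 1))
		return false;

	/* Yielded without the OS requesting idle: definitely preempted. */
	if (!vcpu[cpu].idle)
		return true;

	/* Otherwise, look at the sibling threads of the same core. */
	for (i = first; i < first + THREADS_PER_CORE; i++) {
		if (i == cpu)
			continue;
		if (!(vcpu[i].yield_count & 1))
			return false;	/* a sibling is running: core is dispatched */
		if (!vcpu[i].idle)
			return true;	/* a sibling was involuntarily preempted */
	}

	/* Whole core ceded voluntarily while idle: treat as non-preempted. */
	return false;
}
```

This is only a sketch of the control flow; the actual patch additionally keeps
the sibling scan under CONFIG_PPC_SPLPAR and skips it for KVM guests.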
Cc: Ajay Kaher <[email protected]>
Cc: Alexey Makhalov <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: [email protected]
Signed-off-by: Srikar Dronamraju <[email protected]>
---
arch/powerpc/include/asm/paravirt.h | 33 ++++++++++++++++++++++++++---
1 file changed, 30 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h
index e08513d73119..a980756f58df 100644
--- a/arch/powerpc/include/asm/paravirt.h
+++ b/arch/powerpc/include/asm/paravirt.h
@@ -121,9 +121,19 @@ static inline bool vcpu_is_preempted(int cpu)
if (!is_shared_processor())
return false;
+ if (!(yield_count_of(cpu) & 1))
+ return false;
+
+ /*
+ * If CPU has yielded but OS has not requested idle then this CPU is
+ * definitely preempted.
+ */
+ if (!lppaca_of(cpu).idle)
+ return true;
+
#ifdef CONFIG_PPC_SPLPAR
if (!is_kvm_guest()) {
- int first_cpu;
+ int first_cpu, i;
/*
* The result of vcpu_is_preempted() is used in a
@@ -149,11 +159,28 @@ static inline bool vcpu_is_preempted(int cpu)
*/
if (cpu_first_thread_sibling(cpu) == first_cpu)
return false;
+
+ /*
+ * If any of the threads of this core is not preempted or
+ * ceded, then consider this CPU to be non-preempted
+ */
+ first_cpu = cpu_first_thread_sibling(cpu);
+ for (i = first_cpu; i < first_cpu + threads_per_core; i++) {
+ if (i == cpu)
+ continue;
+ if (!(yield_count_of(i) & 1))
+ return false;
+ if (!lppaca_of(i).idle)
+ return true;
+ }
}
#endif
- if (yield_count_of(cpu) & 1)
- return true;
+ /*
+ * None of the threads in this thread group are running but none of
+ * them were preempted too. Hence assume the thread to be
+ * non-preempted.
+ */
return false;
}
--
2.31.1
On 10/9/23 10:47 AM, Srikar Dronamraju wrote:
Hi Srikar. This is an interesting patch.
> PowerVM Hypervisor dispatches on a whole core basis. In a shared LPAR, a
s/whole/big
Can we mention that a big core consists of two small cores, and that w.r.t.
Linux a core is a small core? Hence there is a mismatch.
> CPU from a core that is preempted may have a larger latency. In
> such a scenario, its preferable to choose a different CPU to run.
>
> If one of the CPUs in the core is active, i.e neither CEDED nor
> preempted, then consider this CPU as not preempted
>
> Also if any of the CPUs in the core has yielded but OS has not requested
> CEDE or CONFER, then consider this CPU to be preempted.
>
This is because an idle CPU cannot be preempted. Right?
This patch should help address the has_idle_core functionality and the ttwu
path in powerpc SPLPAR on PowerVM; currently they are not correct.
When all the CPUs are idle, __update_idle_core will not set has_idle_core,
which is functionally not right. That is one example; there are other places
where the correct functioning of vcpu_is_preempted() is crucial as well.
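The scheduler dependency described here can be illustrated with a simplified
model (function and variable names are illustrative, loosely following the
kernel's available_idle_cpu(), which treats an idle CPU as unavailable when
vcpu_is_preempted() reports it preempted):

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Illustrative stand-ins for kernel per-CPU state; the kernel derives
 * these from the runqueues and from vcpu_is_preempted(). */
static bool cpu_is_idle[NR_CPUS];	/* OS view: CPU runs the idle task */
static bool cpu_preempted[NR_CPUS];	/* vcpu_is_preempted() result */

/* Simplified analogue of available_idle_cpu(): an idle CPU only counts
 * as a wake-up target if its vCPU is actually dispatched. */
static bool model_available_idle_cpu(int cpu)
{
	return cpu_is_idle[cpu] && !cpu_preempted[cpu];
}

/* Simplified idle-core test over one core's siblings: true only if
 * every sibling is genuinely available. */
static bool model_core_is_idle(int first_cpu, int nthreads)
{
	int i;

	for (i = first_cpu; i < first_cpu + nthreads; i++)
		if (!model_available_idle_cpu(i))
			return false;
	return true;
}
```

In this model, if an idle-ceded vCPU is misreported as preempted, a fully idle
core never qualifies as idle, which mirrors the has_idle_core problem the
review points out.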
> Cc: Ajay Kaher <[email protected]>
> Cc: Alexey Makhalov <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: Juergen Gross <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: Michael Ellerman <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: [email protected]
> Signed-off-by: Srikar Dronamraju <[email protected]>
> ---
> arch/powerpc/include/asm/paravirt.h | 33 ++++++++++++++++++++++++++---
> 1 file changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h
> index e08513d73119..a980756f58df 100644
> --- a/arch/powerpc/include/asm/paravirt.h
> +++ b/arch/powerpc/include/asm/paravirt.h
> @@ -121,9 +121,19 @@ static inline bool vcpu_is_preempted(int cpu)
> if (!is_shared_processor())
> return false;
>
> + if (!(yield_count_of(cpu) & 1))
> + return false;
> +
> + /*
> + * If CPU has yielded but OS has not requested idle then this CPU is
nit: can it be "if CPU is in hypervisor but OS has not requested ..." ?
> + * definitely preempted.
> + */
> + if (!lppaca_of(cpu).idle)
> + return true;
> +
> #ifdef CONFIG_PPC_SPLPAR
> if (!is_kvm_guest()) {
> - int first_cpu;
> + int first_cpu, i;
>
> /*
> * The result of vcpu_is_preempted() is used in a
> @@ -149,11 +159,28 @@ static inline bool vcpu_is_preempted(int cpu)
> */
> if (cpu_first_thread_sibling(cpu) == first_cpu)
> return false;
> +
> + /*
> + * If any of the threads of this core is not preempted or
> + * ceded, then consider this CPU to be non-preempted
> + */
> + first_cpu = cpu_first_thread_sibling(cpu);
> + for (i = first_cpu; i < first_cpu + threads_per_core; i++) {
> + if (i == cpu)
> + continue;
> + if (!(yield_count_of(i) & 1))
> + return false;
> + if (!lppaca_of(i).idle)
> + return true;
> + }
> }
> #endif
>
> - if (yield_count_of(cpu) & 1)
> - return true;
> + /*
> + * None of the threads in this thread group are running but none of
> + * them were preempted too. Hence assume the thread to be
> + * non-preempted.
> + */
That comment is a bit confusing. Instead of "threads" it would be better to say "CPUs":
"None of the CPUs in this Big Core are running, but none of them were preempted
either. Hence assume the CPU to be non-preempted."
> return false;
> }
>
Otherwise LGTM
Reviewed-by: Shrikanth Hegde <[email protected]>
* Shrikanth Hegde <[email protected]> [2023-10-11 14:33:34]:
> On 10/9/23 10:47 AM, Srikar Dronamraju wrote:
>
> Hi Srikar. This is an interesting patch.
>
> > PowerVM Hypervisor dispatches on a whole core basis. In a shared LPAR, a
> s/whole/big
>
> Can we mention that a big core consist of two small cores. and w.r.t
> linux a core is at small core. Hence there is mismatch.
PowerVM currently always schedules at a big-core granularity, and we would
want to stay transparent about that even if it changes.
> > CPU from a core that is preempted may have a larger latency. In
> > such a scenario, its preferable to choose a different CPU to run.
> >
> > If one of the CPUs in the core is active, i.e neither CEDED nor
> > preempted, then consider this CPU as not preempted
> >
> > Also if any of the CPUs in the core has yielded but OS has not requested
> > CEDE or CONFER, then consider this CPU to be preempted.
> >
>
> This is because an idle CPU cannot be preempted. Right?
If a CPU from the same SMT8 core has been preempted, we should consider this
CPU to have been preempted as well.
>
> This patch should help address the has_idle_core functionality and ttwu path
> in powerpc SPLPAR based on powerVM. Currently they are not correct.
>
> when the all the CPU's are idle, __update_idle_core will not set has_idle_core
> which is functionally not right. That is one example, there are other places where correct
> functionality of vcpu_is_preempted is crucial as well.
>
Right, it's crucial from a functionality perspective on shared LPARs.
Dedicated LPARs don't have this issue.
>
> > Cc: Ajay Kaher <[email protected]>
> > Cc: Alexey Makhalov <[email protected]>
> > Cc: Christophe Leroy <[email protected]>
> > Cc: Juergen Gross <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: Michael Ellerman <[email protected]>
> > Cc: Nicholas Piggin <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Srikar Dronamraju <[email protected]>
> > ---
> > arch/powerpc/include/asm/paravirt.h | 33 ++++++++++++++++++++++++++---
> > 1 file changed, 30 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h
> > index e08513d73119..a980756f58df 100644
> > --- a/arch/powerpc/include/asm/paravirt.h
> > +++ b/arch/powerpc/include/asm/paravirt.h
> > @@ -121,9 +121,19 @@ static inline bool vcpu_is_preempted(int cpu)
> > if (!is_shared_processor())
> > return false;
> >
> > + if (!(yield_count_of(cpu) & 1))
> > + return false;
> > +
> > + /*
> > + * If CPU has yielded but OS has not requested idle then this CPU is
>
> nit: can it be "if CPU is in hypervisor but OS has not requested ..." ?
Ok, will take it.
>
> > + * definitely preempted.
> > + */
> > + if (!lppaca_of(cpu).idle)
> > + return true;
> > +
> > #ifdef CONFIG_PPC_SPLPAR
> > if (!is_kvm_guest()) {
> > - int first_cpu;
> > + int first_cpu, i;
> >
> > /*
> > * The result of vcpu_is_preempted() is used in a
> > @@ -149,11 +159,28 @@ static inline bool vcpu_is_preempted(int cpu)
> > */
> > if (cpu_first_thread_sibling(cpu) == first_cpu)
> > return false;
> > +
> > + /*
> > + * If any of the threads of this core is not preempted or
> > + * ceded, then consider this CPU to be non-preempted
> > + */
> > + first_cpu = cpu_first_thread_sibling(cpu);
> > + for (i = first_cpu; i < first_cpu + threads_per_core; i++) {
> > + if (i == cpu)
> > + continue;
> > + if (!(yield_count_of(i) & 1))
> > + return false;
> > + if (!lppaca_of(i).idle)
> > + return true;
> > + }
> > }
> > #endif
> >
> > - if (yield_count_of(cpu) & 1)
> > - return true;
> > + /*
> > + * None of the threads in this thread group are running but none of
> > + * them were preempted too. Hence assume the thread to be
> > + * non-preempted.
> > + */
>
> That comment is bit confusing. instead of threads it would be better say CPUs
>
> "None of the CPUs in this Big Core are running but none of them were preempted too. Hence assume the
> the CPU to be non-preempted."
>
>
> > return false;
> > }
> >
>
> Otherwise LGTM
> Reviewed-by: Shrikanth Hegde <[email protected]>
Thanks Shrikanth.
--
Thanks and Regards
Srikar Dronamraju
On Mon, 2023-10-09 at 10:47 +0530, Srikar Dronamraju wrote:
Hi Srikar,
Benchmarked this patch on a baremetal POWER9 node by launching a KVM guest to
observe the improvements achieved in KVM with the patched kernel.
Below you can find the schbench latency result comparison.
The system was running in SMT4 mode with the below configuration:
Setup:
$ lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 4
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 8
Model: 2.3 (pvr 004e 1203)
Model name: POWER9, altivec supported
CPU max MHz: 3800.0000
CPU min MHz: 2300.0000
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-63
NUMA node8 CPU(s): 64-127
NUMA node250 CPU(s):
NUMA node251 CPU(s):
NUMA node252 CPU(s):
NUMA node253 CPU(s):
NUMA node254 CPU(s):
NUMA node255 CPU(s):
- Baseline kernel : v6.6.0-rc5
- Patched kernel : v6.6.0-rc5 with patchset
- schbench version : up to commit 2eef44 (schbench: record the
execution time in the matrix multiplication mode)
Results:
--------
These results show the schbench latency on the patched kernel compared
to the baseline kernel in KVM. The numbers in the "compare%" column
represent the percentage difference in latency between the baseline
kernel and the patched kernel. A negative percentage means the
patched kernel performs less optimally (higher latency) than the
baseline, while a positive percentage means it performs better (lower
latency).
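The compare% values can be reproduced from raw latencies with the formula
below (a hypothetical helper; the reported numbers are averages over niter
runs, which this sketch does not attempt to model):

```c
#include <assert.h>

/* Percentage improvement of patched over baseline latency:
 * positive means lower (better) latency on the patched kernel. */
static double compare_pct(double baseline_us, double patched_us)
{
	return (baseline_us - patched_us) / baseline_us * 100.0;
}
```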
Scenarios:
----------
Case 1: No Noise
Host: Idle
KVM 1: Launched a KVM affined to 0-39 CPUs (40 CPUs)
KVM 1 (Workload) : ./schbench -m 20 -t 2 -r 30 (benchmark)
schbench latency (niter: 20)
percentile compare% (avg)
(higher the better)
50.0th: -4.84
75.0th: -8.09
90.0th: -3.39
95.0th: +5.16
99.0th: +90.78
99.5th: +36.34
99.9th: +8.31
--------------------------------------------------------
Case 2: With Noise: Over-commit case: Multiple KVM guests sharing the
same set of CPUs
Two KVM instances are launched: one is benchmarked, while the other
executes a workload to introduce noise.
KVM 1: Launched a KVM affined to 0-39 CPUs (40 CPUs)
KVM 1 (Workload) : ./schbench -m 20 -t 2 -r 30 (benchmark)
KVM 2 (Noise): Launched a KVM affined to 0-39 CPUs
schbench latency (niter: 20)
percentile compare% (avg)
(higher the better)
50.0th: -1.47
75.0th: -5.72
90.0th: +7.88
95.0th: +10.71
99.0th: +512.08
99.5th: +380.61
99.9th: +90.39
--------------------------------------------------------
Case 3: Overlap case: Multiple KVM guests sharing a subset of CPUs.
Two KVM instances are launched: one is benchmarked, while the other
executes a workload to introduce noise.
KVM 1: Launched a KVM affined to 0-39 CPUs (40 CPUs)
KVM 1 (Workload) : ./schbench -m 20 -t 2 -r 30 (benchmark)
KVM 2 (Noise): Launched a KVM affined to 0-19 CPUs
schbench latency (niter: 20)
percentile compare% (avg)
(higher the better)
50.0th: -1.63
75.0th: -2.78
90.0th: +57.62
95.0th: +87.90
99.0th: +343.66
99.5th: +178.01
99.9th: +36.07
--------------------------------------------------------
The above results demonstrate the effectiveness of the proposed
approach, which utilizes the idle-hint in lppaca to detect the
preempted vCPU more efficiently. This approach is beneficial for
improving schbench latency on KVM, particularly the tail latencies.
Thanks,
Aboorva
> PowerVM Hypervisor dispatches on a whole core basis. In a shared
> LPAR, a
> CPU from a core that is preempted may have a larger latency. In
> such a scenario, its preferable to choose a different CPU to run.
>
> If one of the CPUs in the core is active, i.e neither CEDED nor
> preempted, then consider this CPU as not preempted.
>
> Also if any of the CPUs in the core has yielded but OS has not
> requested
> CEDE or CONFER, then consider this CPU to be preempted.
>
> Cc: Ajay Kaher <[email protected]>
> Cc: Alexey Makhalov <[email protected]>
> Cc: Christophe Leroy <[email protected]>
> Cc: Juergen Gross <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: Michael Ellerman <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: [email protected]
> Signed-off-by: Srikar Dronamraju <[email protected]>
> ---
> arch/powerpc/include/asm/paravirt.h | 33 ++++++++++++++++++++++++++---
> 1 file changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h
> index e08513d73119..a980756f58df 100644
> --- a/arch/powerpc/include/asm/paravirt.h
> +++ b/arch/powerpc/include/asm/paravirt.h
> @@ -121,9 +121,19 @@ static inline bool vcpu_is_preempted(int cpu)
> 	if (!is_shared_processor())
> 		return false;
>
> +	if (!(yield_count_of(cpu) & 1))
> +		return false;
> +
> +	/*
> +	 * If CPU has yielded but OS has not requested idle then this CPU is
> +	 * definitely preempted.
> +	 */
> +	if (!lppaca_of(cpu).idle)
> +		return true;
> +
> #ifdef CONFIG_PPC_SPLPAR
> 	if (!is_kvm_guest()) {
> -		int first_cpu;
> +		int first_cpu, i;
>
> 		/*
> 		 * The result of vcpu_is_preempted() is used in a
> @@ -149,11 +159,28 @@ static inline bool vcpu_is_preempted(int cpu)
> 		 */
> 		if (cpu_first_thread_sibling(cpu) == first_cpu)
> 			return false;
> +
> +		/*
> +		 * If any of the threads of this core is not preempted or
> +		 * ceded, then consider this CPU to be non-preempted
> +		 */
> +		first_cpu = cpu_first_thread_sibling(cpu);
> +		for (i = first_cpu; i < first_cpu + threads_per_core; i++) {
> +			if (i == cpu)
> +				continue;
> +			if (!(yield_count_of(i) & 1))
> +				return false;
> +			if (!lppaca_of(i).idle)
> +				return true;
> +		}
> 	}
> #endif
>
> -	if (yield_count_of(cpu) & 1)
> -		return true;
> +	/*
> +	 * None of the threads in this thread group are running but none of
> +	 * them were preempted too. Hence assume the thread to be
> +	 * non-preempted.
> +	 */
> 	return false;
> }
>