Increasing the value of 'sched_relax_domain_level' in cpuset expands
the search range of task balancing on some scheduling events.
As a result, task balancing within that range becomes more aggressive,
which benefits situations where low latency is required even at the
cost of cache hit rate etc. (in such situations, it is ideally best
that no cpu stays idle while there are still runnable tasks).

This patch aims to accelerate balancing within the relax_domain.

Newidle balancing is kicked when a runqueue runs out of tasks. It finds
and pulls runnable tasks from other busy cpus, checking the load
imbalance between cpus. Given the situation above, a short-term load
estimate is preferable to a long-term one, because it makes balancing
more aggressive; otherwise balancing becomes relatively conservative.
The referenced load is selected by the newidle_idx parameter of the
scheduler domains, so this patch tunes that parameter only for domains
within the relax_domain's range. There is no effect if you don't use
relax_domain.

Following are the results of my short/lightweight transaction test,
showing the average requester latency (ms); 300 pairs of threads
running for 30 seconds on an 8-cpu Itanium:
1) v2.6.28-rc4
Average 0.748783 Std Dev 1.688022 Throughput 165313
2) v2.6.28-rc4 + relax_domain
Average 0.536867 Std Dev 1.115383 Throughput 168492
3) v2.6.28-rc4 + relax_domain + patch
Average 0.385164 Std Dev 0.801875 Throughput 170069
Signed-off-by: Hidetoshi Seto <[email protected]>
---
kernel/sched.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 57c933f..c970239 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7366,6 +7366,8 @@ static void set_domain_attribute(struct sched_domain *sd,
} else {
/* turn on idle balance on this domain */
sd->flags |= (SD_WAKE_IDLE_FAR|SD_BALANCE_NEWIDLE);
+ /* make newidle balancing more aggressive */
+ sd->newidle_idx = 0;
}
}
--
1.6.0.GIT
* Hidetoshi Seto <[email protected]> wrote:
> Increasing the value of 'sched_relax_domain_level' in cpuset expands
> the search range of task balancing on some scheduling events. As a
> result, task balancing within that range becomes more aggressive,
> which benefits situations where low latency is required even at the
> cost of cache hit rate etc. (in such situations, it is ideally best
> that no cpu stays idle while there are still runnable tasks).
>
> This patch aims to accelerate balancing within the relax_domain.
>
> Newidle balancing is kicked when a runqueue runs out of tasks. It
> finds and pulls runnable tasks from other busy cpus, checking the
> load imbalance between cpus. Given the situation above, a short-term
> load estimate is preferable to a long-term one, because it makes
> balancing more aggressive; otherwise balancing becomes relatively
> conservative. The referenced load is selected by the newidle_idx
> parameter of the scheduler domains, so this patch tunes that
> parameter only for domains within the relax_domain's range. There is
> no effect if you don't use relax_domain.
>
> Following are the results of my short/lightweight transaction test,
> showing the average requester latency (ms); 300 pairs of threads
> running for 30 seconds on an 8-cpu Itanium:
>
> 1) v2.6.28-rc4
> Average 0.748783 Std Dev 1.688022 Throughput 165313
> 2) v2.6.28-rc4 + relax_domain
> Average 0.536867 Std Dev 1.115383 Throughput 168492
> 3) v2.6.28-rc4 + relax_domain + patch
> Average 0.385164 Std Dev 0.801875 Throughput 170069
that improvement in the metrics looks good.
> Signed-off-by: Hidetoshi Seto <[email protected]>
> ---
> kernel/sched.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 57c933f..c970239 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -7366,6 +7366,8 @@ static void set_domain_attribute(struct sched_domain *sd,
> } else {
> /* turn on idle balance on this domain */
> sd->flags |= (SD_WAKE_IDLE_FAR|SD_BALANCE_NEWIDLE);
> + /* make newidle balancing more aggressive */
> + sd->newidle_idx = 0;
I agree with making it more sensitive to momentary load fluctuations
(as long as other metrics do not degrade).
But this solution basically overrides the newidle_idx tuning in
topology.h.
Is there a strong reason to do this tuning dynamically, or could we
just decrease newidle_idx in the appropriate templates in the
topology.h files?
Ingo
Ingo Molnar wrote:
> I agree with making it more sensitive to momentary load fluctuations
> (as long as other metrics do not degrade).
>
> But this solution basically overrides the newidle_idx tuning in
> topology.h.
>
> Is there a strong reason to do this tuning dynamically, or could we
> just decrease newidle_idx in the appropriate templates in the
> topology.h files?
IMHO, topology.h should have proper values that fit 'standard' system
usage. I'm not sure whether the current values (such as 2 for SD_NODE)
are the best numbers or not... and I think they can differ between
archs.
If no features would be affected by changing the default newidle_idx
values, then we can fix it in the templates.
Otherwise, that would be my strong reason for doing it dynamically -
I don't like regressions :-)
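If it were moved into the templates as suggested, the change would
look roughly like the following. This is a hypothetical sketch only:
the field list is elided, and the "was 2" value is taken from the
discussion above rather than verified against the actual 2.6.28
topology.h defaults.

```c
/* hypothetical fragment of an arch's topology.h template */
#define SD_NODE_INIT (struct sched_domain) {		\
	/* ... other fields as before ... */		\
	.newidle_idx		= 0, /* was 2: use instantaneous load */ \
	/* ... */					\
}
```

The tradeoff is that a template change affects every user of that
arch, while the set_domain_attribute() approach only changes behavior
for cpusets that opt in via sched_relax_domain_level.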
Thanks,
H.Seto