So it's that time again. We need to decide which of the queued sched
patches should be merged into 2.6.17.
I have:
sched-fix-task-interactivity-calculation.patch
small-schedule-microoptimization.patch
#
sched-implement-smpnice.patch
sched-smpnice-apply-review-suggestions.patch
sched-smpnice-fix-average-load-per-run-queue-calculations.patch
sched-store-weighted-load-on-up.patch
sched-add-discrete-weighted-cpu-load-function.patch
sched-add-above-background-load-function.patch
# Suresh had problems
# con:
sched-cleanup_task_activated.patch
sched-make_task_noninteractive_use_sleep_type.patch
sched-dont_decrease_idle_sleep_avg.patch
sched-include_noninteractive_sleep_in_idle_detect.patch
sched-remove-on-runqueue-requeueing.patch
sched-activate-sched-batch-expired.patch
sched-reduce-overhead-of-calc_load.patch
#
sched-fix-interactive-task-starvation.patch
#
# "strange load balancing problems": [email protected]
sched-new-sched-domain-for-representing-multi-core.patch
sched-fix-group-power-for-allnodes_domains.patch
x86-dont-use-cpuid2-to-determine-cache-info-if-cpuid4-is-supported.patch
I'm not sure what the "Suresh had problems" comment refers to - perhaps a
now-removed patch.
afaik, the load balancing problem which Peter observed remains unresolved.
Has smpnice had appropriate testing for regressions?
Quoting Andrew Morton <[email protected]>:
>
> So it's that time again. We need to decide which of the queued sched
> patches should be merged into 2.6.17.
>
> I have:
>
> sched-fix-task-interactivity-calculation.patch
> small-schedule-microoptimization.patch
> #
> sched-implement-smpnice.patch
> sched-smpnice-apply-review-suggestions.patch
> sched-smpnice-fix-average-load-per-run-queue-calculations.patch
> sched-store-weighted-load-on-up.patch
> sched-add-discrete-weighted-cpu-load-function.patch
> sched-add-above-background-load-function.patch
> # Suresh had problems
> # con:
> sched-cleanup_task_activated.patch
> sched-make_task_noninteractive_use_sleep_type.patch
> sched-dont_decrease_idle_sleep_avg.patch
> sched-include_noninteractive_sleep_in_idle_detect.patch
> sched-remove-on-runqueue-requeueing.patch
> sched-activate-sched-batch-expired.patch
> sched-reduce-overhead-of-calc_load.patch
> #
> sched-fix-interactive-task-starvation.patch
I'd like to see all of these up to this point go in. I can't comment directly
on the ones below.
> #
> # "strange load balancing problems": [email protected]
> sched-new-sched-domain-for-representing-multi-core.patch
> sched-fix-group-power-for-allnodes_domains.patch
> x86-dont-use-cpuid2-to-determine-cache-info-if-cpuid4-is-supported.patch
>
>
> I'm not sure what the "Suresh had problems" comment refers to - perhaps a
> now-removed patch.
On previous versions of smpnice, Suresh found some throughput issues. Peter has
addressed these as far as I'm aware, but we really need Suresh to check all
those again.
>
> afaik, the load balancing problem which Peter observed remains unresolved.
That was a multicore-enabled balancing problem which he reported went away on a
later -mm.
Cheers,
Con
Andrew Morton wrote:
> So it's that time again. We need to decide which of the queued sched
> patches should be merged into 2.6.17.
>
> I have:
>
> sched-fix-task-interactivity-calculation.patch
> small-schedule-microoptimization.patch
> #
> sched-implement-smpnice.patch
> sched-smpnice-apply-review-suggestions.patch
> sched-smpnice-fix-average-load-per-run-queue-calculations.patch
> sched-store-weighted-load-on-up.patch
> sched-add-discrete-weighted-cpu-load-function.patch
> sched-add-above-background-load-function.patch
> # Suresh had problems
I really need to review smpnice. I'll try to get on to that soon.
I don't have any problems with the non-multiprocessor stuff
(Con's and Mike's patches).
--
SUSE Labs, Novell Inc.
Andrew Morton wrote:
> So it's that time again. We need to decide which of the queued sched
> patches should be merged into 2.6.17.
>
> I have:
>
> sched-fix-task-interactivity-calculation.patch
> small-schedule-microoptimization.patch
> #
> sched-implement-smpnice.patch
> sched-smpnice-apply-review-suggestions.patch
> sched-smpnice-fix-average-load-per-run-queue-calculations.patch
> sched-store-weighted-load-on-up.patch
> sched-add-discrete-weighted-cpu-load-function.patch
> sched-add-above-background-load-function.patch
> # Suresh had problems
> # con:
> sched-cleanup_task_activated.patch
> sched-make_task_noninteractive_use_sleep_type.patch
> sched-dont_decrease_idle_sleep_avg.patch
> sched-include_noninteractive_sleep_in_idle_detect.patch
> sched-remove-on-runqueue-requeueing.patch
> sched-activate-sched-batch-expired.patch
> sched-reduce-overhead-of-calc_load.patch
> #
> sched-fix-interactive-task-starvation.patch
> #
> # "strange load balancing problems": [email protected]
> sched-new-sched-domain-for-representing-multi-core.patch
> sched-fix-group-power-for-allnodes_domains.patch
> x86-dont-use-cpuid2-to-determine-cache-info-if-cpuid4-is-supported.patch
>
>
> I'm not sure what the "Suresh had problems" comment refers to - perhaps a
> now-removed patch.
>
> afaik, the load balancing problem which Peter observed remains unresolved.
I have not seen this problem on recent -mm kernels (-rc6-mm1 and
-rc6-mm2 even with SCHED_MC configured in) so it would appear that it's
fixed. The only worrying thing is that we don't know what fixed it.
>
> Has smpnice had appropriate testing for regressions?
There've been no reported problems for quite a while so my (biased)
answer would be "yes".
Peter
--
Peter Williams [email protected]
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
On Thu, Mar 23, 2006 at 09:57:06AM +1100, [email protected] wrote:
> Quoting Andrew Morton <[email protected]>:
> > #
> > # "strange load balancing problems": [email protected]
> > sched-new-sched-domain-for-representing-multi-core.patch
> > sched-fix-group-power-for-allnodes_domains.patch
> > x86-dont-use-cpuid2-to-determine-cache-info-if-cpuid4-is-supported.patch
I'd like to see the three above patches in 2.6.17. Peter's "strange load
balancing problems" report seems to be a false alarm (this patch will have
minimal impact on a single-core CPU because of domain degeneration) and the
problem doesn't happen on recent -mm kernels.
> >
> >
> > I'm not sure what the "Suresh had problems" comment refers to - perhaps a
> > now-removed patch.
>
> On previous versions of smp nice Suresh found some throughput issues. Peter has
> addressed these as far as I'm aware, but we really need Suresh to check all
> those again.
I am just back from vacation. I will soon review and provide feedback.
> >
> > afaik, the load balancing problem which Peter observed remains unresolved.
>
> That was a multicore enabled balancing problem which he reported went away on a
> later -mm.
thanks,
suresh
* Andrew Morton <[email protected]> wrote:
> So it's that time again. We need to decide which of the queued sched
> patches should be merged into 2.6.17.
>
> I have:
>
> sched-fix-task-interactivity-calculation.patch
> small-schedule-microoptimization.patch
> #
> sched-implement-smpnice.patch
> sched-smpnice-apply-review-suggestions.patch
> sched-smpnice-fix-average-load-per-run-queue-calculations.patch
> sched-store-weighted-load-on-up.patch
> sched-add-discrete-weighted-cpu-load-function.patch
> sched-add-above-background-load-function.patch
>
> # Suresh had problems
> # con:
> sched-cleanup_task_activated.patch
> sched-make_task_noninteractive_use_sleep_type.patch
> sched-dont_decrease_idle_sleep_avg.patch
> sched-include_noninteractive_sleep_in_idle_detect.patch
> sched-remove-on-runqueue-requeueing.patch
> sched-activate-sched-batch-expired.patch
> sched-reduce-overhead-of-calc_load.patch
> #
> sched-fix-interactive-task-starvation.patch
> #
> # "strange load balancing problems": [email protected]
> sched-new-sched-domain-for-representing-multi-core.patch
> sched-fix-group-power-for-allnodes_domains.patch
> x86-dont-use-cpuid2-to-determine-cache-info-if-cpuid4-is-supported.patch
Strange as it may sound, all of these patches are fine with me. I really
tried to find a questionable one (out of principle) but failed ;-)
There are two main risk areas: smpnice and the interactivity changes.
[Multi-core support ought to be risk-free.] ['Risk' here means some 'oh
sh*t' conceptual problem that could cause big head-scratching shortly
before 2.6.17 is released, not some easy-to-fix regression.]
Smpnice got a lot of attention (and testing) and it's still a feature
well worth having. The biggest risk comes from its relative complexity,
but not doing the merge now won't reduce that risk. The biggest plus
compared to the previous iteration is that smpnice is now essentially a
NOP for same-nice-level workloads.
The interactivity changes had less testing (being relatively young), but
they are pretty well split up and they should solve the worst of the
starvation problems. So if any of those causes problems, it will be an
easy revert.
All in all, I'm not worried about any of these changes.
> I'm not sure what the "Suresh had problems" comment refers to -
> perhaps a now-removed patch.
I think that got resolved with a retest.
> afaik, the load balancing problem which Peter observed remains
> unresolved.
This seems resolved too.
> Has smpnice had appropriate testing for regressions?
It's all green again, and it seems all parties that reported regressions
before have retested and there are no outstanding complaints. Having it in
-mm longer probably won't increase test coverage any further. (In
fact, bitrot will probably degrade its test status, so I wouldn't wait any
longer with it. We've got the spotlight on it now, so let's try it
upstream while it's still hot and within testers' attention span.)
Ingo
Ingo Molnar <[email protected]> wrote:
>
> it's all green again
>
OK...
It'll take as long as a week to get that far into the -mm queue anyway
(sched is staged 70% of the way through). So unless we hear differently
between now and then, off it all goes. Thanks.
Siddha, Suresh B wrote:
> On Thu, Mar 23, 2006 at 09:57:06AM +1100, [email protected] wrote:
>
>>Quoting Andrew Morton <[email protected]>:
>>
>>>#
>>># "strange load balancing problems": [email protected]
>>>sched-new-sched-domain-for-representing-multi-core.patch
>>>sched-fix-group-power-for-allnodes_domains.patch
>>>x86-dont-use-cpuid2-to-determine-cache-info-if-cpuid4-is-supported.patch
>
>
> I'd like to see the three above patches in 2.6.17. Peters "strange load
> balancing problems" seems to be a false alarm(this patch will have
> minimal impact on a single core cpu because of domain degeneration..) and
> doesn't happen on recent -mm kernels..
I agree. This is no longer a problem and certainly shouldn't prevent
the above patches going into 2.6.17.
Peter
--
Peter Williams [email protected]
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
More issues with the smpnice patch...
a) Consider a 4-way system (a simple SMP system with no HT and no multiple
cores) where a high-priority task (nice -20) is running on P0 and two
normal-priority tasks are running on P1. The load balance code with smpnice
will never be able to detect an imbalance and hence will never move one of
the normal-priority tasks on P1 to the idle CPUs P2 or P3.
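Roughly, the comparison that goes wrong looks like the sketch below. The
weight values are made up purely for illustration (they are not the actual
smpnice weights); the point is only that one nice -20 task can out-weigh two
nice 0 tasks, so the queue that actually has a spare task never looks busiest.

#include <stdio.h>

#define WEIGHT_NICE_M20	88	/* assumed weight of one nice -20 task */
#define WEIGHT_NICE_0	16	/* assumed weight of one nice 0 task */

int main(void)
{
	unsigned long p0_load = 1 * WEIGHT_NICE_M20;	/* P0: one nice -20 task */
	unsigned long p1_load = 2 * WEIGHT_NICE_0;	/* P1: two nice 0 tasks */

	/* a find_busiest_queue()-style pick on weighted load alone */
	if (p0_load > p1_load)
		printf("P0 (load %lu) is picked as busiest, but it has only one task\n",
		       p0_load);

	printf("P1 (load %lu) never looks busiest, so its second task never moves to P2/P3\n",
	       p1_load);
	return 0;
}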
b) smpnice seems to break this patch:
[PATCH] sched: allow the load to grow upto its cpu_power
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c117f1b4d14380baeed9c883f765ee023da8761
Example scenario for this case: consider a NUMA system with two nodes, each
node containing four processors. If there are two processes in node-0 and
node-1 is completely idle, your patch will move one of those processes to
node-1, whereas the previous behavior would retain those two processes in
node-0. (In this case, in your code max_load will be less than
busiest_load_per_task.)
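As a rough illustration of where that comparison lands (with assumed weight
and cpu_power values, not the kernel's actual constants), the power-scaled
group load for node-0 comes out below the per-task load:

#include <stdio.h>

#define SCHED_LOAD_SCALE	128UL	/* assumed per-CPU power unit */
#define NICE_0_WEIGHT		128UL	/* assumed weight of one nice 0 task */

int main(void)
{
	unsigned long cpus_per_node = 4;
	unsigned long nr_tasks = 2;		/* both running in node-0 */

	unsigned long group_power = cpus_per_node * SCHED_LOAD_SCALE;
	unsigned long total_load = nr_tasks * NICE_0_WEIGHT;

	/* find_busiest_group()-style group load, scaled by group power */
	unsigned long max_load = total_load * SCHED_LOAD_SCALE / group_power;
	unsigned long load_per_task = total_load / nr_tasks;

	/* prints max_load = 64, busiest_load_per_task = 128 */
	printf("max_load = %lu, busiest_load_per_task = %lu\n",
	       max_load, load_per_task);
	return 0;
}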
The smpnice patch has complicated the load balance code... It is very difficult
to comprehend the side effects of this patch in the presence of tasks with
different priorities...
thanks,
suresh
Siddha, Suresh B wrote:
> more issues with smpnice patch...
>
> a) consider a 4-way system (simple SMP system with no HT and cores) scenario
> where a high priority task (nice -20) is running on P0 and two normal
> priority tasks running on P1. load balance with smp nice code
> will never be able to detect an imbalance and hence will never move one of
> the normal priority tasks on P1 to idle cpus P2 or P3.
Why?
>
> b) smpnice seems to break this patch..
>
> [PATCH] sched: allow the load to grow upto its cpu_power
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c117f1b4d14380baeed9c883f765ee023da8761
>
> example scenario for this case: consider a numa system with two nodes, each
> node containing four processors. if there are two processes in node-0 and with
> node-1 being completely idle, your patch will move one of those processes to
> node-1 whereas the previous behavior will retain those two processes in node-0..
> (in this case, in your code max_load will be less than busiest_load_per_task)
>
> smpnice patch has complicated the load balance code... Very difficult
> to comprehend the side effects of this patch in the presence of different
> priority tasks...
That is NOT true. Without smpnice, whether the "right thing" happens
with non-zero nice tasks is largely down to luck. With smpnice the
result is far more predictable.
The PURPOSE of smpnice IS to alter load balancing in the face of the use
of non-zero nice tasks. The reason for doing this is so that nice
reliably affects the allocation of CPU resources on SMP machines. I.e.
changes in load balancing behaviour as a result of tasks having nice
values other than zero are the desired result and NOT a side effect.
Peter
--
Peter Williams [email protected]
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
Peter Williams wrote:
> Siddha, Suresh B wrote:
>> more issues with smpnice patch...
>>
>> a) consider a 4-way system (simple SMP system with no HT and cores)
>> scenario
>> where a high priority task (nice -20) is running on P0 and two normal
>> priority tasks running on P1. load balance with smp nice code
>> will never be able to detect an imbalance and hence will never move
>> one of the normal priority tasks on P1 to idle cpus P2 or P3.
>
> Why?
OK, I think I know why. The load balancing code will always decide that
P0 is the busiest CPU, right?
Peter
--
Peter Williams [email protected]
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c 2006-03-25 13:43:06.000000000 +1100
+++ MM-2.6.X/kernel/sched.c 2006-03-25 13:56:37.000000000 +1100
@@ -2115,6 +2115,7 @@ find_busiest_group(struct sched_domain *
int local_group;
int i;
unsigned long sum_nr_running, sum_weighted_load;
+ unsigned int nr_loaded_cpus = 0; /* where nr_running > 1 */
local_group = cpu_isset(this_cpu, group->cpumask);
@@ -2135,6 +2136,8 @@ find_busiest_group(struct sched_domain *
avg_load += load;
sum_nr_running += rq->nr_running;
+ if (rq->nr_running > 1)
+ ++nr_loaded_cpus;
sum_weighted_load += rq->raw_weighted_load;
}
@@ -2149,7 +2152,7 @@ find_busiest_group(struct sched_domain *
this = group;
this_nr_running = sum_nr_running;
this_load_per_task = sum_weighted_load;
- } else if (avg_load > max_load) {
+ } else if (nr_loaded_cpus && avg_load > max_load) {
max_load = avg_load;
busiest = group;
busiest_nr_running = sum_nr_running;
@@ -2258,16 +2261,16 @@ out_balanced:
static runqueue_t *find_busiest_queue(struct sched_group *group,
enum idle_type idle)
{
- unsigned long load, max_load = 0;
- runqueue_t *busiest = NULL;
+ unsigned long max_load = 0;
+ runqueue_t *busiest = NULL, *rqi;
int i;
for_each_cpu_mask(i, group->cpumask) {
- load = weighted_cpuload(i);
+ rqi = cpu_rq(i);
- if (load > max_load) {
- max_load = load;
- busiest = cpu_rq(i);
+ if (rqi->nr_running > 1 && rqi->raw_weighted_load > max_load) {
+ max_load = rqi->raw_weighted_load;
+ busiest = rqi;
}
}
Siddha, Suresh B wrote:
> more issues with smpnice patch...
>
> a) consider a 4-way system (simple SMP system with no HT and cores) scenario
> where a high priority task (nice -20) is running on P0 and two normal
> priority tasks running on P1. load balance with smp nice code
> will never be able to detect an imbalance and hence will never move one of
> the normal priority tasks on P1 to idle cpus P2 or P3.
Fix already sent.
>
> b) smpnice seems to break this patch..
>
> [PATCH] sched: allow the load to grow upto its cpu_power
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c117f1b4d14380baeed9c883f765ee023da8761
>
> example scenario for this case: consider a numa system with two nodes, each
> node containing four processors. if there are two processes in node-0 and with
> node-1 being completely idle, your patch will move one of those processes to
> node-1 whereas the previous behavior will retain those two processes in node-0..
> (in this case, in your code max_load will be less than busiest_load_per_task)
I think that the patch I sent to address a) above will also fix this
problem, as find_busiest_group() will no longer find node-0 as the
busiest group unless both of the processes in node-0 are on the same
CPU. This is because it now only considers groups that have at least
one CPU with more than one running task as candidates for being the
busiest group.
Implicit in this is the assumption that it's OK to move one of the tasks
from node-0 to node-1 if they're both on the same CPU within node-0.
Could you confirm this is OK?
Thanks,
Peter
--
Peter Williams [email protected]
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c 2006-03-25 13:56:37.000000000 +1100
+++ MM-2.6.X/kernel/sched.c 2006-03-27 10:15:38.000000000 +1100
@@ -2161,7 +2161,7 @@ find_busiest_group(struct sched_domain *
group = group->next;
} while (group != sd->groups);
- if (!busiest || this_load >= max_load || busiest_nr_running <= 1)
+ if (!busiest || this_load >= max_load)
goto out_balanced;
avg_load = (SCHED_LOAD_SCALE * total_load) / total_pwr;
@@ -2171,6 +2171,9 @@ find_busiest_group(struct sched_domain *
goto out_balanced;
busiest_load_per_task /= busiest_nr_running;
+
+ if (avg_load <= busiest_load_per_task)
+ goto out_balanced;
/*
* We're trying to get all the cpus to the average_load, so we don't
* want to push ourselves above the average load, nor do we wish to