2002-10-02 18:38:40

by Matthew Dobson

Subject: [rfc][patch] kernel/sched.c oddness?

diff -Nur --exclude-from=/usr/src/.dontdiff linux-2.5.40-vanilla/kernel/sched.c linux-2.5.40-sched_cleanup/kernel/sched.c
--- linux-2.5.40-vanilla/kernel/sched.c Tue Oct 1 00:07:35 2002
+++ linux-2.5.40-sched_cleanup/kernel/sched.c Wed Oct 2 11:22:41 2002
@@ -689,7 +689,7 @@
if (likely(!busiest))
goto out;

- *imbalance = (max_load - nr_running) / 2;
+ *imbalance = (max_load - nr_running);

/* It needs an at least ~25% imbalance to trigger balancing. */
if (!idle && (*imbalance < (max_load + 3)/4)) {
@@ -746,6 +746,11 @@
task_t *tmp;

busiest = find_busiest_queue(this_rq, this_cpu, idle, &imbalance);
+ /*
+ * We only want to steal a number of tasks equal to 1/2 the imbalance,
+ * otherwise, we'll just shift the imbalance to the new queue.
+ */
+ imbalance >>= 1;
if (!busiest)
goto out;
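
For reference, the arithmetic this hunk changes can be checked in a
standalone userspace sketch (the sample loads below are hypothetical, not
measured from a real run):

/* imb.c - sketch of the trigger arithmetic above; the sample
 * loads are made up.  Build with: cc -o imb imb.c */
#include <stdio.h>

int main(void)
{
        /* hypothetical queues: 8 tasks on the busiest CPU, 5 here (~37%) */
        unsigned long max_load = 8, nr_running = 5;

        /* vanilla 2.5.40: the imbalance is halved before the ~25% test */
        unsigned long old_imb = (max_load - nr_running) / 2;    /* = 1 */
        printf("vanilla: imbalance=%lu balances=%d\n",
               old_imb, !(old_imb < (max_load + 3) / 4));       /* 0 */

        /* patched: the full imbalance is tested, halved only at use */
        unsigned long new_imb = max_load - nr_running;          /* = 3 */
        printf("patched: imbalance=%lu balances=%d\n",
               new_imb, !(new_imb < (max_load + 3) / 4));       /* 1 */
        return 0;
}

With these loads, a ~37% imbalance fails the vanilla test (which in effect
demands ~50%) but passes once the division by two is deferred to the point
where tasks are actually moved.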


Attachments:
sched_cleanup-2.5.40.patch (826.00 B)

2002-10-03 00:05:57

by Nick Piggin

Subject: Re: [rfc][patch] kernel/sched.c oddness?

--- linux-2.5/kernel/sched.c.old 2002-10-02 23:14:12.000000000 +1000
+++ linux-2.5/kernel/sched.c 2002-10-03 10:03:41.000000000 +1000
@@ -689,10 +689,10 @@
if (likely(!busiest))
goto out;

- *imbalance = (max_load - nr_running) / 2;
+ *imbalance = max_load - nr_running;

/* It needs an at least ~25% imbalance to trigger balancing. */
- if (!idle && (*imbalance < (max_load + 3)/4)) {
+ if (!idle && (*imbalance*4 < max_load)) {
busiest = NULL;
goto out;
}
@@ -702,10 +702,11 @@
* Make sure nothing changed since we checked the
* runqueue length.
*/
- if (busiest->nr_running <= nr_running + 1) {
+ if (busiest->nr_running <= nr_running) {
spin_unlock(&busiest->lock);
busiest = NULL;
}
+
out:
return busiest;
}
@@ -748,7 +749,12 @@
busiest = find_busiest_queue(this_rq, this_cpu, idle, &imbalance);
if (!busiest)
goto out;
-
+ /*
+ * We only want to steal a number of tasks equal to 1/2 the imbalance,
+ * otherwise, we'll just shift the imbalance to the new queue.
+ */
+ imbalance /= 2;
+
/*
* We first consider expired tasks. Those will likely not be
* executed in the near future, and they are most likely to
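
If it helps, the old and new trigger tests can be compared directly for
small queue lengths; this is plain userspace arithmetic, not a claim about
runtime scheduler behaviour:

/* trigger.c - where do the two ~25% tests agree or disagree?
 * Build with: cc -o trigger trigger.c */
#include <stdio.h>

int main(void)
{
        unsigned long max_load, nr_running;

        printf("max_load nr_running vanilla patched\n");
        for (max_load = 2; max_load <= 6; max_load++) {
                for (nr_running = 0; nr_running < max_load; nr_running++) {
                        /* vanilla: halved imbalance vs. (max_load + 3) / 4 */
                        unsigned long half = (max_load - nr_running) / 2;
                        int vanilla = !(half < (max_load + 3) / 4);

                        /* patched: full imbalance, multiply instead of divide */
                        unsigned long full = max_load - nr_running;
                        int patched = !(full * 4 < max_load);

                        printf("%8lu %10lu %7d %7d\n",
                               max_load, nr_running, vanilla, patched);
                }
        }
        return 0;
}

The table shows the two tests disagreeing at, e.g., max_load=2,
nr_running=1 and max_load=5, nr_running=2: the patched form fires at
roughly a quarter of max_load while the vanilla form needs roughly half,
which is the precision lost by pre-halving the imbalance value.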


Attachments:
sched-balance2.patch (1.14 kB)

2002-10-03 00:28:27

by Matthew Dobson

Subject: Re: [rfc][patch] kernel/sched.c oddness?

Looks good to me...

Ingo, are we both crazy? It's entirely possible, as no one else seems to
have noticed the 'oddities' that Nick, Michael, and I have...
It'd be nice to know for sure either way...

Cheers!

-Matt

Nick Piggin wrote:
> Hi Matt,
> I think your patch is good, except that it would be fine to use a /= 2,
> as the compiler will turn this into a shift, and imbalance is not really
> valid (or used) if busiest == NULL. Actually there are a number of
> things (I think) still wrong with this! I posted a patch to Ingo - I
> guess I should have cc'ed the list - we should combine our efforts!
> Here is my explanation and a new patch including both our fixes.
>
> Regards,
> Nick
>
> Nick Piggin wrote:
>
>> Just a quick follow up:
>> * using imbalance (not divided by 2) instead of nr_running in
>> is_unbalanced would save a multiply by 3
>> * it is more accurate with a small number of processes, due to no
>> divisions
>> * perhaps the last test should actually be
>> if (busiest->nr_running <= nr_running)
>>
>> Regards,
>> Nick
>>
>> Nick Piggin wrote:
>>
>>> Dear Ingo,
>>> I apologise in advance if the current behaviour of the scheduler is
>>> the result of tuning / special casing you did; I didn't follow its
>>> development closely. However, I noticed on my 2xSMP system that quite
>>> unbalanced loads weren't getting even CPU time. Best example: 3
>>> processes in busywait loops - one would get 100% of one CPU while two
>>> would get 50% each of the other.
>>>
>>> I think find_busiest_queue might have a few problems:
>>> * the "imbalance" value divided by 2 loses meaning and precision,
>>> esp. with a small number of processes
>>>
>>> * in the 25% imbalance calculation, presumably the + should be a *
>>>
>>> * the 25% calculation uses the divided imbalance value, which is
>>> inaccurate
>>>
>>> * I think the 25% imbalance calculation should use a quarter of
>>> max_load, not three quarters.
>>>
>>> * the second test (after the lock) isn't very good; it's cheaper than
>>> the mul, but having battled contention and claimed the lock, it would
>>> be a shame to not do anything with it!
>>>
>>> Now top says the patch has fixed up the balancing, but perhaps some
>>> of these aren't bugs - perhaps they need a comment because they are a
>>> bit subtle! If they are, please recommend the patch to Linus.
>>>
>>> Regards,
>>> Nick
>
> Matthew Dobson wrote:
>
>> This snippet of code appears wrong... Either that, or the
>> accompanying comment is wrong?
>>
>> from kernel/sched.c::find_busiest_queue():
>>
>> | *imbalance = (max_load - nr_running) / 2;
>> |
>> | /* It needs an at least ~25% imbalance to trigger balancing. */
>> | if (!idle && (*imbalance < (max_load + 3)/4)) {
>> | busiest = NULL;
>> | goto out;
>> | }
>>
>> The comment says 25% imbalance, but the code is really checking for a
>> ~50% imbalance. The attached patch moves the division by two to the
>> load_balance() function, where the imbalance number is actually used.
>> This patch makes the code match the comment, and divides the imbalance
>> by two where it is needed.
>>
>> Please let me know if I've misinterpreted what this code is supposed
>> to be doing, -or- if we really want the comment to match the current
>> code.
>>
>> Cheers!
>>
>> -Matt
>
> ------------------------------------------------------------------------
>
> --- linux-2.5/kernel/sched.c.old 2002-10-02 23:14:12.000000000 +1000
> +++ linux-2.5/kernel/sched.c 2002-10-03 10:03:41.000000000 +1000
> @@ -689,10 +689,10 @@
> if (likely(!busiest))
> goto out;
>
> - *imbalance = (max_load - nr_running) / 2;
> + *imbalance = max_load - nr_running;
>
> /* It needs an at least ~25% imbalance to trigger balancing. */
> - if (!idle && (*imbalance < (max_load + 3)/4)) {
> + if (!idle && (*imbalance*4 < max_load)) {
> busiest = NULL;
> goto out;
> }
> @@ -702,10 +702,11 @@
> * Make sure nothing changed since we checked the
> * runqueue length.
> */
> - if (busiest->nr_running <= nr_running + 1) {
> + if (busiest->nr_running <= nr_running) {
> spin_unlock(&busiest->lock);
> busiest = NULL;
> }
> +
> out:
> return busiest;
> }
> @@ -748,7 +749,12 @@
> busiest = find_busiest_queue(this_rq, this_cpu, idle, &imbalance);
> if (!busiest)
> goto out;
> -
> + /*
> + * We only want to steal a number of tasks equal to 1/2 the imbalance,
> + * otherwise, we'll just shift the imbalance to the new queue.
> + */
> + imbalance /= 2;
> +
> /*
> * We first consider expired tasks. Those will likely not be
> * executed in the near future, and they are most likely to


2002-10-03 06:38:27

by Ingo Molnar

Subject: Re: [rfc][patch] kernel/sched.c oddness?


> [...] However, I noticed on my 2xSMP system that quite unbalanced loads
> weren't getting even CPU time. Best example: 3 processes in busywait
> loops - one would get 100% of one CPU while two would get 50% each of
> the other.

this was done intentionally, and this scenario (1+2 tasks) is the very
worst scenario. The problem is that by trying to balance all 3 tasks we
now have 3 tasks that trash their cache going from one CPU to another.
(this is what happens with your patch - even with another approach we'd
have to trash at least one task)

By keeping 2 tasks on one CPU and 1 task on the other CPU we avoid
cross-CPU migration of threads. Think about the 2+3 or 4+5 tasks case
rather: do we want absolutely perfect balancing, or good SMP affinity and
good combined performance?

Ingo
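
A toy illustration of the 1+2 case under discussion (the values are made
up; this just replays the two recheck conditions outside the kernel):

/* affinity.c - why vanilla leaves a 2-vs-1 split alone.
 * Build with: cc -o affinity affinity.c */
#include <stdio.h>

int main(void)
{
        unsigned long busiest_nr = 2;   /* two busy-loopers on one CPU */
        unsigned long this_nr = 1;      /* one busy-looper on this CPU */

        /* vanilla post-lock recheck: only steal if busiest leads by 2+ */
        if (busiest_nr <= this_nr + 1)
                printf("vanilla: 2+1 treated as balanced, no migration\n");

        /* the proposed recheck would migrate here, bouncing a task's cache */
        if (busiest_nr > this_nr)
                printf("patched: would pull a task across CPUs\n");
        return 0;
}

Both branches print for the 2-vs-1 split, which is exactly the behavioural
difference being debated: the vanilla tolerance trades perfect balance for
CPU affinity.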

2002-10-03 08:27:43

by Nick Piggin

Subject: Re: [rfc][patch] kernel/sched.c oddness?


* Make sure nothing changed since we checked the
* runqueue length.
*/
- if (busiest->nr_running <= nr_running + 1) {
+ if (busiest->nr_running <= nr_running) {
spin_unlock(&busiest->lock);
busiest = NULL;
}

OK, thanks for the explanation - then disregard this part of the patch;
the other two changes are still valid, I think.

Ingo Molnar wrote:

>>[...] However, I noticed on my 2xSMP system that quite unbalanced loads
>>weren't getting even CPU time. Best example: 3 processes in busywait
>>loops - one would get 100% of one CPU while two would get 50% each of
>>the other.
>>
>
>this was done intentionally, and this scenario (1+2 tasks) is the very
>worst scenario. The problem is that by trying to balance all 3 tasks we
>now have 3 tasks that trash their cache going from one CPU to another.
>(this is what happens with your patch - even with another approach we'd
>have to trash at least one task)
>
>By keeping 2 tasks on one CPU and 1 task on the other CPU we avoid
>cross-CPU migration of threads. Think about the 2+3 or 4+5 tasks case
>rather: do we want absolutely perfect balancing, or good SMP affinity and
>good combined performance?
>
> Ingo


2002-10-03 21:12:50

by Matthew Dobson

Subject: Re: [rfc][patch] kernel/sched.c oddness?

diff -Nur --exclude-from=/usr/src/.dontdiff linux-2.5.40-vanilla/kernel/sched.c linux-2.5.40-sched_cleanup/kernel/sched.c
--- linux-2.5.40-vanilla/kernel/sched.c Tue Oct 1 00:07:35 2002
+++ linux-2.5.40-sched_cleanup/kernel/sched.c Thu Oct 3 14:09:31 2002
@@ -689,10 +689,10 @@
if (likely(!busiest))
goto out;

- *imbalance = (max_load - nr_running) / 2;
+ *imbalance = max_load - nr_running;

/* It needs an at least ~25% imbalance to trigger balancing. */
- if (!idle && (*imbalance < (max_load + 3)/4)) {
+ if (!idle && (*imbalance <= (max_load + 3)/4)) {
busiest = NULL;
goto out;
}
@@ -746,6 +746,11 @@
task_t *tmp;

busiest = find_busiest_queue(this_rq, this_cpu, idle, &imbalance);
+ /*
+ * We only want to steal a number of tasks equal to 1/2 the imbalance,
+ * otherwise, we'll just shift the imbalance to the new queue.
+ */
+ imbalance /= 2;
if (!busiest)
goto out;
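
Putting the final version's numbers together (the sample loads are
hypothetical), the flow is now: compute the full imbalance, require
strictly more than ~25% to act, and halve only where tasks are pulled:

/* final.c - worked example of the combined fix.
 * Build with: cc -o final final.c */
#include <stdio.h>

int main(void)
{
        unsigned long max_load = 8, nr_running = 5;
        unsigned long imbalance = max_load - nr_running;        /* = 3 */

        /* needs strictly more than ~25% imbalance to trigger balancing */
        if (imbalance <= (max_load + 3) / 4) {                  /* 3 <= 2? no */
                printf("below threshold, no balancing\n");
                return 0;
        }

        /* steal half the imbalance so it isn't just shifted to this queue */
        imbalance /= 2;                                         /* pull 1 task */
        printf("pull up to %lu task(s)\n", imbalance);
        return 0;
}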


Attachments:
sched_cleanup-2.5.40.patch (912.00 B)

2002-10-04 07:30:14

by Ingo Molnar

Subject: Re: [rfc][patch] kernel/sched.c oddness?


On Thu, 3 Oct 2002, Matthew Dobson wrote:

> /* It needs an at least ~50% imbalance to trigger balancing. */
>
> Either way works for me. I'd like to see something done, as the
> comments don't match the code right now...

the patch looks good to me - i'll add it to my next scheduler patchset,
after some testing on bigger SMP boxes. Balancing tends to be a very
volatile area.

Ingo