2006-09-03 13:41:43

by Vincent Pelletier

Subject: [PATCH] sched.c: Be a bit more conservative in SMP

I've often seen the following use case happening on the few Linux SMP boxes
I have access to: one process eats one CPU because it has a big
computation to do, all the other CPUs are idle, and the process keeps
hopping from one CPU to another.
This patch is a quick attempt to make this behaviour disappear without
having to bind every process manually with taskset.
I don't know if there is any practical performance increase (although I
believe there is locally).

The principle of the patch is simple:
When calculating the load of the "source" CPU (the one the process is on),
subtract one from the number of running processes, so that the process
being balanced is not counted.
As I have only known sched.c for 5 minutes, I added a max(..., 0) to make
sure the load can't go negative if the function happens to be called on a
CPU running only idle tasks. No idea if that can actually happen.

I tested its efficiency this way:
Before:
- Start a command eating one full CPU on an otherwise idle SMP machine.
I used dd if=/dev/urandom of=/dev/null.
- Wait ~30 seconds, and see that it switches to another CPU.
After:
- Repeat the same test and see that the process no longer switches CPUs (the
patch does what it's meant to).
- Start a second dd and bind both to the same CPU with taskset, then free
one of them (allow it to use 2 CPUs, including the one it can already
access) and see that the task gets moved to the second CPU (load balancing
still works).

Disclaimer:
This patch is just the result of a 5-minute hacking rush. Although I think
it technically works, I'm no SMP expert.

--- linux-2.6-2.6.17/kernel/sched.c	2006-06-18 03:49:35.000000000 +0200
+++ linux-2.6-2.6.17-conservative/kernel/sched.c	2006-09-03 13:18:11.000000000 +0200
@@ -952,7 +952,7 @@ void kick_process(task_t *p)
 static inline unsigned long source_load(int cpu, int type)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->nr_running * SCHED_LOAD_SCALE;
+	unsigned long load_now = (max(rq->nr_running - 1, 0)) * SCHED_LOAD_SCALE;
 
 	if (type == 0)
 		return load_now;
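
One caveat I'm not sure about: rq->nr_running is an unsigned long, so if
nr_running could ever be 0 here the subtraction would wrap around before
max() gets a chance to clamp it (and I believe the kernel's max() macro
warns about mixed signedness anyway). A wrap-safe sketch of the same idea,
untested:

static inline unsigned long source_load(int cpu, int type)
{
	runqueue_t *rq = cpu_rq(cpu);
	/* don't count the task being balanced; test first so the
	 * unsigned subtraction cannot wrap when nr_running == 0 */
	unsigned long nr = rq->nr_running ? rq->nr_running - 1 : 0;
	unsigned long load_now = nr * SCHED_LOAD_SCALE;

	if (type == 0)
		return load_now;
	/* ... rest unchanged ... */
}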

--
Vincent Pelletier



2006-09-03 17:11:00

by Vincent Pelletier

Subject: Re: [PATCH] sched.c: Be a bit more conservative in SMP

Forgot the Signed-off-by line in the previous mail. Reposting the same patch
just in case. CC to the maintainer, as advised in the FAQ.

Signed-off-by: Vincent Pelletier <[email protected]>

--- linux-2.6-2.6.17/kernel/sched.c	2006-06-18 03:49:35.000000000 +0200
+++ linux-2.6-2.6.17-conservative/kernel/sched.c	2006-09-03 13:18:11.000000000 +0200
@@ -952,7 +952,7 @@ void kick_process(task_t *p)
 static inline unsigned long source_load(int cpu, int type)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->nr_running * SCHED_LOAD_SCALE;
+	unsigned long load_now = (max(rq->nr_running - 1, 0)) * SCHED_LOAD_SCALE;
 
 	if (type == 0)
 		return load_now;


2006-09-06 23:30:57

by Vincent Pelletier

Subject: Re: [PATCH] sched.c: Be a bit more conservative in SMP

I found one possible drawback to this change:
When running n+1 processes (n = number of CPUs), one takes a whole CPU and
the other two share the remaining one. Because of this patch, all processes
stay on their own CPU, so one always gets 100% of a CPU while the other two
get 50% each.
In the current implementation, one of the two processes sharing a CPU would
migrate to the other CPU, and so on, somehow sharing the CPU time among them.
Is this a feature or a side effect of the current implementation?
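
(To put numbers on it, with n = 2: the patched source_load() of the CPU
running two processes is (2 - 1) * SCHED_LOAD_SCALE, which is exactly the
target_load of the CPU running one, 1 * SCHED_LOAD_SCALE, so the balancer
sees no imbalance and never moves either of the queued processes - if I
read the code correctly.)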

I'll do some tests soon to see which version gives better performance at a
higher level than just process migration cost - if they differ at all.
--
Vincent Pelletier

2006-09-19 13:50:18

by Ludovic Drolez

Subject: Re: [PATCH] sched.c: Be a bit more conservative in SMP

Vincent Pelletier <subdino2004 <at> yahoo.fr> writes:
> I've often seen the following use case happening on the few Linux SMP boxes
> I have access to: one process eats one CPU because it has a big
> computation to do, all the other CPUs are idle, and the process keeps
> hopping from one CPU to another.
> This patch is a quick attempt to make this behaviour disappear without
> having to bind every process manually with taskset.
> I don't know if there is any practical performance increase (although I
> believe there is locally).

Hi!

Do you know if your patch has been included anywhere?
We have the same problem on an HPCC here with 4 CPUs per motherboard, and I
don't like playing with taskset (moreover, performance under Windows is
*much* better without any tuning, shame on us). It would be nice to see less
migration when it's not needed...

Cheers,

Ludovic.

2006-09-19 14:06:51

by Ludovic Drolez

Subject: Re: [PATCH] sched.c: Be a bit more conservative in SMP

Vincent Pelletier <vincent.plr <at> wanadoo.fr> writes:
> I'll do some tests soon to see which version gives better performance at a
> higher level than just process migration cost - if they differ at all.

I think your patch should improve performance, because process migrations
are expensive (cache misses) and should be avoided when not really
necessary.

Cheers,

Ludovic.


2006-09-19 17:50:47

by Antonio Vargas

Subject: Re: [PATCH] sched.c: Be a bit more conservative in SMP

On 9/19/06, Ludovic Drolez <[email protected]> wrote:
> Vincent Pelletier <vincent.plr <at> wanadoo.fr> writes:
> > I'll do some tests soon to see which version gives better performance at a
> > higher level than just process migration cost - if they differ at all.
>
> I think your patch should improve performance, because process migrations
> are expensive (cache misses) and should be avoided when not really
> necessary.
>
> Cheers,
>
> Ludovic.
>

A variant on this theme would be (not tested at all, just a random
idea to consider):

1. Find out whether the process is a CPU hog; if not, ignore it.

2. Find out, somehow, how much time this process has had on its current CPU.

3. Then, instead of always subtracting 1 from the current load of the
current CPU, subtract a value going from 1 down to 0 as the time on the
CPU goes from 0 to 60 seconds... this way CPU hogs would only rotate
slowly? (See the sketch after the code line below.)

In code:

number_to_sub_from_queue_load = (256 - min(256, time_from_last_change_of_cpu)) >> 8;

Somehow managing to get fixed-point load levels on the runqueues would
make this work better...
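
For instance, reusing SCHED_LOAD_SCALE as the fixed-point unit (a rough,
untested sketch - ticks_on_this_cpu and DECAY_PERIOD are invented names,
and it decays linearly rather than with the >> 8 above, which as written
would drop straight to 0 after the first tick):

#define DECAY_PERIOD	(60 * HZ)	/* full rotation period, in ticks */

/*
 * Bonus subtracted from the source CPU's load for the task being
 * balanced: a full SCHED_LOAD_SCALE when the task has just landed on
 * this CPU, decaying linearly to 0 after DECAY_PERIOD, so CPU hogs
 * are sticky at first but can still rotate eventually.
 */
static inline unsigned long sticky_bonus(unsigned long ticks_on_this_cpu)
{
	unsigned long t = min(ticks_on_this_cpu, (unsigned long)DECAY_PERIOD);

	return SCHED_LOAD_SCALE - t * SCHED_LOAD_SCALE / DECAY_PERIOD;
}

/*
 * Then, in source_load(), guarding against underflow as in Vincent's patch:
 *	unsigned long load = rq->nr_running * SCHED_LOAD_SCALE;
 *	unsigned long bonus = sticky_bonus(...);
 *	load_now = load > bonus ? load - bonus : 0;
 */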


--
Greetz, Antonio Vargas aka winden of network

http://network.amigascne.org/
[email protected]
[email protected]

Every day, every year
you have to work
you have to study
you have to scene.

2006-09-20 07:43:02

by Ludovic Drolez

Subject: Re: [PATCH] sched.c: Be a bit more conservative in SMP

Antonio Vargas wrote:
> A variant on this theme would be (not tested at all, just a random
> idea to consider):
>
> 1. Find out whether the process is a CPU hog; if not, ignore it.
>
> 2. Find out, somehow, how much time this process has had on its current CPU.
>
> 3. Then, instead of always subtracting 1 from the current load of the
> current CPU, subtract a value going from 1 down to 0 as the time on the
> CPU goes from 0 to 60 seconds... this way CPU hogs would only rotate
> slowly?
>
> In code:
>
> number_to_sub_from_queue_load = (256 - min(256, time_from_last_change_of_cpu)) >> 8;
>
> Somehow managing to get fixed-point load levels on the runqueues would
> make this work better...
>

Yes! That might be a better idea!
In fact, I tested the first patch on our cluster (finite element
computation on 8 CPUs):
- Windows: 875 s
- Linux 2.6.16: 1019 s
- Linux 2.6.16 + manual taskset: 842 s
- Linux 2.6.16 + Vincent's patch: 1373 s :-(

If you find time to write a patch, Antonio, I would be pleased to try it!

Cheers,

--
Ludovic DROLEZ
http://lrs.linbox.org - Free asset management software

2006-09-21 18:36:27

by Vincent Pelletier

Subject: Re: [PATCH] sched.c: Be a bit more conservative in SMP

On Wednesday 20 September 2006 09:42, Ludovic Drolez wrote:
> Yes! That might be a better idea!
> In fact, I tested the first patch on our cluster (finite element
> computation on 8 CPUs):
> - Windows: 875 s
> - Linux 2.6.16: 1019 s
> - Linux 2.6.16 + manual taskset: 842 s
> - Linux 2.6.16 + Vincent's patch: 1373 s :-(

I was afraid of this :/.
I did some quick tests and got non-significant results: I tried building a
kernel with different make -j parameters, and there were only a few seconds
of difference, not always in favour of the same version.

I find it strange that you get such horrible results...
Maybe I was completely wrong in my assumption that one running process
always has an impact of 1, which would have made the scheduler underestimate
the load on one CPU and put too many processes on it, without moving them
afterwards.

--
Vincent Pelletier



2006-09-22 07:24:33

by Ludovic Drolez

Subject: Re: [PATCH] sched.c: Be a bit more conservative in SMP

Vincent Pelletier wrote:
> Maybe I was completely wrong in my assumption that one running process
> always has an impact of 1, which would have made the scheduler underestimate
> the load on one CPU and put too many processes on it, without moving them
> afterwards.

Yes, maybe that's the problem, since in my benchmark one process takes only
40% of the CPU.

Cheers,

--
Ludovic DROLEZ Linbox / Free&ALter Soft
http://www.linbox.com http://www.linbox.org tel: +33 3 87 50 87 90
152 rue de Grigy - Technopole Metz 2000 57070 METZ

2006-09-22 12:31:15

by Antonio Vargas

Subject: Re: [PATCH] sched.c: Be a bit more conservative in SMP

On 9/22/06, Ludovic Drolez <[email protected]> wrote:
> Vincent Pelletier wrote:
> > Maybe I was completely wrong in my assumption that one running process
> > always has an impact of 1, which would have made the scheduler underestimate
> > the load on one CPU and put too many processes on it, without moving them
> > afterwards.
>
> Yes, maybe that's the problem, since in my benchmark one process takes only
> 40% of the CPU.
>
> Cheers,
>
> --
> Ludovic DROLEZ Linbox / Free&ALter Soft
> http://www.linbox.com http://www.linbox.org tel: +33 3 87 50 87 90
> 152 rue de Grigy - Technopole Metz 2000 57070 METZ

Provided you have enough memory, a somewhat better way to test this is to
turn off swap, copy the sources to a tmpfs directory and compile there.
Then any disk accesses would only be for reloading code pages from the
compiler / daemons / shared libs, and with even more RAM those go away too,
so the run becomes purely compute-bound. I guess even 1.5 GB of RAM is
plenty for all this, and not that costly nowadays for a kernel hacker ;)


--
Greetz, Antonio Vargas aka winden of network

http://network.amigascne.org/
[email protected]
[email protected]

Every day, every year
you have to work
you have to study
you have to scene.