2006-08-23 15:24:36

by Rich Paredes

Subject: SMP Affinity and nice

I am trying to understand why one process is getting less cpu time than
identical processes with a higher "nice" value. The server has 2 physical
processors with hyperthreading (cpus 0, 1, 2, 3).

I am starting 5 processes that perform a square root loop to max out a
cpu. They use the exact same code but are renamed for identification:
cpumax1, cpumax2, cpumax3, cpumax4, cpumax5
I start in order:
1. nice -n 10 cpumax1
2. nice -n 10 cpumax2
3. nice -n 10 cpumax3
4. nice -n 10 cpumax4
5. nice -n 0 cpumax5
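
The cpumax source is not included in this post. As a rough illustration only, a square-root busy loop of the kind described could be sketched in Python as follows; the function name `burn_cpu` and the time limit are assumptions made for the sketch, not part of the original program.

```python
import math
import time


def burn_cpu(seconds):
    """Spin on math.sqrt for roughly `seconds` of wall time.

    Hypothetical stand-in for the cpumax programs; returns the loop count.
    """
    deadline = time.monotonic() + seconds
    iterations = 0
    x = 2.0
    while time.monotonic() < deadline:
        # The value itself is irrelevant; the loop only keeps one CPU busy.
        x = math.sqrt(x) + 1.0
        iterations += 1
    return iterations
```

Saving this in a script that calls `burn_cpu(...)` with a long duration, and starting copies of it under `nice -n 10` and `nice -n 0`, would approximate the test setup above.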

Here is the top output:
PR  NI  S  %CPU    TIME+  P  COMMAND
35  10  R  99.9  1:46.90  3  cpumax1
35  10  R  99.9  1:41.01  1  cpumax3
35  10  R  99.9  1:39.48  0  cpumax4
25   0  R  66.9  1:03.13  2  cpumax5
35  10  R  33.0  0:39.30  2  cpumax2

cpumax1 is using processor 3, 99%
cpumax2 is using processor 2, 33%
cpumax3 is using processor 1, 99%
cpumax4 is using processor 0, 99%
cpumax5 is using processor 2, 66%

So since cpumax5 has a lower nice value and thus a higher priority (25 in
this case), shouldn't it be given its own cpu? If I give cpumax5 a nice
value of -20, it does start using its own cpu. I don't want to manage
cpu affinity via the taskset command.

My explanation would be that since the scheduler tries to limit cpu
affinity, the nice value of 0 isn't enough to get the scheduler to move
this process to another processor's run queue. I could be totally wrong
here though.

I should also note here that this test is totally dependent on the order
of startup. If I start cpumax5 first with a nice value of 0 before the
other 4, it will get its own cpu:
PR  NI  S  %CPU    TIME+  P  COMMAND
35  10  R  99.9  1:00.03  3  cpumax2
25   0  R  99.9  1:08.01  1  cpumax5
35  10  R  99.9  1:03.69  2  cpumax1
35  10  R  50.3  0:29.02  0  cpumax3
35  10  R  49.6  0:26.37  0  cpumax4

I just want to understand this better. Thanks.


2006-08-23 15:36:27

by Jan Engelhardt

Subject: Re: SMP Affinity and nice

>Subject: SMP Affinity and nice
>
>I am trying to come to an understanding as to why 1 process is getting
>less cpu time than identical processes with a higher "nice" value.
>Server has 2 physical processors with hyperthreading (cpu 0,1,2,3)
>
>I am starting 5 processes that perform a square root loop to max out a
>cpu. They use the exact same code but are renamed for identification:
>cpumax1, cpumax2, cpumax3, cpumax4, cpumax5
[...]

What you describe should be addressed in the -ck patchset (smpnice-...diff).
Not sure if it is in mainline already, though.



Jan Engelhardt
--

2006-08-23 20:54:59

by Chris Friesen

Subject: Re: SMP Affinity and nice

Rich Paredes wrote:

> So since cpumax5 has a lower nice value and thus a higher priority (25 in
> this case), shouldn't it be given its own cpu? If I give cpumax5 a nice
> value of -20, it does start using its own cpu.
>
> My explanation would be that since the scheduler tries to limit cpu
> affinity, the nice value of 0 isn't enough to get the scheduler to move
> this process to another processor's run queue. I could be totally wrong
> here though.

I think you are correct. The load balancer doesn't think that this is
enough of an imbalance to go through the effort of swapping two
processes around.

Chris

2006-08-24 01:30:15

by Peter Williams

Subject: Re: SMP Affinity and nice

Jan Engelhardt wrote:
>> Subject: SMP Affinity and nice
>>
>> I am trying to come to an understanding as to why 1 process is getting
>> less cpu time than identical processes with a higher "nice" value.
>> Server has 2 physical processors with hyperthreading (cpu 0,1,2,3)
>>
>> I am starting 5 processes that perform a square root loop to max out a
>> cpu. They use the exact same code but are renamed for identification:
>> cpumax1, cpumax2, cpumax3, cpumax4, cpumax5
> [...]
>
> What you describe should be addressed in the -ck patchset (smpnice-...diff)
> Not sure if it is in mainline already, though.

It's coming in 2.6.18.

Rich,
What kernel version are you using when you see this phenomenon?

Peter
--
Peter Williams

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce

2006-08-24 04:39:24

by Peter Williams

Subject: Re: SMP Affinity and nice

Rich Paredes wrote:
> Hi Peter. 2.6.5
>

OK. That version does not take tasks' nice values into account when
doing load balancing and just tries to evenly distribute the tasks among
the CPUs. The smpnice patches (to be included in the 2.6.18 kernel)
change this so that nice is taken into account when load balancing is
done and should fix your problem.

Peter

2006-08-24 05:25:04

by Peter Williams

Subject: Re: SMP Affinity and nice

Chris Friesen wrote:
> Rich Paredes wrote:
>
>> So since cpumax5 has a lower nice value and thus a higher priority (25 in
>> this case), shouldn't it be given its own cpu? If I give cpumax5 a nice
>> value of -20, it does start using its own cpu.
>>
>> My explanation would be that since the scheduler tries to limit cpu
>> affinity, the nice value of 0 isn't enough to get the scheduler to move
>> this process to another processor's run queue. I could be totally wrong
>> here though.
>
> I think you are correct. The load balancer doesn't think that this is
> enough of an imbalance to go through the effort of swapping two
> processes around.

The kernel in use (2.6.5) doesn't take nice into account during load
balancing and just allocates the 5 tasks among the 4 CPUs in a way that
tries to give each CPU the same number of tasks. It also tries not to
move tasks around too much so when it has found a solution that
satisfies that criterion it leaves the tasks there.

5 tasks among 4 CPUs means 1 task each for 3 of the CPUs and 2 tasks for
the other CPU. As nice isn't taken into account it is purely down to
chance whether or not the high priority task ends up being one of those
that gets a CPU to itself or has to share with another task. Some
elementary probability theory should enable the probability of a "good"
outcome (i.e. the high priority task not having to share) to be calculated.
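
As a rough sketch of that calculation: if one assumes (and this is only an assumption, since the observed outcome depends on startup order) that every pair of tasks is equally likely to be the pair that ends up sharing a CPU, the chance of a "good" outcome can be found by enumerating the pairs. The variable names below are made up for the illustration.

```python
from fractions import Fraction
from itertools import combinations

tasks = ["cpumax1", "cpumax2", "cpumax3", "cpumax4", "cpumax5"]
high_priority = "cpumax5"

# Assumption: each pair of tasks is equally likely to be the sharing pair.
sharing_pairs = list(combinations(tasks, 2))

# A "good" outcome is one where the high priority task is not in that pair.
good = [pair for pair in sharing_pairs if high_priority not in pair]

p_good = Fraction(len(good), len(sharing_pairs))
print(p_good)  # 3/5
```

Under that assumption, 6 of the 10 possible pairs leave the high priority task alone, so it gets a CPU to itself 3 times out of 5.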

This is an example of the type of situation that the smpnice patches
were designed to handle. They take nice into account and should ensure
that the high priority task does get a CPU to itself in this scenario. They
are scheduled for release in the 2.6.18 kernel.
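
To illustrate the idea only (this is not the smpnice code, and the weight function below is a made-up stand-in rather than the kernel's formula), a toy model can give each task a weight that grows as its nice value falls, then pick the placement that minimises the load on the busiest CPU:

```python
from itertools import combinations


def weight(nice):
    # Hypothetical weight: nice 0 -> 20, nice 10 -> 10. Not the kernel's formula.
    return 20 - nice


def busiest_cpu_load(tasks, sharing_pair):
    """Heaviest weighted load when `sharing_pair` shares one CPU
    and every other task has a CPU to itself."""
    shared = sum(weight(tasks[name]) for name in sharing_pair)
    alone = [weight(nice) for name, nice in tasks.items()
             if name not in sharing_pair]
    return max([shared] + alone)


# The five tasks from the original test, mapped to their nice values.
tasks = {"cpumax1": 10, "cpumax2": 10, "cpumax3": 10,
         "cpumax4": 10, "cpumax5": 0}

# Try every possible sharing pair and keep the one with the lightest busiest CPU.
best_pair = min(combinations(tasks, 2),
                key=lambda pair: busiest_cpu_load(tasks, pair))
print(best_pair, busiest_cpu_load(tasks, best_pair))
```

With these assumed weights, any placement in which cpumax5 shares a CPU puts a load of 30 on that CPU, while letting two nice 10 tasks share keeps the maximum at 20, so a weight-aware balancer prefers to leave the nice 0 task on its own.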

Peter