2001-03-10 00:48:52

by Mike Kravetz

Subject: sys_sched_yield fast path

Any thoughts about adding a 'fast path' to the SMP code in
sys_sched_yield? Why not compare nr_pending to smp_num_cpus
before examining the aligned_data structures? Something like:

    if (nr_pending > smp_num_cpus)
            goto set_resched_now;

Where set_resched_now is a label placed just before the code
that sets the need_resched field of the current process.
This would eliminate touching all the aligned_data cache lines
in the case where nr_pending can never be decremented to zero.
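As a rough sketch of where the check would go (the variable and field
names here only approximate the 2.4 scheduler, they are not copied from
the actual source):

    int nr_pending = nr_running;

    /*
     * Proposed fast path: with more pending tasks than CPUs, subtracting
     * one per busy CPU can never bring nr_pending to zero, so skip the
     * aligned_data scan (and its cache-line traffic) entirely.
     */
    if (nr_pending > smp_num_cpus)
            goto set_resched_now;

    /* Existing slow path: subtract a task for each non-idle CPU. */
    for (i = 0; i < smp_num_cpus; i++)
            if (aligned_data[i].schedule_data.curr != idle_task(i))
                    nr_pending--;

    if (!nr_pending)
            return 0;

set_resched_now:
    current->need_resched = 1;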

Also, would it make sense to stop decrementing nr_pending to
prevent it from going negative? Or is the reasoning that in
these cases there is so much 'scheduling' activity that we
should force the reschedule?

--
Mike Kravetz [email protected]
IBM Linux Technology Center


2001-03-10 10:08:02

by Davide Libenzi

Subject: RE: sys_sched_yield fast path


On 10-Mar-2001 Mike Kravetz wrote:
> Any thoughts about adding a 'fast path' to the SMP code in
> sys_sched_yield? Why not compare nr_pending to smp_num_cpus
> before examining the aligned_data structures? Something like:
>
>     if (nr_pending > smp_num_cpus)
>             goto set_resched_now;
>
> Where set_resched_now is a label placed just before the code
> that sets the need_resched field of the current process.
> This would eliminate touching all the aligned_data cache lines
> in the case where nr_pending can never be decremented to zero.
>
> Also, would it make sense to stop decrementing nr_pending to
> prevent it from going negative? Or is the reasoning that in
> these cases there is so much 'scheduling' activity that we
> should force the reschedule?

The rate at which sys_sched_yield() is called is probably not high enough for
the performance improvement to be measurable.
That said, if you measure the schedule() speed with a test program in which the
schedule() rate matches the sched_yield() rate, this change would keep your
measurement of the schedule() speed clean.




- Davide

2001-03-10 17:00:37

by Andi Kleen

Subject: Re: sys_sched_yield fast path

Davide Libenzi <[email protected]> writes:


> The rate at which sys_sched_yield() is called is probably not high enough for
> the performance improvement to be measurable.

LinuxThreads mutexes call sched_yield() when a lock is locked, so when you
have a multithreaded program with some lock contention it'll be called a lot.

-Andi

2001-03-11 12:50:09

by Davide Libenzi

Subject: Re: sys_sched_yield fast path


On 10-Mar-2001 Andi Kleen wrote:
> Davide Libenzi <[email protected]> writes:
>
>
>> The rate at which sys_sched_yield() is called is probably not high enough for
>> the performance improvement to be measurable.
>
> LinuxThreads mutexes call sched_yield() when a lock is locked, so when you
> have a multithreaded program with some lock contention it'll be called a
> lot.

This is the LinuxThreads spinlock acquire path:


static void __pthread_acquire(int * spinlock)
{
  int cnt = 0;
  struct timespec tm;

  while (testandset(spinlock)) {
    if (cnt < MAX_SPIN_COUNT) {
      sched_yield();
      cnt++;
    } else {
      tm.tv_sec = 0;
      tm.tv_nsec = SPIN_SLEEP_DURATION;
      nanosleep(&tm, NULL);
      cnt = 0;
    }
  }
}


Yes, it calls sched_yield(), but this is not the standard wait for a mutex; it
is for spinlocks that are held only a very short time.
Real waits are implemented using signals.
Moreover, with the new implementation of sys_sched_yield() the task gives up
all of its remaining time quantum, so even when a task calls sched_yield()
repeatedly the call rate is not that high as long as there is at least one
other runnable process.
And if there isn't a single task with goodness() > 0, nobody cares about
sched_yield() performance.




- Davide

2001-03-11 13:57:17

by Anton Blanchard

Subject: Re: sys_sched_yield fast path


> This is the LinuxThreads spinlock acquire path:
>
>
> static void __pthread_acquire(int * spinlock)
> {
>   int cnt = 0;
>   struct timespec tm;
>
>   while (testandset(spinlock)) {
>     if (cnt < MAX_SPIN_COUNT) {
>       sched_yield();
>       cnt++;
>     } else {
>       tm.tv_sec = 0;
>       tm.tv_nsec = SPIN_SLEEP_DURATION;
>       nanosleep(&tm, NULL);
>       cnt = 0;
>     }
>   }
> }
>
>
> Yes, it calls sched_yield(), but this is not the standard wait for a mutex; it
> is for spinlocks that are held only a very short time. Real waits are
> implemented using signals. Moreover, with the new implementation of
> sys_sched_yield() the task gives up all of its remaining time quantum, so even
> when a task calls sched_yield() repeatedly the call rate is not that high as
> long as there is at least one other runnable process. And if there isn't a
> single task with goodness() > 0, nobody cares about sched_yield() performance.

The problem I found with sched_yield is that things break down under high
levels of contention. If you have 3 processes and one holds a lock, the other
two can ping-pong doing sched_yield() until their priority drops below that of
the process holding the lock, e.g. in a run I just did, where process 2 holds
the lock:

1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
2

Perhaps we need something like sched_yield that takes off some of
tsk->counter so the task with the spinlock will run earlier.
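For example, something like this purely hypothetical variant
(sys_sched_yield_decay() is not a real syscall, and the counter handling
is only a sketch):

    /*
     * Hypothetical sketch, not an existing API: yield the CPU but also
     * give up part of the remaining timeslice, so a lock holder with
     * more counter left wins the next goodness() comparison sooner.
     */
    asmlinkage long sys_sched_yield_decay(void)
    {
            current->counter >>= 1;         /* drop half of the remaining quantum */
            current->policy |= SCHED_YIELD; /* behave like the existing yield */
            current->need_resched = 1;
            return 0;
    }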

Anton

2001-03-11 19:21:19

by Dave Zarzycki

Subject: Re: sys_sched_yield fast path

On Mon, 12 Mar 2001, Anton Blanchard wrote:

> Perhaps we need something like sched_yield that takes off some of
> tsk->counter so the task with the spinlock will run earlier.

Personally speaking, I wish the sched_yield() API were like so:

int sched_yield(pid_t pid);

The pid argument would be advisory, of course; the kernel wouldn't have to
honor it.

This would allow the thread wanting to acquire the spinlock to yield
specifically to the thread holding the lock (assuming the pid of the lock
holder was stored in the spinlock...). In fact, the original lock owner could
in theory yield back to the thread wanting to acquire the lock.
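
A rough user-space sketch of what I have in mind, assuming a hypothetical
sched_yield_to() wrapper for the directed yield and a lock word that records
the holder's pid (none of this exists today):

    #include <sched.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical directed yield; not a real syscall or libc function. */
    extern int sched_yield_to(pid_t pid);

    struct pid_lock {
            volatile int   locked;  /* test-and-set target */
            volatile pid_t owner;   /* pid of the current lock holder */
    };

    static void pid_lock_acquire(struct pid_lock *l)
    {
            while (__sync_lock_test_and_set(&l->locked, 1)) {
                    /*
                     * Instead of yielding to whomever the scheduler picks,
                     * hand the CPU directly to the lock holder (advisory,
                     * so the kernel is free to ignore it).
                     */
                    sched_yield_to(l->owner);
            }
            l->owner = getpid();
    }

    static void pid_lock_release(struct pid_lock *l)
    {
            l->owner = 0;
            __sync_lock_release(&l->locked);
    }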

Feedback from the scheduling gurus would be appreciated.

Thanks,

davez

--
Dave Zarzycki
http://thor.sbay.org/~dave/


2001-03-11 22:23:37

by Davide Libenzi

Subject: Re: sys_sched_yield fast path


On 11-Mar-2001 Anton Blanchard wrote:
>
>> This is the LinuxThreads spinlock acquire path:
>>
>>
>> static void __pthread_acquire(int * spinlock)
>> {
>>   int cnt = 0;
>>   struct timespec tm;
>>
>>   while (testandset(spinlock)) {
>>     if (cnt < MAX_SPIN_COUNT) {
>>       sched_yield();
>>       cnt++;
>>     } else {
>>       tm.tv_sec = 0;
>>       tm.tv_nsec = SPIN_SLEEP_DURATION;
>>       nanosleep(&tm, NULL);
>>       cnt = 0;
>>     }
>>   }
>> }
>>
>>
>> Yes, it calls sched_yield(), but this is not the standard wait for a mutex;
>> it is for spinlocks that are held only a very short time. Real waits are
>> implemented using signals. Moreover, with the new implementation of
>> sys_sched_yield() the task gives up all of its remaining time quantum, so
>> even when a task calls sched_yield() repeatedly the call rate is not that
>> high as long as there is at least one other runnable process. And if there
>> isn't a single task with goodness() > 0, nobody cares about sched_yield()
>> performance.
>
> The problem I found with sched_yield is that things break down under high
> levels of contention. If you have 3 processes and one holds a lock, the other
> two can ping-pong doing sched_yield() until their priority drops below that of
> the process holding the lock, e.g. in a run I just did, where process 2 holds
> the lock:
>
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 2
>
> Perhaps we need something like sched_yield that takes off some of
> tsk->counter so the task with the spinlock will run earlier.

Which kernel are you running?



- Davide

2001-03-11 22:47:22

by Davide Libenzi

Subject: Re: sys_sched_yield fast path


On 11-Mar-2001 Anton Blanchard wrote:
>
>> This is the LinuxThreads spinlock acquire path:
>>
>>
>> static void __pthread_acquire(int * spinlock)
>> {
>>   int cnt = 0;
>>   struct timespec tm;
>>
>>   while (testandset(spinlock)) {
>>     if (cnt < MAX_SPIN_COUNT) {
>>       sched_yield();
>>       cnt++;
>>     } else {
>>       tm.tv_sec = 0;
>>       tm.tv_nsec = SPIN_SLEEP_DURATION;
>>       nanosleep(&tm, NULL);
>>       cnt = 0;
>>     }
>>   }
>> }
>>
>>
>> Yes, it calls sched_yield(), but this is not the standard wait for a mutex;
>> it is for spinlocks that are held only a very short time. Real waits are
>> implemented using signals. Moreover, with the new implementation of
>> sys_sched_yield() the task gives up all of its remaining time quantum, so
>> even when a task calls sched_yield() repeatedly the call rate is not that
>> high as long as there is at least one other runnable process. And if there
>> isn't a single task with goodness() > 0, nobody cares about sched_yield()
>> performance.
>
> The problem I found with sched_yield is that things break down under high
> levels of contention. If you have 3 processes and one holds a lock, the other
> two can ping-pong doing sched_yield() until their priority drops below that of
> the process holding the lock, e.g. in a run I just did, where process 2 holds
> the lock:
>
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 2
>
> Perhaps we need something like sched_yield that takes off some of
> tsk->counter so the task with the spinlock will run earlier.

2.4.x has changed the scheduler behaviour so that the task that calls
sched_yield() is not rescheduled by the resulting schedule().
A flag is set (under certain conditions on SMP) and the goodness() calculation
assigns the lowest value to the yielding task (the flag is cleared in
schedule_tail()).
This should give the task owning the lock the opportunity to complete the
locked code.
But yes, if the lock holder is descheduled for some reason (timeslice expiry
or I/O), the yielding task will run again.
That is a software design problem, though, not a sched_yield() one: if the
path between lock and unlock can take a long time, sched_yield() is not the
best way to wait.
Wait queues, or their user-space equivalents, are a better choice for that.
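
Roughly, the effect on task selection looks like this (a simplified sketch;
the names follow the 2.4 scheduler, but this is not the exact source):

    /*
     * Simplified sketch, not the exact 2.4 goodness(): a task that has
     * just yielded gets the worst possible weight, so any other runnable
     * task is preferred over it until the flag is cleared.
     */
    static inline int goodness_sketch(struct task_struct *p)
    {
            if (p->policy & SCHED_YIELD)
                    return -1;      /* picked only if nothing else is runnable */

            if (!p->counter)
                    return 0;       /* out of quantum, wait for recalculation */

            return p->counter + 20 - p->nice;       /* usual dynamic priority */
    }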




- Davide

2001-03-11 22:55:43

by Davide Libenzi

Subject: Re: sys_sched_yield fast path


On 11-Mar-2001 Dave Zarzycki wrote:
> On Mon, 12 Mar 2001, Anton Blanchard wrote:
>
>> Perhaps we need something like sched_yield that takes off some of
>> tsk->counter so the task with the spinlock will run earlier.
>
> Personally speaking, I wish the sched_yield() API were like so:
>
> int sched_yield(pid_t pid);

Yes, you could define an API like this, but that is not what sched_yield() is
meant for.


> This would allow the thread wanting to acquire the spinlock to yield
> specifically to the thread holding the lock (assuming the pid of the lock
> holder was stored in the spinlock...). In fact, the original lock owner
> could in theory yield back to the thread wanting to acquire the lock.

Everything that happens inside a spinlock should be very fast; otherwise the
use of a spinlock should be avoided.




- Davide

2001-03-12 01:26:44

by Anton Blanchard

Subject: Re: sys_sched_yield fast path


Hi,

> 2.4.x has changed the scheduler behaviour so that the task that calls
> sched_yield() is not rescheduled by the resulting schedule(). A flag is
> set (under certain conditions on SMP) and the goodness() calculation
> assigns the lowest value to the yielding task (the flag is cleared in
> schedule_tail()).

The behaviour I am talking about is when there is a heavily contended
spinlock and more than one task is trying to obtain it. Since SCHED_YIELD
only changes the goodness while we are trying to reschedule the yielding task,
we can bounce between two or more tasks doing sched_yield() for a while before
finally running the task that holds the spinlock.

Of course, with short-lived spinlocks you should rarely hit the case where a
task grabs a spinlock just before its timeslice is up, so maybe the answer is
just to spin a few times on sched_yield() and then back off to nanosleep()
like pthreads does.

Anton