2008-06-08 11:34:29

by Jakub W. Jozwicki

Subject: sched_yield() on 2.6.25

Hello,
I observe strange behavior of sched_yield() on 2.6.25 (strange compared to
2.6.24). Here is the code (available at
http://systest.googlecode.com/files/systest20080119.tgz):

------------------------------------------------------
timer_t timer;
sig_atomic_t cnt = 0;
long long sum = 0;
long times[21], min, max;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
struct timespec ts = { 0, 0 };
pthread_t last_th = 0;

void *th_proc(void* p) {
    int n = SIZE(times) - 1;
    pthread_t th;

    while (1) {
        pthread_mutex_lock(&mutex);
        th = pthread_self();
        if (pthread_equal(th, last_th)) {
            /* we ran twice in a row: yield so the other thread can run */
            pthread_mutex_unlock(&mutex);
            sched_yield();
            continue;
        }
        /* the other thread ran since last time: one switch completed */
        rt_timer_stop(&ts);
        last_th = th;
        if (cnt >= 1) {
            times[cnt-1] = ts_sum(&ts);
            if (cnt <= n) {
                sum += times[cnt-1];
                box(times[cnt-1], min, max);
#define uint unsigned int
                printf("[%u] Thread switching time: %ldns\n",
                       (uint)th, times[cnt-1]);
            }
            else {
                printf("[%u] Thread switching time (not counted): %ldns\n",
                       (uint)th, times[cnt-1]);
            }
            cnt--;
        }
        ....
-----------------------------------------------------
and here are the results:

Setting cpu mask to 1

-- SYSTEM INFO -------------------

localhost: Linux 2.6.24-rt4 #8 SMP PREEMPT RT Mon Jan 21 18:45:00 CET 2008

Setting priority SCHED_OTHER to 0 (normal process) for 20802
[3084102544] Thread switching time (not counted): 10709015ns
[3075709840] Thread switching time: 35468301ns
[3084102544] Thread switching time: 2793ns
[3075709840] Thread switching time: 30725ns
[3084102544] Thread switching time: 10405ns
[3075709840] Thread switching time: 2724ns
[3084102544] Thread switching time: 2654ns
[3075709840] Thread switching time: 2653ns
[3084102544] Thread switching time: 3352ns
[3075709840] Thread switching time: 2583ns
[3084102544] Thread switching time: 2514ns
[3075709840] Thread switching time: 2514ns
[3084102544] Thread switching time: 2584ns
[3075709840] Thread switching time: 2584ns
[3084102544] Thread switching time: 2584ns
[3075709840] Thread switching time: 2584ns
[3084102544] Thread switching time: 2584ns
[3075709840] Thread switching time: 2584ns
[3084102544] Thread switching time: 2583ns
[3075709840] Thread switching time: 2584ns
[3084102544] Thread switching time: 2583ns
[3084102544] n=20, min=2514, max=35468301, avg=1777723, stddev=7729151
Setting priority SCHED_FIFO to 99 (range is 1-99) for 20802
[3084102544] Thread switching time (not counted): 31004ns
[3075709840] Thread switching time: 2444ns
[3084102544] Thread switching time: 2305ns
[3075709840] Thread switching time: 2305ns
[3084102544] Thread switching time: 2374ns
[3075709840] Thread switching time: 2374ns
[3084102544] Thread switching time: 2374ns
[3075709840] Thread switching time: 2375ns
[3084102544] Thread switching time: 2305ns
[3075709840] Thread switching time: 2374ns
[3084102544] Thread switching time: 2305ns
[3075709840] Thread switching time: 2375ns
[3084102544] Thread switching time: 2305ns
[3075709840] Thread switching time: 2305ns
[3084102544] Thread switching time: 2305ns
[3075709840] Thread switching time: 2305ns
[3084102544] Thread switching time: 2304ns
[3075709840] Thread switching time: 2304ns
[3084102544] Thread switching time: 2374ns
[3075709840] Thread switching time: 2374ns
[3084102544] Thread switching time: 2304ns
[3084102544] n=20, min=2304, max=2444, avg=2339, stddev=41

--------------------------------------------------------------------------

Setting cpu mask to 1

-- SYSTEM INFO -------------------

hackett: Linux 2.6.25.4-rt6 #5 SMP PREEMPT RT Sun Jun 8 12:40:15 CEST 2008

Setting priority SCHED_OTHER to 0 (normal process) for 20511
[3085323152] Thread switching time (not counted): 9286166ns
[3076930448] Thread switching time: 38678449ns
[3085323152] Thread switching time: 1181784ns
[3076930448] Thread switching time: 284114ns
[3085323152] Thread switching time: 2894642ns
[3076930448] Thread switching time: 975962ns
[3085323152] Thread switching time: 2010730ns
[3076930448] Thread switching time: 980292ns
[3085323152] Thread switching time: 2004934ns
[3076930448] Thread switching time: 983994ns
[3085323152] Thread switching time: 2009682ns
[3076930448] Thread switching time: 984343ns
[3085323152] Thread switching time: 2013036ns
[3076930448] Thread switching time: 979035ns
[3085323152] Thread switching time: 2013664ns
[3076930448] Thread switching time: 973727ns
[3085323152] Thread switching time: 1688204ns
[3076930448] Thread switching time: 309397ns
[3085323152] Thread switching time: 985181ns
[3076930448] Thread switching time: 997822ns
[3085323152] Thread switching time: 996495ns
[3085323152] n=20, min=284114, max=38678449, avg=3197274, stddev=8164859
Setting priority SCHED_FIFO to 99 (range is 1-99) for 20511
[3085323152] Thread switching time (not counted): 39740ns
[3076930448] Thread switching time: 2723ns
[3085323152] Thread switching time: 2444ns
[3076930448] Thread switching time: 2445ns
[3085323152] Thread switching time: 2444ns
[3076930448] Thread switching time: 2445ns
[3085323152] Thread switching time: 2444ns
[3076930448] Thread switching time: 2445ns
[3085323152] Thread switching time: 2375ns
[3076930448] Thread switching time: 2445ns
[3085323152] Thread switching time: 2375ns
[3076930448] Thread switching time: 2375ns
[3085323152] Thread switching time: 2444ns
[3076930448] Thread switching time: 2374ns
[3085323152] Thread switching time: 2375ns
[3076930448] Thread switching time: 2375ns
[3085323152] Thread switching time: 2445ns
[3076930448] Thread switching time: 2444ns
[3085323152] Thread switching time: 2375ns
[3076930448] Thread switching time: 2374ns
[3085323152] Thread switching time: 2445ns
[3085323152] n=20, min=2374, max=2723, avg=2430, stddev=75

Is this behavior expected?

Regards,
Jakub


2008-06-08 22:07:26

by Robert Hancock

Subject: Re: sched_yield() on 2.6.25

Jakub W. Jozwicki wrote:
> Hello,
> I observe strange behavior of sched_yield() on 2.6.25 (strange compared to
> 2.6.24). Here is the code (available at
> http://systest.googlecode.com/files/systest20080119.tgz):
>
> ------------------------------------------------------
> timer_t timer;
> sig_atomic_t cnt = 0;
> long long sum = 0;
> long times[21], min, max;
> pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
> struct timespec ts = { 0, 0 };
> pthread_t last_th = 0;
>
> void *th_proc(void* p) {
> int n = SIZE(times) -1;
> pthread_t th;
>
> while(1) {
> pthread_mutex_lock(&mutex);
> th = pthread_self();
> if (pthread_equal(th,last_th)) {
> pthread_mutex_unlock(&mutex);
> sched_yield();
> continue;
> }
> rt_timer_stop(&ts);
> last_th = th;
> if (cnt>=1) {
> times[cnt-1] = ts_sum(&ts);
> if (cnt <= n) {
> sum += times[cnt-1];
> box(times[cnt-1],min,max);
> #define uint unsigned int
> printf("[%u] Thread switching time: %ldns\n",(uint)th, times[cnt-1]);
> }
> else {
> printf("[%u] Thread switching time (not counted): %ldns\n",(uint)th,
> times[cnt-1]);
> }
> cnt--;
> }
> ....
> -----------------------------------------------------
> and here are the results:

...

> Is this behavior expected?

The behavior of sched_yield with SCHED_OTHER processes has changed
several times with Linux over the years, since its behavior is not
defined by standards, so it's really "whatever the scheduler feels like
doing". The behavior is only defined with realtime scheduling
(SCHED_FIFO or SCHED_RR).

Generally, it's a mistake to assume specific timing behavior from
sched_yield for SCHED_OTHER processes.

2008-06-09 06:37:40

by Jakub W. Jozwicki

Subject: Re: sched_yield() on 2.6.25


> > Is this behavior expected?
>
> The behavior of sched_yield with SCHED_OTHER processes has changed
> several times with Linux over the years, since its behavior is not
> defined by standards, so it's really "whatever the scheduler feels like
> doing". The behavior is only defined with realtime scheduling
> (SCHED_FIFO or SCHED_RR).
>
> Generally, it's a mistake to assume specific timing behavior from
> sched_yield for SCHED_OTHER processes.

From the man sched_yield:

A process can relinquish the processor voluntarily without blocking by
calling sched_yield(). The process will then be moved to the end of the
queue for its static priority and a new process gets to run.

and also IEEE/Open Group:
http://www.opengroup.org/onlinepubs/000095399/functions/sched_yield.html
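
For reference, the test switches the process into SCHED_FIFO roughly like
this (a minimal sketch; the actual code is in the tarball above):

------------------------------------------------------
#include <sched.h>
#include <stdio.h>

/* put the calling process into SCHED_FIFO at priority 99;
   needs root or CAP_SYS_NICE */
int go_fifo(void)
{
    struct sched_param sp = { .sched_priority = 99 };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}
------------------------------------------------------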

> > pthread_mutex_lock(&mutex);
> > th = pthread_self();
> > if (pthread_equal(th,last_th)) {
> > pthread_mutex_unlock(&mutex);
> > sched_yield();
> > continue;

Here, with SCHED_OTHER, sched_yield() does nothing for the first 100-200
calls. Should the man page be updated?

Regards,
Jakub

2008-06-09 09:04:26

by Helge Hafting

Subject: Re: sched_yield() on 2.6.25

Jakub Jozwicki wrote:
> From the man sched_yield:
>
> A process can relinquish the processor voluntarily without blocking by
> calling sched_yield(). The process will then be moved to the end of the
> queue for its static priority and a new process gets to run.
>
> and also IEEE/Open Group:
> http://www.opengroup.org/onlinepubs/000095399/functions/sched_yield.html
>
>
>>> pthread_mutex_lock(&mutex);
>>> th = pthread_self();
>>> if (pthread_equal(th,last_th)) {
>>> pthread_mutex_unlock(&mutex);
>>> sched_yield();
>>> continue;
>>>
>
> Here, with SCHED_OTHER, sched_yield() does nothing for the first 100-200
> calls. Should the man page be updated?
>
Having the man page mention the fact that sched_yield() probably
won't do "what you intend" in the non-realtime cases is probably a
good idea; that way we get fewer application programmers who mistakenly
think that sched_yield can be used for their purposes. And then we'll
have fewer broken apps.

A pointer to info about what they might want to use instead is even better.
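
For instance, blocking on a condition variable usually does what those
apps actually wanted. A minimal sketch (my own example, names and all):

----------------------------------------------------------
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;

/* consumer: sleeps in the kernel instead of spinning on sched_yield() */
void wait_for_work(void)
{
    pthread_mutex_lock(&lock);
    while (!ready)
        pthread_cond_wait(&cond, &lock);
    ready = 0;
    pthread_mutex_unlock(&lock);
}

/* producer: wakes the waiter directly, no scheduler guesswork */
void post_work(void)
{
    pthread_mutex_lock(&lock);
    ready = 1;
    pthread_mutex_unlock(&lock);
    pthread_cond_signal(&cond);
}
----------------------------------------------------------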

Helge Hafting

2008-06-09 09:35:38

by Andrew Morton

Subject: Re: sched_yield() on 2.6.25

On Sun, 8 Jun 2008 13:34:10 +0200 "Jakub W. Jozwicki" <[email protected]> wrote:

> Hello,
> I observe strange behavior of sched_yield() on 2.6.25 (strange compared to
> 2.6.24). Here is the code (available at
> http://systest.googlecode.com/files/systest20080119.tgz):
>
> ------------------------------------------------------
> timer_t timer;
> sig_atomic_t cnt = 0;
> long long sum = 0;
> long times[21], min, max;
> pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
> struct timespec ts = { 0, 0 };
> pthread_t last_th = 0;
>
> void *th_proc(void* p) {
> int n = SIZE(times) -1;
> pthread_t th;
>
> while(1) {
> pthread_mutex_lock(&mutex);
> th = pthread_self();
> if (pthread_equal(th,last_th)) {
> pthread_mutex_unlock(&mutex);
> sched_yield();
> continue;
> }
> rt_timer_stop(&ts);
> last_th = th;
> if (cnt>=1) {
> times[cnt-1] = ts_sum(&ts);
> if (cnt <= n) {
> sum += times[cnt-1];
> box(times[cnt-1],min,max);
> #define uint unsigned int
> printf("[%u] Thread switching time: %ldns\n",(uint)th, times[cnt-1]);
> }
> else {
> printf("[%u] Thread switching time (not counted): %ldns\n",(uint)th,
> times[cnt-1]);
> }
> cnt--;
> }
> ....
> -----------------------------------------------------
> and here are the results:
>
> Setting cpu mask to 1
>
> -- SYSTEM INFO -------------------
>
> localhost: Linux 2.6.24-rt4 #8 SMP PREEMPT RT Mon Jan 21 18:45:00 CET 2008
>
> Setting priority SCHED_OTHER to 0 (normal process) for 20802
> [3084102544] Thread switching time (not counted): 10709015ns
> [3075709840] Thread switching time: 35468301ns
> [3084102544] Thread switching time: 2793ns
> [3075709840] Thread switching time: 30725ns
> [3084102544] Thread switching time: 10405ns
> [3075709840] Thread switching time: 2724ns
> [3084102544] Thread switching time: 2654ns
> [3075709840] Thread switching time: 2653ns
> [3084102544] Thread switching time: 3352ns
> [3075709840] Thread switching time: 2583ns
> [3084102544] Thread switching time: 2514ns
> [3075709840] Thread switching time: 2514ns

It would save us some time if you were to tell us what these results
mean, please.

2008-06-09 15:05:09

by Peter Zijlstra

Subject: Re: sched_yield() on 2.6.25

On Mon, 2008-06-09 at 08:37 +0200, Jakub Jozwicki wrote:
> > > Is this behavior expected?
> >
> > The behavior of sched_yield with SCHED_OTHER processes has changed
> > several times with Linux over the years, since its behavior is not
> > defined by standards, so it's really "whatever the scheduler feels like
> > doing". The behavior is only defined with realtime scheduling
> > (SCHED_FIFO or SCHED_RR).
> >
> > Generally, it's a mistake to assume specific timing behavior from
> > sched_yield for SCHED_OTHER processes.
>
> From the man sched_yield:
>
> A process can relinquish the processor voluntarily without blocking by
> calling sched_yield(). The process will then be moved to the end of the
> queue for its static priority and a new process gets to run.
>
> and also IEEE/Open Group:
> http://www.opengroup.org/onlinepubs/000095399/functions/sched_yield.html

Yeah, except that is for the Real-Time scheduling classes; SCHED_OTHER
doesn't have static priority queues.

SCHED_OTHER doesn't have a specified implementation - so relying on it
to do anything specific is well outside the scope of definition.
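
You can even see this from what the kernel reports (a trivial check I
just sketched, not from the test code above):

----------------------------------------------------------
#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* On Linux the RT classes have a 1-99 static priority range;
       SCHED_OTHER reports 0-0, i.e. no static priority queues. */
    printf("SCHED_OTHER: %d..%d\n",
           sched_get_priority_min(SCHED_OTHER),
           sched_get_priority_max(SCHED_OTHER));
    printf("SCHED_FIFO:  %d..%d\n",
           sched_get_priority_min(SCHED_FIFO),
           sched_get_priority_max(SCHED_FIFO));
    return 0;
}
----------------------------------------------------------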

2008-06-11 15:36:51

by Bodo Eggert

Subject: Re: sched_yield() on 2.6.25

Peter Zijlstra <[email protected]> wrote:
> On Mon, 2008-06-09 at 08:37 +0200, Jakub Jozwicki wrote:

>> From the man sched_yield:
>>
>> A process can relinquish the processor voluntarily without blocking by
>> calling sched_yield(). The process will then be moved to the end of the
>> queue for its static priority and a new process gets to run.
>>
>> and also IEEE/Open Group:
>> http://www.opengroup.org/onlinepubs/000095399/functions/sched_yield.html
>
> Yeah, except that is for Real-Time scheduling classes, SCHED_OTHER
> doesn't have static priority queues.
>
> SCHED_OTHER doesn't have a specified implementation - so relying on it
> to do anything specific is well outside the scope of definition.

OTOH, it's sane not to schedule exactly the thread which just tried
to say "I can't do any sane work, please run another thread."

2008-06-11 22:45:26

by Leon Woestenberg

Subject: Re: sched_yield() on 2.6.25

Hello,

On Wed, Jun 11, 2008 at 5:28 PM, Bodo Eggert <[email protected]> wrote:
> Peter Zijlstra <[email protected]> wrote:
>> On Mon, 2008-06-09 at 08:37 +0200, Jakub Jozwicki wrote:
>
>>> From the man sched_yield:
>>>
>>> A process can relinquish the processor voluntarily without blocking by
>>> calling sched_yield(). The process will then be moved to the end of the
>>> queue for its static priority and a new process gets to run.
>>>
>>> and also IEEE/Open Group:
>>> http://www.opengroup.org/onlinepubs/000095399/functions/sched_yield.html
>>
>> Yeah, except that is for Real-Time scheduling classes, SCHED_OTHER
>> doesn't have static priority queues.
>>
>> SCHED_OTHER doesn't have a specified implementation - so relying on it
>> to do anything specific is well outside the scope of definition.
>
> OTOH, it's sane not to schedule exactly the thread which just tried
> to say "I can't do any sane work, please run another thread."
>
That's not the definition of sched_yield(). See the earlier emails,
and the quote above.

As the code after sched_yield() has to be executed, the thread will be
rescheduled soon (or even immediately) anyway.

The users not understanding the limited scope where sched_yield()
behaves deterministically seem to think that _yield() will yield() AND
lower the thread's dynamic priority for SCHED_OTHER. Is downgrading
the dynamic priority a behavioral option?

On the other hand, I don't think anything should encourage the use of
sched_yield() outside of the rare SCHED_FIFO/RR case.

Regards,
--
Leon

2008-06-12 08:09:37

by Helge Hafting

Subject: Re: sched_yield() on 2.6.25

Leon Woestenberg wrote:
[...]
> That's not the definition of sched_yield(). See the earlier emails,
> and the quote above.
>
> As the code after sched_yield() has to be executed the thread will be
> rescheduled soon (or even immediately) anyway.
>
> The users not understanding the limited scope where sched_yield()
> behaves deterministically seem to think that _yield() will yield() AND
> lower the thread's dynamic priority for SCHED_OTHER. Is downgrading
> the dynamic priority a behavioral option?
>
That can be done, of course, but that too will cause breakage.
Consider a multithreaded app mistakenly relying on sched_yield.

Priority downgrading might work really well as long as the app runs alone:
the yielding thread stops and the others progress, so sched_yield works
for "userspace locking". And it works so well that the app uses it a lot.

Then someone recompiles the distro, or runs some other kind of cpu hog
that drives the load well above 1. Users expect the apps to run a little
slower because of this. But a load of 5 still ought to give you 1/5
of the cpu - and with today's CPUs that might still be better than
a 5-year-old machine. Interactive software should hardly notice,
as it doesn't use the cpu that much anyway - and it gets priority over
cpu hogs when it occasionally needs to do something.

But now this multithreaded app practically stops, because it yields
a lot - and every time it does, it lowers its priority below not only
its own other threads, but below the various cpu hogs as well.
(Compilers get dynamic boosts too, as they wait a little for the disk
now and then. A parallel compile still keeps the total load high.)

I remember seeing openoffice take 5 minutes to start some years ago,
with a compile going on. Of course there were other problems,
like swapping and a smaller computer, but other apps were merely slow,
not that glacial.
> On the other hand, I don't think anything should encourage the use of
> sched_yield() outside of the rare SCHED_FIFO/RR case.
>
Exactly. There seems to be no way to make sched_yield work "as expected"
for all the ways it is abused, so it's better to use something else.

Helge Hafting



2008-06-13 07:46:19

by Bodo Eggert

Subject: Re: sched_yield() on 2.6.25

On Thu, 12 Jun 2008, Leon Woestenberg wrote:
> On Wed, Jun 11, 2008 at 5:28 PM, Bodo Eggert <[email protected]> wrote:
> > Peter Zijlstra <[email protected]> wrote:
> >> On Mon, 2008-06-09 at 08:37 +0200, Jakub Jozwicki wrote:

> >>> From the man sched_yield:
> >>>
> >>> A process can relinquish the processor voluntarily without blocking by
> >>> calling sched_yield(). The process will then be moved to the end of the
> >>> queue for its static priority and a new process gets to run.
> >>>
> >>> and also IEEE/Open Group:
> >>> http://www.opengroup.org/onlinepubs/000095399/functions/sched_yield.html
> >>
> >> Yeah, except that is for Real-Time scheduling classes, SCHED_OTHER
> >> doesn't have static priority queues.
> >>
> >> SCHED_OTHER doesn't have a specified implementation - so relying on it
> >> to do anything specific is well outside the scope of definition.
> >
> > OTOH, it's sane not to schedule exactly the thread which just tried
> > to say "I can't do any sane work, please run another thread."
> >
> That's not the definition of sched_yield(). See the earlier emails,
> and the quote above.
>
> As the code after sched_yield() has to be executed the thread will be
> rescheduled soon (or even immediately) anyway.

The code after yield() is most likely to not run successfully (and as a
result will return to the yield call) unless some time passes, and this
time can pass while another process gets the CPU. It might even depend on
another process changing the system state.

Besides that, "Schedule another process, if you can" is part of the
semantics of yield. The code after yield should therefore be expected to
NOT be the very next code to run.

If you can't do that, it's fine - the process will abuse some more innocent
electrons for busy waiting - but if you can support these yield() semantics,
the system will perform much better. Won't it?

> The users not understanding the limited scope where sched_yield()
> behaves deterministically seem to think that _yield() will yield() AND
> lower the thread's dynamic priority for SCHED_OTHER. Is downgrading
> the dynamic priority a behavioral option?

I expect it to be. It may cause lower-nice-level processes to run,
but the (lack of) definition allows it.

> On the other hand, I don't think anything should encourage the use of
> sched_yield() outside of the rare SCHED_FIFO/RR case.

I agree that sleeping should be preferred, but if you really have to
busy-wait for the next thread, you'll want yield() semantics.
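
That is, something like this (just a sketch; `flag` stands in for
whatever condition the other thread eventually sets):

----------------------------------------------------------
#include <sched.h>

volatile int flag = 0;          /* set to 1 by the other thread */

void busy_wait(void)
{
    while (!flag)
        sched_yield();          /* give the CPU away between polls */
}
----------------------------------------------------------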

2008-06-13 08:38:17

by Helge Hafting

Subject: Re: sched_yield() on 2.6.25

Bodo Eggert wrote:
> On Thu, 12 Jun 2008, Leon Woestenberg wrote:
>
[...]
>>
>> As the code after sched_yield() has to be executed the thread will be
>> rescheduled soon (or even immediately) anyway.
>>
>
> The code after yield() is most likely to not run successfully (and as a
> result will return to the yield call) unless some time passes, and this
> time can pass while another process gets the CPU. It might even depend on
> another process changing the system state.
>
> Besides that, "Schedule another process, if you can" is part of the
> semantics of yield. The code after yield should therefore be expected to
> NOT be the very next code to run.
>
> If you can't do that, it's fine - the process will abuse some more innocent
> electrons for busy waiting - but if you can support these yield() semantics,
> the system will perform much better. Won't it?
>
The system as a whole will perform better if the busy-wait is avoided - yes.
A multithreaded app actually using yield this way may suffer badly, though.

Consider a case where two threads use yield() for synchronization, and
there is some contention. Ideally, the two threads want to ping-pong the
cpu between them. In the presence of other processes, they want to use up
their fair amount of cpu in this manner. If they busy-wait, then both the
app and the system overall get slower.

Now, consider what happens if yield() really does lower the priority, or
does something like "sleep(1) unless there really is nothing else to run".
Busy-waiting will disappear, and the overall performance will be fine.
The app running alone will also seem fine.

The app running with some contention - massive use of yield - on a machine
with some other load will almost stop. Every yield will wait for all the
other processes on the system, not only the process/thread we needed to
wait for.

Ideally, yield() from a non-realtime process should cause a segfault; that
would crash all software abusing it, forcing a change. Too drastic, though. :-/

>> The users not understanding the limited scope where sched_yield()
>> behaves deterministically seem to think that _yield() will yield() AND
>> lower the thread's dynamic priority for SCHED_OTHER. Is downgrading
>> the dynamic priority a behavioral option?
>>
>
> I expect it to be. It may cause lower-nice-level processes to run,
> but the (lack of) definition allows it.
>
And this is the problem - suddenly those niced cpu hogs trump an
interactive process that seems hopelessly stuck.

Helge Hafting