2002-06-18 16:58:31

by Chris Friesen

Subject: Re: Question about sched_yield()

David Schwartz wrote:
>
> >And you seem to have a misconception about sched_yield, too. If a
> >machine has n tasks, half of which are doing CPU-intense work and the
> >other half of which are just yielding... why on Earth would the yielding
> >tasks get any noticeable amount of CPU use?
>
> Because they are not blocking. They are in an endless CPU burning loop. They
> should get CPU use for the same reason they should get CPU use if they're the
> only threads running. They are always ready-to-run.
>
> >Quite frankly, even if the supposed standard says nothing of this... I
> >do not care: calling sched_yield in a loop should not show up as a CPU
> >hog.
>
> It has to. What if the only task running is:
>
> while(1) sched_yield();
>
> What would you expect?

If there is only the one task, then sure it's going to be 100% cpu on that task.

However, if there is anything else other than the idle task that wants to run,
then it should run until it exhausts its timeslice.

One process looping on sched_yield() and another one doing calculations should
result in almost the entire system being devoted to calculations.
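A minimal way to check this is to fork a yielder next to a worker and time the worker's wall-clock run. This is only an illustrative sketch; the iteration count and the timing code are invented, not anything specified in the thread:

#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	struct timeval start, end;
	volatile double x = 0.0;
	long i;
	pid_t pid = fork();

	if (pid == 0) {			/* child: loops on sched_yield() */
		for (;;)
			sched_yield();
	}

	gettimeofday(&start, NULL);	/* parent: the "calculations" */
	for (i = 0; i < 200000000L; i++)
		x += i * 0.5;
	gettimeofday(&end, NULL);

	printf("worker wall-clock time: %.2f s\n",
	       (end.tv_sec - start.tv_sec) +
	       (end.tv_usec - start.tv_usec) / 1e6);

	kill(pid, SIGKILL);		/* stop the yielder */
	return 0;
}

If the scheduler behaves as described above, the time printed with the yielder running should be close to the time the worker takes when it runs alone.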

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]


2002-06-18 17:10:22

by Richard B. Johnson

Subject: Re: Question about sched_yield()

On Tue, 18 Jun 2002, Chris Friesen wrote:

> David Schwartz wrote:
> >
> > >And you seem to have a misconception about sched_yield, too. If a
> > >machine has n tasks, half of which are doing CPU-intense work and the
> > >other half of which are just yielding... why on Earth would the yielding
> > >tasks get any noticeable amount of CPU use?
> >
> > Because they are not blocking. They are in an endless CPU burning loop. They
> > should get CPU use for the same reason they should get CPU use if they're the
> > only threads running. They are always ready-to-run.
> >
> > >Quite frankly, even if the supposed standard says nothing of this... I
> > >do not care: calling sched_yield in a loop should not show up as a CPU
> > >hog.
> >
> > It has to. What if the only task running is:
> >
> > while(1) sched_yield();
> >
> > What would you expect?
>
> If there is only the one task, then sure it's going to be 100% cpu on that task.
>
> However, if there is anything else other than the idle task that wants to run,
> then it should run until it exhausts its timeslice.
>
> One process looping on sched_yield() and another one doing calculations should
> result in almost the entire system being devoted to calculations.
>
> Chris
>

It's all in the accounting. Use usleep(0) if you want it to "look good".
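That is, something like this (illustrative only; whether usleep(0) actually sleeps, and how the time given away is accounted, is up to the C library and the kernel):

#include <unistd.h>

int main(void)
{
	for (;;)
		usleep(0);	/* instead of: for (;;) sched_yield(); */
}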

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.

2002-06-18 17:13:35

by Robert Love

Subject: Re: Question about sched_yield()

On Tue, 2002-06-18 at 09:58, Chris Friesen wrote:

> David Schwartz wrote:
>
> > What would you expect?
>
> If there is only the one task, then sure it's going to be 100% cpu on that
> task.
>
> However, if there is anything else other than the idle task that wants to
> run, then it should run until it exhausts its timeslice.
>
> One process looping on sched_yield() and another one doing calculations
> should result in almost the entire system being devoted to calculations.

Exactly. The reason the behavior is odd is not because the sched_yield
task is getting any CPU, David. I realize sched_yield is not equivalent
to blocking.

The reason this behavior is suspect is because the task is receiving a
similar amount of CPU to tasks that are _not_ yielding but in fact doing
useful work for the entire duration of their timeslice.

A task that continually uses its timeslice vs one that yields should
easily receive a greater amount of CPU, but this is not the case.

As someone who works on the scheduler, I'd say this points out that
sched_yield is, well, broken. My first guess would be that it is queuing to
the front of the runqueue (it once did this, but I thought it was fixed) or
otherwise mishandling the timeslice.

Someone pointed out this bug existed similarly in 2.5, although it was a
bit different. 2.5 has a different (and better, imo) sched_yield
implementation that tries to overcome certain shortcomings and also
perform optimally and fairly.

Robert Love

2002-06-18 17:19:29

by Emmanuel Mogenet

Subject: RE: Question about sched_yield()


> It's all in the accounting. Use usleep(0) if you want it to "look good".


Two things:

1. First, I think there's a misunderstanding on what my
original issue was: I am not interested in any way in
CPU cycle accounting, and whether the yielding loop should
log any of it. All I want is: when I run a bunch of
yielders and an actual working process, I want the
working process to not be slowed down (wall clock) in
any way. That's all. What top shows is of little interest
(to me). What matters is how many real world seconds it takes
for the actually working process to complete its task.
And that should not be affected by the presence of running
yielders. And, David, no one is arguing the fact that a yielder
running all by itself should log 100% of the CPU.

2. I have a question about usleep(0). You seem to make the point
that usleep(0) is equivalent to sched_yield(). I can see how
intuitively this should be the case, but I am not sure if it
will always be true. It's certainly not documented anywhere.


- Mgix


2002-06-18 17:23:58

by Rik van Riel

Subject: Re: Question about sched_yield()

On Tue, 18 Jun 2002, Chris Friesen wrote:

> > It has to. What if the only task running is:
> >
> > while(1) sched_yield();
> >
> > What would you expect?
>
> If there is only the one task, then sure it's going to be 100% cpu on
> that task.
>
> However, if there is anything else other than the idle task that wants
> to run, then it should run until it exhausts its timeslice.
>
> One process looping on sched_yield() and another one doing calculations
> should result in almost the entire system being devoted to calculations.

So if you have a database with 20 threads yielding to each other
each time a lock is contended and one CPU hog the database should
be reduced to a very small percentage of the CPU ?

regards,

Rik
--
http://www.linuxsymposium.org/2002/
"You're one of those condescending OLS attendants"
"Here's a nickle kid. Go buy yourself a real t-shirt"

http://www.surriel.com/ http://distro.conectiva.com/

2002-06-18 18:02:03

by David Schwartz

Subject: RE: Question about sched_yield()


>All I want is: when I run a bunch of
>yielders and an actual working process, I want the
>working process to not be slowed down (wall clock) in
>any way. That's all. What top shows is of little interest
>(to me). What matters is how many real world seconds it takes
>for the actually working process to complete its task.
>And that should not be affected by the presence of running
>yielders. And, David, no one is arguing the fact that a yielder
>running all by itself should log 100% of the CPU.

Your assumptions are just plain wrong. The yielder is being nice, so it
should get preferential treatment, not worse treatment. All threads are
ready-to-run all the time. Yielding is not the same as blocking or lowering
your priority.

DS


2002-06-18 18:00:42

by David Schwartz

Subject: Re: Question about sched_yield()


>Exactly. The reason the behavior is odd is not because the sched_yield
>task is getting any CPU, David. I realize sched_yield is not equivalent
>to blocking.

Good.

>The reason this behavior is suspect is because the task is receiving a
>similar amount of CPU to tasks that are _not_ yielding but in fact doing
>useful work for the entire duration of their timeslice.

This is the same error repeated again. Since you realize that an endless
loop on sched_yield is *not* equivalent to blocking, why do you then say "in
fact doing useful work"? By what form of ESP is the kernel supposed to
determine that the sched_yield task is not 'doing useful work' and the other
task is?

The kernel doesn't know the loop is endless. For all it knows, as soon as
another process gets a drop of CPU, the yielding process may be able to
succeed. And because the yielding process is being nice (by yielding), it
should get better and better treatment over processes that are burning CPU
rather than yielding.

>A task that continually uses its timeslice vs one that yields should
>easily receive a greater amount of CPU, but this is not the case.

Why should the mean task get preferential treatment over the nice task when
both are always ready-to-run?

DS


2002-06-18 18:05:37

by Emmanuel Mogenet

Subject: RE: Question about sched_yield()


> Your assumptions are just plain wrong. The yielder is being nice, so it
> should get preferential treatment, not worse treatment. All threads are
> ready-to-run all the time. Yielding is not the same as blocking or lowering
> your priority.


In other words, the more you yield, the nicer you
are and the more CPU you get, and those nasty processes
that are trying to actually use the CPU to do something
with it and wear it down should get it as little as possible.
I get it.

- Mgix


2002-06-18 18:19:40

by Richard B. Johnson

Subject: RE: Question about sched_yield()

On Tue, 18 Jun 2002 [email protected] wrote:

>
> > It's all in the accounting. Use usleep(0) if you want it to "look good".
>
>
> Two things:
>
> 1. First, I think there's a misunderstanding on what my
> original issue was: I am not interested in any way in
> CPU cycle accounting, and whether the yielding loop should
> log any of it. All I want is: when I run a bunch of
> yielders and an actual working process, I want the
> working process to not be slowed down (wall clock) in
> any way. That's all. What top shows is of little interest
> (to me). What matters is how many real world seconds it takes
> for the actually working process to complete its task.
> And that should not be affected by the presence of running
> yielders. And, David, no one is arguing the fact that a yielder
> running all by itself should log 100% of the CPU.
>

Well, the problem seems to be an inconsistency I reported to 'the list'
some time ago. As I recall, Ingo replied that I should use usleep(0) and
the problem I reported would "go away". It did go away. However, if
you run this on a single-processor machine, 'top' will show that
each task gets 99+ percent of the CPU, which of course can't be correct.


#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sched.h>

int main(int c, char *argv[])
{
	if(!fork())
	{
		strcpy(argv[0], "Child");	/* child spins, never yielding */
		for(;;)
			;
	}
	strcpy(argv[0], "Parent");		/* parent spins on sched_yield() */

	for(;;)
		sched_yield();

	return c;
}


So, it seems that the guy that's yielding the CPU gets 'charged' for
the CPU time he gave away. Fair enough, I guess. As I see it,
sched_yield() will give the CPU to any single computable task once.
After this, the caller gets the CPU back whether or not he wants it.


> 2. I have a question about usleep(0). You seem to make the point
> that usleep(0) is equivalent to sched_yield(). I can see how
> intuitively this should be the case, but I am not sure if it
> will always be true. It's certainly not documented anywhere.
>

No. I did not mention or imply "equivalent", only that you can use it
instead, in many, perhaps most, instances.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.

2002-06-18 19:11:16

by David Schwartz

Subject: RE: Question about sched_yield()



On Tue, 18 Jun 2002 11:05:36 -0700, [email protected] wrote:

>> Your assumptions are just plain wrong. The yielder is being nice, so it
>>should get preferential treatment, not worse treatment. All threads are
>>ready-to-run all the time. Yielding is not the same as blocking or lowering
>>your priority.

>In other words, the more you yield, the nicer you
>are and the more CPU you get, and those nasty processes
>that are trying to actually use the CPU to do something
>with it and wear it down should get it as little as possible.
>I get it.
>
> - Mgix

I'm sorry, but you are being entirely unreasonable. The kernel has no way to
know which processes are doing something useful and which ones are just
wasting CPU. All it knows is that they are all ready-to-run and they all have the
same priority and none of them are blocking. The yielding threads are playing
nice and shouldn't be penalized for it.

The following code:

while(1) sched_yield();

Is the problem, not the kernel. You can't expect the kernel's ESP to figure
out that you really meant something else or that your process isn't really
doing anything useful.

If you didn't mean to burn the CPU in an endless loop, WHY DID YOU?

You should never call sched_yield in a loop like this unless your intent is
to burn the CPU until some other thread/process does something. Since you
rarely want to do this, you should seldom if ever call sched_yield in a loop.

What sched_yield is good for is if you encounter a situation where you
need/want some resource and another thread/process has it. You call
sched_yield once, and maybe when you run again, the other thread/process will
have released it. You can also use it as the spin function in spinlocks.
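For example, a user-space spinlock can use it as the spin step, something like this sketch (using GCC's atomic builtins for illustration; this is not code from anywhere in particular):

#include <sched.h>

typedef struct { volatile int locked; } yield_lock_t;

static void yield_lock(yield_lock_t *l)
{
	/* __sync_lock_test_and_set returns the previous value;
	 * non-zero means another thread already holds the lock. */
	while (__sync_lock_test_and_set(&l->locked, 1))
		sched_yield();	/* give the holder a chance to release it */
}

static void yield_unlock(yield_lock_t *l)
{
	__sync_lock_release(&l->locked);
}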

But your expectation that it will reduce CPU usage is just plain wrong. If
you have one thread spinning on sched_yield, on a single CPU machine it will
definitely get 100% of the CPU. If you have two, they will each definitely
get 50% of the CPU. There are blocking functions and scheduler priority
functions for this purpose.

DS


2002-06-18 19:25:33

by Robert Love

Subject: RE: Question about sched_yield()

On Tue, 2002-06-18 at 12:11, David Schwartz wrote:

> I'm sorry, but you are being entirely unreasonable.

No, sorry, you are. Listen to everyone else here.

> If you didn't mean to burn the CPU in an endless loop, WHY DID YOU?

It is not an endless loop. Here is the problem. You have n tasks. One
type executes:

while(1) ;

the others execute:

while(1)
sched_yield();

the second bunch should _not_ receive as much CPU time as the first.
This has nothing to do with intelligent work or blocking or picking your
nose. It has everything to do with the fact that yielding means
"relinquish my timeslice" and "put me at the end of the runqueue".

If we are doing this, then why does the sched_yield'ing task monopolize
the CPU? BECAUSE IT IS BROKEN.

> You should never call sched_yield in a loop like this unless your intent is
> to burn the CPU until some other thread/process does something. Since you
> rarely want to do this, you should seldom if ever call sched_yield in a loop.

But there are other tasks that wish to do something in these examples...

> But your expectation that it will reduce CPU usage is just plain wrong. If
> you have one thread spinning on sched_yield, on a single CPU machine it will
> definitely get 100% of the CPU. If you have two, they will each definitely
> get 50% of the CPU. There are blocking functions and scheduler priority
> functions for this purpose.

If they are all by themselves, of course they will get 100% of the CPU.
No one is saying sched_yield is equivalent to blocking. I am not even
saying it should get no CPU! It should get a bunch. But all the
processes being equal, one that keeps yielding its timeslice should not
get as much CPU time as one that does not. Why is that not logical to
you?

The original report was that given one task of the second case (above)
and two of the first, everything was fine - the yielding task received
little CPU as it continually relinquishes its timeslice. In the
alternative case, there are two of each types of tasks. Now, the CPU is
split and the yielding tasks are receiving a chunk of it. Why? Because
the yielding behavior is broken and the tasks are continually yielding
and rescheduling back and forth. So there is an example of how it
should work and how it does. It is broken.

There isn't even really an argument. Ingo and I have both identified
this is a problem in 2.4 and 2.5 and Ingo fixed it in 2.5. If 2.5 no
longer has this behavior, then are you saying it is NOW broken?

Robert Love

2002-06-18 19:53:48

by David Schwartz

Subject: RE: Question about sched_yield()


On 18 Jun 2002 12:25:13 -0700, Robert Love wrote:

>> If you didn't mean to burn the CPU in an endless loop, WHY DID YOU?

>It is not an endless loop. Here is the problem. You have n tasks. One
>type executes:
>
> while(1) ;
>
>the others execute:
>
> while(1)
> sched_yield();

>the second bunch should _not_ receive as much CPU time as the first.

Why not? They're both at the same priority. They're both always
ready-to-run. Neither blocks. Why should the kernel assume that one will do
useful work while the other won't?

>This has nothing to do with intelligent work or blocking or picking your
>nose. It has everything to do with the fact that yielding means
>"relinquish my timeslice" and "put me at the end of the runqueue".

And consider me immediately ready to run again.

>> You should never call sched_yield in a loop like this unless your intent is
>> to burn the CPU until some other thread/process does something. Since you
>> rarely want to do this, you should seldom if ever call sched_yield in a loop.

>But there are other tasks that wish to do something in these examples...

All the tasks are equally situated. Same priority, all ready-to-run.

>> But your expectation that it will reduce CPU usage is just plain wrong. If
>> you have one thread spinning on sched_yield, on a single CPU machine it will
>> definitely get 100% of the CPU. If you have two, they will each definitely
>> get 50% of the CPU. There are blocking functions and scheduler priority
>> functions for this purpose.

>If they are all by themselves, of course they will get 100% of the CPU.
>No one is saying sched_yield is equivalent to blocking. I am not even
>saying it should get no CPU! It should get a bunch. But all the
>processes being equal, one that keeps yielding its timeslice should not
>get as much CPU time as one that does not. Why is that not logical to
>you?

Because a thread/process that voluntarily yields its timeslice should be
rewarded over one that burns it up, not penalized.

>The original report was that given one task of the second case (above)
>and two of the first, everything was fine - the yielding task received
>little CPU as it continually relinquishes its timeslice. In the
>alternative case, there are two of each types of tasks. Now, the CPU is
>split and the yielding tasks are receiving a chunk of it. Why? Because
>the yielding behavior is broken and the tasks are continually yielding
>and rescheduling back and forth. So there is an example of how it
>should work and how it does. It is broken.

So you're saying that the yielding tasks should be penalized for yielding
rather than just being descheduled?

>There isn't even really an argument. Ingo and I have both identified
>this is a problem in 2.4 and 2.5 and Ingo fixed it in 2.5. If 2.5 no
>longer has this behavior, then are you saying it is NOW broken?

I'm saying that the fixed behavior is better than the previous behavior and
that the previous behavior wasn't particularly good. But I'm also saying that
people are misusing sched_yield and have incorrect expectations about it.

If you spin on sched_yield, expect CPU wastage. It's really that simple.
Threads/processes that use sched_yield intelligently should be rewarded for
doing so, not penalized. Otherwise sched_yield becomes much less useful.

DS


2002-06-18 20:05:27

by Richard B. Johnson

Subject: RE: Question about sched_yield()

On 18 Jun 2002, Robert Love wrote:

> On Tue, 2002-06-18 at 12:11, David Schwartz wrote:
>
> > I'm sorry, but you are being entirely unreasonable.
>
> No, sorry, you are. Listen to everyone else here.
>
> > If you didn't mean to burn the CPU in an endless loop, WHY DID YOU?
>
> It is not an endless loop. Here is the problem. You have n tasks. One
> type executes:
>
> while(1) ;
>
> the others execute:
>
> while(1)
> sched_yield();
>
> the second bunch should _not_ receive as much CPU time as the first.
> This has nothing to do with intelligent work or blocking or picking your
> nose. It has everything to do with the fact that yielding means
> "relinquish my timeslice" and "put me at the end of the runqueue".
>
> If we are doing this, then why does the sched_yield'ing task monopolize
> the CPU? BECAUSE IT IS BROKEN.
>
> > You should never call sched_yield in a loop like this unless your intent is
> > to burn the CPU until some other thread/process does something. Since you
> > rarely want to do this, you should seldom if ever call sched_yield in a loop.
>
> But there are other tasks that wish to do something in these examples...
>
> > But your expectation that it will reduce CPU usage is just plain wrong. If
> > you have one thread spinning on sched_yield, on a single CPU machine it will
> > definitely get 100% of the CPU. If you have two, they will each definitely
> > get 50% of the CPU. There are blocking functions and scheduler priority
> > functions for this purpose.
>
> If they are all by themselves, of course they will get 100% of the CPU.
> No one is saying sched_yield is equivalent to blocking. I am not even
> saying it should get no CPU! It should get a bunch. But all the
> processes being equal, one that keeps yielding its timeslice should not
> get as much CPU time as one that does not. Why is that not logical to
> you?
>
> The original report was that given one task of the second case (above)
> and two of the first, everything was fine - the yielding task received
> little CPU as it continually relinquishes its timeslice. In the
> alternative case, there are two of each types of tasks. Now, the CPU is
> split and the yielding tasks are receiving a chunk of it. Why? Because
> the yielding behavior is broken and the tasks are continually yielding
> and rescheduling back and forth. So there is an example of how it
> should work and how it does. It is broken.
>
> There isn't even really an argument. Ingo and I have both identified
> this is a problem in 2.4 and 2.5 and Ingo fixed it in 2.5. If 2.5 no
> longer has this behavior, then are you saying it is NOW broken?
>
> Robert Love



Let's put in some numbers.

	/* Task 0 */
	for(;;)
		;

	/* Task 1 */
	for(;;)
		sched_yield();

I will assume that scheduling takes 0 seconds and system calls
also take 0 seconds. This should be fair if I modify Task 0 so
it does:

	for(;;)
		getuid();

Which means both tasks make continuous system calls, which should
take, roughly, the same time.

Initial conditions have Task 0 spinning for HZ time until preempted,
so Task 0 starts up with HZ more time than Task 1. Task 1 now gets
the CPU. When Task 1 gets the CPU, it immediately gives it up; it
does not wait until it's preempted like Task 0. Task 0 now gets the
CPU and keeps it for HZ time.

Clearly, task 0 should be getting at least HZ more CPU time than task 1.
And, yawn, it probably does! I just launched 400 of the following
programs:


#include <sched.h>

int main(int c, char *argv[])
{
	for(;;)
		sched_yield();
	return c;
}

This is a dual CPU system. Top shows 184% system CPU usage and 14%
user CPU usage.

3:54pm up 23:30, 2 users, load average: 14.74, 8.08, 3.30
463 processes: 62 sleeping, 401 running, 0 zombie, 0 stopped
CPU states: 13.7% user, 186.3% system, 0.0% nice, 0.0% idle
Mem: 320720K av, 215816K used, 104904K free, 0K shrd, 11752K buff
Swap: 1044208K av, 1044208K used, 0K free 131732K cached
PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
8627 root 18 0 208 208 172 R 0 15.6 0.0 0:31 xxx
8637 root 18 0 208 208 172 R 0 15.6 0.0 0:30 xxx
8638 root 18 0 208 208 172 R 0 15.6 0.0 0:30 xxx
8624 root 15 0 208 208 172 R 0 13.7 0.0 0:32 xxx
8629 root 15 0 208 208 172 R 0 13.7 0.0 0:31 xxx
8632 root 15 0 208 208 172 R 0 13.7 0.0 0:30 xxx
8626 root 15 0 208 208 172 R 0 12.7 0.0 0:31 xxx
8628 root 15 0 208 208 172 R 0 12.7 0.0 0:31 xxx
8633 root 15 0 208 208 172 R 0 12.7 0.0 0:30 xxx
8639 root 15 0 208 208 172 R 0 12.7 0.0 0:30 xxx
8625 root 14 0 208 208 172 R 0 11.7 0.0 0:32 xxx
8630 root 15 0 208 208 172 R 0 11.7 0.0 0:30 xxx
8634 root 14 0 208 208 172 R 0 11.7 0.0 0:30 xxx
8635 root 15 0 208 208 172 R 0 11.7 0.0 0:30 xxx
8636 root 15 0 208 208 172 R 0 11.7 0.0 0:30 xxx
[SNIPPED]

The system is as responsive as ever even though there are all those
'CPU burners', each showing they take roughly 12 percent of the CPU.
Let's see: 400 * 0.12 = 48 CPUs' worth of time on a two-CPU box == accounting error, nothing more.



Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.

2002-06-18 20:12:36

by Emmanuel Mogenet

Subject: RE: Question about sched_yield()


Let's do a little thought experiment with 2
naive scheduling algorithms and see what happens:

Algorithm 1: the FIFO (favours those that haven't had it yet over those who already had it)
- All ready to run processes are in a FIFO queue.
- The top of the queue is the yielder
- It's given a slice of CPU time to run on
- It gives it back right away.
- It's sent to the back of the queue.
- The next in line is a task that does real work.
- It gets a time slice and fully uses it.
- etc ...

Algorithm 1 will have the expected behaviour: yielders
won't consume anything, workers get it all.

Algorithm 2: the Monte-Carlo scheduler (favours no one, schedules blindly)
- All ready to run processes are kept in a black bag
- The scheduler randomly grabs one out of the bag
- It's given a slice of CPU time to run on
- If it's a yielder, it gives the CPU right back
- If it's an actual worker, it makes full use of the slice.
- etc ...

Lo and behold, Algorithm 2 exhibits the same behaviour as well:
yielders get nothing since they give it all away, and workers
get it all.
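Just to make the thought experiment concrete, here is a toy simulation of both algorithms (purely illustrative; the task mix, the slice count and the idealization that a yielded slice costs nothing are all made up):

#include <stdio.h>
#include <stdlib.h>

#define NTASKS 4		/* tasks 0,1 yield; tasks 2,3 do real work */
#define SLICES 1000000

static int is_yielder(int t) { return t < 2; }

int main(void)
{
	long used_fifo[NTASKS] = { 0 };
	long used_mc[NTASKS] = { 0 };
	long s;
	int next = 0;

	/* Algorithm 1: round-robin FIFO.  A yielder hands its slice
	 * straight back, so the slice goes on to the next task. */
	for (s = 0; s < SLICES; ) {
		int t = next;
		next = (next + 1) % NTASKS;
		if (is_yielder(t))
			continue;	/* gives the CPU right back */
		used_fifo[t]++;
		s++;
	}
	printf("FIFO:        workers got %ld of %d slices\n",
	       used_fifo[2] + used_fifo[3], SLICES);

	/* Algorithm 2: Monte-Carlo, pick a ready task at random. */
	for (s = 0; s < SLICES; ) {
		int t = rand() % NTASKS;
		if (is_yielder(t))
			continue;	/* gives the CPU right back */
		used_mc[t]++;
		s++;
	}
	printf("Monte-Carlo: workers got %ld of %d slices\n",
	       used_mc[2] + used_mc[3], SLICES);
	return 0;
}

Both runs report that the workers got all of the slices, which is exactly the behaviour described above.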

Now, if I understand you well enough David, you'd like an
algorithm where the less you want the CPU, the more you get
it. I'd love if you could actually give us an outlook of
your ideal scheduler so I can try my thought experiment on it,
because from what I've understood so far, your hypothetical
scheduler would allocate all of the CPU to the yielders.

Also, since it seems to worry you: no I'm not using sched_yield
to implement pseudo-blocking behaviour.

- Mgix

2002-06-18 20:42:49

by David Schwartz

Subject: RE: Question about sched_yield()


>Now, if I understand you well enough David, you'd like an
>algorithm where the less you want the CPU, the more you get
>it.

Exactly. This is the UNIX tradition of static and dynamic priorities. The
more polite you are about yielding the CPU when you don't need it, the more
claim you have to getting it when you do need it.

>I'd love if you could actually give us an outlook of
>your ideal scheduler so I can try my thought experiment on it,
>because from what I've understood so far, your hypothetical
>scheduler would allocate all of the CPU to the yielders.

Not all, just the same share any other process gets. They're all
ready-to-run, they're all at the same priority.

>Also, since it seems to worry you: no I'm not using sched_yield
>to implement pseudo-blocking behaviour.

Then tell us what you are doing so we can tell you the *right* way to do it.
Unless this is just an abstract theoretical exercise, you shouldn't complain
when ready-to-run threads get the CPU.

DS


2002-06-18 20:47:14

by Emmanuel Mogenet

Subject: RE: Question about sched_yield()




> >Now, if I understand you well enough David, you'd like an
> >algorithm where the less you want the CPU, the more you get
> >it.
>
> Exactly. This is the UNIX tradition of static and dynamic priorities. The
> more polite you are about yielding the CPU when you don't need it, the more
> claim you have to getting it when you do need it.
>
> >I'd love if you could actually give us an outlook of
> >your ideal scheduler so I can try my thought experiment on it,
> >because from what I've understood so far, your hypothetical
> >scheduler would allocate all of the CPU to the yielders.
>
> Not all, just the same share any other process gets. They're all
> ready-to-run, they're all at the same priority.

Correct my logic, please:

1. Rule: The less you want the CPU, the more you get it.
2. A yielder process never wants the CPU
3. As a result of Rule 1, it always gets it.


2002-06-18 22:00:49

by David Schwartz

Subject: RE: Question about sched_yield()


>Correct my logic, please:
>
> 1. Rule: The less you want the CPU, the more you get it.

The more you relinquish the CPU, the more you get it when you do want it.
(Dynamic priority.)

> 2. A yielder process never wants the CPU

A yielder process *always* wants the CPU, but always relinquishes it when it
gets it. (It's always ready-to-run.)

> 3. As a result of Rule 1, it always gets it.

The correct rules 1 and 2 don't lead to the conclusion you think.

DS


2002-06-18 22:31:15

by Ingo Molnar

Subject: RE: Question about sched_yield()


On Tue, 18 Jun 2002, David Schwartz wrote:

> Exactly. This is the UNIX tradition of static and dynamic
> priorities. The more polite you are about yielding the CPU when you
> don't need it, the more claim you have to getting it when you do need
> it.

firstly, the thing that defines the scheduler's implementation details is
not tradition but actual, hard, verifiable use. If you've ever seen
sched_yield() code under Linux then you'll quickly realize that it's
mainly used as an "oh damn, i cannot continue now, i have no proper kernel
object to sleep on, let's busy-wait" kind of mechanism. (Which, by the way,
is as far from polite as it gets.)

secondly, i'd like to ask everyone arguing one way or the other, and this
means you and Robert and everyone else as well, to actually read and think
about the fix i did.

The new implementation of sched_yield() is *not* throwing away your
timeslices immediately. In fact it first tries it the 'polite' way to get
some progress by gradually decreasing its priority - *IF* that fails (and
it has to fail and keep looping for at least ~10 times if there are
default priorities) then it will start dropping away timeslices - one by
one. That gives another 20 opportunities for the task to actually get some
work done. Even after this we still let the task run one more jiffy.
*THEN* only is the task pushed back into the expired array.
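In rough model form, the sequence described above looks something like this (only a paraphrase of the description, with approximate thresholds; this is not the actual scheduler code):

#include <stdio.h>

enum yield_action { DEMOTE_PRIO, DROP_TIMESLICE, RUN_ONE_MORE_JIFFY, EXPIRE };

/* What happens on a task's nth consecutive sched_yield(), per the
 * description above (thresholds are approximate). */
static enum yield_action on_yield(int nth)
{
	if (nth < 10)
		return DEMOTE_PRIO;		/* the 'polite' phase */
	if (nth < 30)
		return DROP_TIMESLICE;		/* give slices away one by one */
	if (nth == 30)
		return RUN_ONE_MORE_JIFFY;
	return EXPIRE;				/* pushed into the expired array */
}

int main(void)
{
	static const char *name[] = {
		"demote priority", "drop one timeslice",
		"run one more jiffy", "move to expired array"
	};
	int i;

	for (i = 0; i <= 31; i++)
		printf("yield %2d -> %s\n", i, name[on_yield(i)]);
	return 0;
}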

so the tests quoted here were the extreme example: a task did nothing but
sched_yield(). And the modified scheduler did a good job of suppressing
this process, if there was other, non-sched_yield() work around. A
database process with a casual sched_yield() call should not see much
effect.

Ingo

2002-06-18 22:43:56

by Olivier Galibert

Subject: Re: Question about sched_yield()

On Tue, Jun 18, 2002 at 11:01:53AM -0700, David Schwartz wrote:
> Your assumptions are just plain wrong. The yielder is being nice, so it
> should get preferential treatment, not worse treatment.

Heh? Yielding is not about being nice; if you want to be nice, you
lower your priority. Yielding is telling the scheduler "I am
temporarily out of things to do, maybe for a while, but at least until
all the other running threads have had a chance to run too". Any
scheduler that runs you immediately again without running the other
threads is _broken_.
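For instance, being nice the explicit way is just (a minimal sketch, assuming the standard nice(2)/setpriority(2) interfaces):

#include <sys/resource.h>
#include <unistd.h>

int main(void)
{
	nice(10);				/* drop 10 priority levels, or ... */
	setpriority(PRIO_PROCESS, 0, 10);	/* ... set an absolute nice value */

	for (;;)
		;				/* low-priority busy work */
}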

OG.

2002-06-18 22:52:25

by Stephen Oberholtzer

Subject: Re: Question about sched_yield()

At 11:00 AM 6/18/2002 -0700, David Schwartz wrote:
>This is the same error repeated again. Since you realize that an endless
>loop on sched_yield is *not* equivalent to blocking, why do you then say "in
>fact doing useful work"? By what form of ESP is the kernel supposed to
>determine that the sched_yield task is not 'doing useful work' and the other
>task is?

By this form of ESP: sched_yield() means "I have nothing better to do right now, give my time to someone who does". If a thread is doing useful work, why would it call sched_yield() ?!?


--
Stevie-O

Real programmers use COPY CON PROGRAM.EXE

2002-06-18 22:56:31

by Rob Landley

Subject: Re: Question about sched_yield()

On Tuesday 18 June 2002 03:11 pm, David Schwartz wrote:

> I'm sorry, but you are being entirely unreasonable. The kernel has no way
> to know which processes are doing something useful and which ones are just
> wasting CPU.

So the fact one of them is calling sched_yield (repeatedly) and the other one
isn't doesn't count as just a little BIT of a hint?

> What sched_yield is good for is if you encounter a situation where you
> need/want some resource and another thread/process has it. You call
> sched_yield once, and maybe when you run again, the other thread/process
> will have released it.

Not if it was a higher-priority task that has already exhausted its time
slice...

> You can also use it as the spin function in
> spinlocks.

In user space?

Rob

2002-06-19 02:11:59

by David Schwartz

Subject: Re: Question about sched_yield()



On Tue, 18 Jun 2002 18:45:55 -0400, Stevie O wrote:

>At 11:00 AM 6/18/2002 -0700, David Schwartz wrote:

>>This is the same error repeated again. Since you realize that an endless
>>loop on sched_yield is *not* equivalent to blocking, why do you then say "in
>>fact doing useful work"? By what form of ESP is the kernel supposed to
>>determine that the sched_yield task is not 'doing useful work' and the other
>>task is?

>By this form of ESP: sched_yield() means "I have nothing better to do right
>now, give my time to someone who does".

No, this is not what sched_yield means. What 'sched_yield' means is that
you're at a point where it's convenient for another process to run. For
example, perhaps you just released a mutex that you held for a long period of
time, or perhaps you could function more efficiently if you could acquire a
resource another thread holds.

>If a thread is doing useful work,
>why would it call sched_yield() ?!?

Perhaps to allow other threads to make forward progress. Perhaps to give
other threads a chance to use a resource it just released. Perhaps in hopes
that another thread will release a resource it could benefit from being able
to acquire.

DS


2002-06-19 02:59:09

by Stephen Oberholtzer

Subject: Re: Question about sched_yield()

At 07:11 PM 6/18/2002 -0700, David Schwartz wrote:
>>By this form of ESP: sched_yield() means "I have nothing better to do right
>>now, give my time to someone who does".
>
> No, this is not what sched_yield means. What 'sched_yield' means is that
>you're at a point where it's convenient for another process to run. For
>example, perhaps you just released a mutex that you held for a long period of
>time, or perhaps you could function more efficiently if you could acquire a
>resource another thread holds.

To-may-to, to-mah-to. Okay. I'll try again.

sched_yield() means "It would be convenient (or I would function more efficiently) if another process or thread were to run right now." i.e., "Would you be so kind as to run somebody else? As in, not me.".

From what I've gathered in this thread, something akin to this happens:

You are trying to get information out of any of three people. Two of the three people will have better/more reliable information, so you give them a higher priority.

You choose a person to ask for information -> schedule()
You ask person #1 for information.
Person #1 says: "Ask someone else" -> sched_yield()

You choose a person to ask for information -> schedule()
You ask person #2 for information.
Person #2 says: "Ask someone else" -> sched_yield()

You choose a person to ask for information -> schedule()
Now, any rational human being (who actually wants this information) will certainly proceed as such:

You ask person #3 for information.
Person #3 says: "Here is the information you need."
You go on your merry way.

However, the Linux scheduler will look at its options, see that Person #1 has a higher priority than Person #3 and is marked ready-to-run, and proceed as follows:

You ask person #1 for information.
Person #1 says: "Ask someone else" -> sched_yield()

Proceed to thrash between #1 and #2, ignoring #3.

Now, don't get me wrong; I understand your argument. Like I said, Person #1 is not blocking (sched_yield() does not block), and is a higher priority than #3. All other things being equal, Person #1 should be scheduled after Person #2 yields.

But things aren't equal. There's a crucial difference here: Person #1 has relinquished his timeslice, and Person #3 hasn't. And I'm arguing that, if a thread/process/whatever calls sched_yield, then that thread should only be run in that timeslice if:
* There are no other threads that are ready-to-run on that processor, or
* Every other thread has called sched_yield().

Yes, the current behavior is technically correct; sched_yield() properly gives up control to another process. But it will sometimes give up the processor to another process that previously did a sched_yield() in the current timeslice. And that's just stupid.


>>If a thread is doing useful work,
>>why would it call sched_yield() ?!?
>
> Perhaps to allow other threads to make forward progress. Perhaps to give
>other threads a chance to use a resource it just released. Perhaps in hopes
>that another thread will release a resource it could benefit from being able
>to acquire.

Yeah. And if two threads decide to be 'nice' -- to allow a third thread to make forward progress -- neither will. The scheduler will thrash between the two threads, in preference to scheduling the third thread. This is in accordance with the strictest interpretation of sched_yield()'s definition.


--
Stevie-O

Real programmers use COPY CON PROGRAM.EXE

2002-06-19 11:15:24

by Bill Davidsen

Subject: RE: Question about sched_yield()

On 18 Jun 2002, Robert Love wrote:

> the second bunch should _not_ receive as much CPU time as the first.
> This has nothing to do with intelligent work or blocking or picking your
> nose. It has everything to do with the fact that yielding means
> "relinquish my timeslice" and "put me at the end of the runqueue".

I have put some ideas below, I agree with "relinquish" but not end of
queue.

> If we are doing this, then why does the sched_yield'ing task monopolize
> the CPU? BECAUSE IT IS BROKEN.

Consider the case where a threaded application and a CPU hog are running.
In sum the threaded process is also a hog, although any one thread is not.

I find I can figure out problems like this if I start at the desired
result and work back, and what I believe is the desired result is that the
pure hog gets half the CPU and the threaded application, in aggregate, gets
the other. So to provide this, I would think that the desired action at
sched_yield() would look somewhat like this:

If there is less than {sched overhead} left on the timeslice
    add to end of queue
    do accounting to reflect the time used
    run the next thread
else
    if there is another thread of this process
        queue this thread following the last thread of this process
        give the remaining timeslice to the top thread of this process
    else
        put the yielding thread at the end of the queue
    .
    .

One question which arises is CPU affinity, and I confess that I can't
predict which is better: process affinity or contested-resource affinity (run
the next thread now). But for uniprocessor machines, I think what I outline
above does result in the behaviour I find most intuitive and desirable.

On your comment about leaving the yielded thread at the top of the queue,
putting the yielded thread at the end of the queue effectively forces the
context switch time to be the sum of all the timeslices given in one pass
through the queue. In a system with a contested resource and at least one
hog, I think that would make the threaded task starve for CPU. As noted by
several people, queue back at the head also can behave badly for some
cases, although the demo program provided is probably not typical.

I think what I suggest will avoid both problems, although it might have
other drawbacks, like sched overhead if the queue was very long and didn't
have other threads. Atypical, but a max "scan for another thread" count
could be provided, or even a per-process count of runnable threads in the
queue, which is probably more overhead than just scanning for light load.
At least under light load you can afford it.

Your comments?

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-06-19 12:06:58

by Ingo Molnar

Subject: RE: Question about sched_yield()


On Wed, 19 Jun 2002, Bill Davidsen wrote:

> Consider the case where a threaded application and a CPU hog are
> running. In sum the threaded process is also a hog, although any one
> thread is not.

it is a mistake to assume that sched_yield() == 'threaded applications'.

sched_yield() is a horrible interface to base userspace spinlocks on, it's
not in any way cheaper than real blocking, even in the best-case situation
when there is no 'useless' yielding back and forth. In the worst-case
situation it can be a few orders more expensive (!) than blocking-based
solutions. LinuxThreads' use of sched_yield() was and remains a hack. I'm
actually surprised that it works for real applications.

We are trying to improve things as much as possible on the kernel
scheduler side, but since there is absolutely no object/contention
information available for kernel space (why it wants to schedule away,
which other processes are using the blocked object, etc.), it's impossible
to do a perfect job - in fact it's not even possible to do a 'good' job.

IMO people interested in high-performance threading should concentrate on
the lightweight, kernel-backed userspace semaphores patch(es) that have
been presented a few weeks ago, and merge those concepts into LinuxThreads
(or into their own threading/spinlock solutions). Those patches do it
properly, and the scheduler will be able to give all the help it can.

Ingo

2002-06-20 20:31:51

by David Schwartz

Subject: Re: Question about sched_yield()


On Tue, 18 Jun 2002 18:45:55 -0400, Stevie O wrote:

>At 11:00 AM 6/18/2002 -0700, David Schwartz wrote:

>>This is the same error repeated again. Since you realize that an endless
>>loop on sched_yield is *not* equivalent to blocking, why do you then say "in
>>fact doing useful work"? By what form of ESP is the kernel supposed to
>>determine that the sched_yield task is not 'doing useful work' and the other
>>task is?

>By this form of ESP: sched_yield() means "I have nothing better to do right
>now, give my time to someone who does". If a thread is doing useful work,
>why would it call sched_yield() ?!?

To give other threads a chance to do useful work too, perhaps because it
just released a mutex that other threads might need that it held for an
unusually long time.

DS