2001-04-12 08:55:35

by Adam J. Richter

[permalink] [raw]
Subject: PATCH(?): linux-2.4.4-pre2: fork should run child first

I remember sometime in the late 80's a fellow at UniSoft
named Don whose last name escapes me just now told me about a
paper presented at a Usenix symposium that had some measurements
that purported that copy-on-write was a performance lose and
better performance would be achieve by having fork() just copy
all of the writable pages of the parent process.

It turned out that the particular unix-like system on which
these benchmarks were taken had a version of fork that did not run
the child first. As it was explained to me then, most of the time,
the child process from a fork will do just a few things and then do
an exec(), releasing its copy-on-write references to the parent's
pages, and that is the big win of copy-on-write for fork() in practice.
This oversight was considered a big embarassment for the operating
system in question, so I won't name it here.

Guess why you're seeing this email. That's right. Linux-2.4.3's
fork() does not run the child first. Consequently, the parent will
probably generate unnecessary copy-on-write page copies until it burns
through its remaining clock ticks (any COW's that the child causes will
basically happen no matter what the order of execution is) or calls
wait() (and while the wait is blocking, the parent's CPU priority will
decay as the scheduler periodically recalculates process priorities, so
that bit of dynamic priority has probably not been allocated where the
user will be able to use it, if we want to look at "fairness" in such
detail).

I suppose that running the child first also has a minor
advantage for clone() in that it should make programs that spawn lots
of threads to do little bits of work behave better on machines with a
small number of processors, since the threads that do so little work that
they accomplish they finish within their time slice will not pile up
before they have a chance to run. So, rather than give the parent's CPU
priority to the child only if CLONE_VFORK is not set, I have decided to
do a bit of machete surgery and have the child always inherit all of the
parent's CPU priority all of the time. It simplifies the code and
probably saves a few clock cycles (and before you say that this will
cost a context switch, consider that the child will almost always run
at least one time slice anyhow).

I have attached the patch below. I have also adjusted the
comment describing the code. Please let me know if this hand waving
explanation is sufficient. I'm trying to be lazy on not do a measurement
project to justify this relatively simple change. However, I do know, from
a simple test program ("printf ("%d", fork());"), that this patch has
the intended effect of running the child first.

--
Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."


Attachments:
(No filename) (2.93 kB)
fork.patch (1.15 kB)
Download all attachments

2001-04-12 12:39:23

by Horst H. von Brand

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

"Adam J. Richter" <[email protected]> said:

[...]

> I suppose that running the child first also has a minor
> advantage for clone() in that it should make programs that spawn lots
> of threads to do little bits of work behave better on machines with a
> small number of processors, since the threads that do so little work that
> they accomplish they finish within their time slice will not pile up
> before they have a chance to run. So, rather than give the parent's CPU
> priority to the child only if CLONE_VFORK is not set, I have decided to
> do a bit of machete surgery and have the child always inherit all of the
> parent's CPU priority all of the time. It simplifies the code and
> probably saves a few clock cycles (and before you say that this will
> cost a context switch, consider that the child will almost always run
> at least one time slice anyhow).

And opens the system up to DoS attacks: You can't have a process fork(2)
at will and so increase its (aggregate) CPU priority.
--
Dr. Horst H. von Brand mailto:[email protected]
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2001-04-12 13:43:22

by Hubertus Franke

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first



First the 2.4.3 tries to prefer the child.
Only does it in half of the cases though (odd counter numbers).

Your patch seems a bit radical for what you want to do.
Taking away all tokens from the parents will require that it has to wait
until the next recalculate loop.
Since (p) and (current) share the same <mm> and the same <cpu>
why not simply make sure that the (p->counter) > (current->counter).
If you are concerned that a tick is not enough to fire off the
exec then make it a predefined constant.

Try this ... this will guarantee that (p->counter) > (current->counter)
and it seems not as radical

p->counter = (current->counter + 1) >> 1;
current->counter = (current->counter - 1) >> 1;
if (!current->counter)
current->need_resched = 1;


instead of this


- p->counter = (current->counter + 1) >> 1;
- current->counter >>= 1;
- if (!current->counter)
- current->need_resched = 1;
+ p->counter = current->counter;
+ current->counter = 0;
+ current->need_resched = 1;

Hubertus Franke
Enterprise Linux Group (Mgr), Linux Technology Center (Member Scalability)
, OS-PIC (Chair)
email: [email protected]
(w) 914-945-2003 (fax) 914-945-4425 TL: 862-2003



"Adam J. Richter" <[email protected]>@vger.kernel.org on 04/12/2001
04:55:16 AM

Sent by: [email protected]


To: [email protected], [email protected]
cc:
Subject: PATCH(?): linux-2.4.4-pre2: fork should run child first



I remember sometime in the late 80's a fellow at UniSoft
named Don whose last name escapes me just now told me about a
paper presented at a Usenix symposium that had some measurements
that purported that copy-on-write was a performance lose and
better performance would be achieve by having fork() just copy
all of the writable pages of the parent process.

It turned out that the particular unix-like system on which
these benchmarks were taken had a version of fork that did not run
the child first. As it was explained to me then, most of the time,
the child process from a fork will do just a few things and then do
an exec(), releasing its copy-on-write references to the parent's
pages, and that is the big win of copy-on-write for fork() in practice.
This oversight was considered a big embarassment for the operating
system in question, so I won't name it here.

Guess why you're seeing this email. That's right. Linux-2.4.3's
fork() does not run the child first. Consequently, the parent will
probably generate unnecessary copy-on-write page copies until it burns
through its remaining clock ticks (any COW's that the child causes will
basically happen no matter what the order of execution is) or calls
wait() (and while the wait is blocking, the parent's CPU priority will
decay as the scheduler periodically recalculates process priorities, so
that bit of dynamic priority has probably not been allocated where the
user will be able to use it, if we want to look at "fairness" in such
detail).

I suppose that running the child first also has a minor
advantage for clone() in that it should make programs that spawn lots
of threads to do little bits of work behave better on machines with a
small number of processors, since the threads that do so little work that
they accomplish they finish within their time slice will not pile up
before they have a chance to run. So, rather than give the parent's CPU
priority to the child only if CLONE_VFORK is not set, I have decided to
do a bit of machete surgery and have the child always inherit all of the
parent's CPU priority all of the time. It simplifies the code and
probably saves a few clock cycles (and before you say that this will
cost a context switch, consider that the child will almost always run
at least one time slice anyhow).

I have attached the patch below. I have also adjusted the
comment describing the code. Please let me know if this hand waving
explanation is sufficient. I'm trying to be lazy on not do a measurement
project to justify this relatively simple change. However, I do know, from
a simple test program ("printf ("%d", fork());"), that this patch has
the intended effect of running the child first.

--
Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite
104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."






2001-04-12 19:16:13

by Adam J. Richter

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

>> = Adam J. Richter <[email protected]>
> = Horst von Brand <[email protected]>

>> I suppose that running the child first also has a minor
>> advantage for clone() in that it should make programs that spawn lots
>> of threads to do little bits of work behave better on machines with a
>> small number of processors, since the threads that do so little work that
>> they accomplish they finish within their time slice will not pile up
>> before they have a chance to run. So, rather than give the parent's CPU
>> priority to the child only if CLONE_VFORK is not set, I have decided to
>> do a bit of machete surgery and have the child always inherit all of the
>> parent's CPU priority all of the time. It simplifies the code and
>> probably saves a few clock cycles (and before you say that this will
>> cost a context switch, consider that the child will almost always run
>> at least one time slice anyhow).

>And opens the system up to DoS attacks: You can't have a process fork(2)
>at will and so increase its (aggregate) CPU priority.

My change does not increase the aggregate priority of
parent+child. Perhaps I misunderstand your comment.

Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."

2001-04-12 19:48:36

by Adam J. Richter

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

Hubertus Franke <[email protected]> writes:

>Try this ... this will guarantee that (p->counter) > (current->counter)
>and it seems not as radical

> p->counter = (current->counter + 1) >> 1;
> current->counter = (current->counter - 1) >> 1;
> if (!current->counter)
> current->need_resched = 1;

>instead of this


>- p->counter = (current->counter + 1) >> 1;
>- current->counter >>= 1;
>- if (!current->counter)
>- current->need_resched = 1;
>+ p->counter = current->counter;
>+ current->counter = 0;
>+ current->need_resched = 1;


No. I tried your change and also tried it with setting
current->need_resched to 1 in all cases, and it still seems to run the
parent first in at least half of the tries. Evidently,
current->counter must be zero to make the currently running process
give up the CPU immediately, which is the important thing (so that the
parent does not touch its virtual memory for a while).

Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."


2001-04-13 16:35:34

by Hubertus Franke

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first



First you are wrong by assuming that setting current->counter=0
will guarantee that the child runs first. In an SMP it only means
that this might initiate another recalculate and could run from
there in parallel with the child.

Actually quickly looking at the source code here.
You don't have to call schedule() at all.
A bit further down wake_up_process(p) is called, which in turn
calls reschedule_idle(p). Hence we don't have to call schedule.

If one satisfies the conditions:

(preemption_goodness(current,p,p->processor) > 1) then the child should
run.
[ with (child==p) and (parent==current) ].

This is for the uniprocessor system. In the SMP both could continue
to run. Looking at goodness computation, since p->mm == current->mm,
p->nice == current->nice and p->processor == current->processor,
all what matters is the difference in the counter values.
My proposed patch always yield 1, which ofcourse doesn't have the desired
effect.


Here is a patch that always yields a diff of 2. However for odd number of
current->counter it looses a token between the two.

{
long parcnt = current->counter;
p->counter = (parcnt+((parcnt&1)?1:2)) >> 1;
parcnt >>= 1;
if (parcnt>0) {
current->counter = 0;
current->need_resched = 1;
} else {
current->counter = parcnt - 1;
}


There is the other view that I should not loose a token.
In that case the following code will add a token in the odd counter case.
I think that this is preferrable over the first solution.


p->counter = (current->counter+3)>>1;
current->counter = (current->counter >> 1) - 1;
if (current->counter <= 0) {
current->counter = 0;
current->need_resched = 1;
}



Hubertus Franke
Enterprise Linux Group (Mgr), Linux Technology Center (Member Scalability)
, OS-PIC (Chair)
email: [email protected]
(w) 914-945-2003 (fax) 914-945-4425 TL: 862-2003



"Adam J. Richter" <[email protected]> on 04/12/2001 08:42:12 PM

To: Hubertus Franke/Watson/IBM@IBMUS
cc:
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first



> p->counter = (current->counter + 1) >> 1;
> current->counter = (current->counter - 1) >> 1;
> schedule();

I don't have time to try this right now and I'm not sure
what locks are held at that point in the code or whether schedule()
will actually schedule a different process if the current one
has current->counter > 0 (even if current->need_resched is set and
even if another process has a higher proc->counter value).

Even if it did work, your code is more complex and makes it
less likely that the child will reach exec() before the parent
runs again. So, I am not sure I would see the advantage if it did work.

Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite
104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."



2001-04-13 21:09:18

by John Fremlin

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

"Adam J. Richter" <[email protected]> writes:

[...]

> It turned out that the particular unix-like system on which
> these benchmarks were taken had a version of fork that did not run
> the child first. As it was explained to me then, most of the time,
> the child process from a fork will do just a few things and then do
> an exec(), releasing its copy-on-write references to the parent's
> pages, and that is the big win of copy-on-write for fork() in practice.
> This oversight was considered a big embarassment for the operating
> system in question, so I won't name it here.
>
> Guess why you're seeing this email. That's right. Linux-2.4.3's
> fork() does not run the child first.

Not always, if I understand correctly. Setting to always is putting
policy in kernel in a small way. If an app wants to fork and exec, it
should use *vfork* and exec, which is a performance win across many
OSs because the COW mappings don't even have to be set up, IIRC.

[...]

--

http://www.penguinpowered.com/~vii

2001-04-13 23:51:59

by Adam J. Richter

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

"John Fremlin" <[email protected]> writes:
> "Adam J. Richter" <[email protected]> writes:
>> Guess why you're seeing this email. That's right. Linux-2.4.3's
>> fork() does not run the child first.

>[...] If an app wants to fork and exec, it
>should use *vfork* and exec, which is a performance win across many
>OSs because the COW mappings don't even have to be set up, IIRC.

Even in that case, you want to run the child first because
it may block on I/O when it does the exec or the new program starts
running, and you are likely to be able to use that time while the
child is waiting on I/O for the parent to run (typically just to
record the process in its internal data structures and then call
wait()). Basically, you want to kick off some new I/O before running
something that can run while that I/O is pending.

Of course, in the vfork case, this change is probably only a
very small win. The real advantage is with regular fork() followed
by an exec, which happens quite a lot. For example, I do not see
vfork anywhere in the bash sources.

Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."

2001-04-14 01:55:59

by John Fremlin

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

"Adam J. Richter" <[email protected]> writes:

> "John Fremlin" <[email protected]> writes:
> > "Adam J. Richter" <[email protected]> writes:
> >> Guess why you're seeing this email. That's right. Linux-2.4.3's
> >> fork() does not run the child first.
>
> >[...] If an app wants to fork and exec, it
> >should use *vfork* and exec, which is a performance win across many
> >OSs because the COW mappings don't even have to be set up, IIRC.
>
> Even in that case, you want to run the child first because

The parent is not allowed to run until the child execs, if I
understand correctly. Read up on CLONE_VFORK.

[...]

> Of course, in the vfork case, this change is probably only a very
> small win. The real advantage is with regular fork() followed by an
> exec, which happens quite a lot. For example, I do not see vfork
> anywhere in the bash sources.

If it is a real advantage you can get a bigger advantage by changing
the app to use vfork, i.e. you can solve the problem (if it exists)
better without hacking the kernel. Further, your change will hurt
those apps which expect the parent to be given a fair chance, so
you'll need to add a fairfork(2) syscall to comply with Californian
anti age discrimmination legislation (humour). In fact, if you think
fork+exec is such a big performance hit why not go for spawn(2) and
have Linus and Al jump on you? ;-)

[...]

--

http://www.penguinpowered.com/~vii

2001-04-14 02:30:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first



On 14 Apr 2001, John Fremlin wrote:
>
> . In fact, if you think
> fork+exec is such a big performance hit why not go for spawn(2) and
> have Linus and Al jump on you? ;-)

spawn() is trivial to implement if you want to. I don't think it's all
that much more interesting than vfork()+execve(), though.

Linus

2001-04-14 02:46:11

by Adam J. Richter

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

"John Fremlin" <[email protected]> writes:
>"Adam J. Richter" <[email protected]> writes:
>> "John Fremlin" <[email protected]> writes:
>> > "Adam J. Richter" <[email protected]> writes:

>The parent is not allowed to run until the child execs, if I
>understand correctly. Read up on CLONE_VFORK.

I thought that I had checked this a few months ago and
discovered that Linux let the vfork parent run, but I wrote a
test program just now, and you're apparently right about that,
at least with respect to 2.4.3, although that's all the more
reason to give the short term CPU priority to the process that
can use it (the child), thereby slightly increasing the average
runtime available in a timeslice, which in term slightly decreases
the percentage of time spent in context switch overhead. This will
usually be a really tiny amount, but my point is that since there
is probably a tiny advantage to giving the remaining time slices
to the child even here, there is no need to complicate my patch.

>> Of course, in the vfork case, this change is probably only a very
>> small win. The real advantage is with regular fork() followed by an
>> exec, which happens quite a lot. For example, I do not see vfork
>> anywhere in the bash sources.

>If it is a real advantage you can get a bigger advantage by changing
>the app to use vfork, i.e. you can solve the problem (if it exists)
>better without hacking the kernel.

It is impractical to change every application, including
ones that you don't have access to, and many of them have reaons for
using fork instead of vfork, and you don't even have access to them.
For example, the setup that the child does between the fork and the
exec is complex enough so that it might mess up the parent's memory or,
more commonly, its error handling code for exec failure is.

Even if you could show that vfork was the right choice in all
cases (and it isn't), that would still be no reason for making do_fork
unnecessary slow and complex. My change simplifies do_fork(), makes
it runs a few cycles faster, and, I believe, makes it behave like more
fork on most other systems. If you want to argue against this change,
please justify the real benefits of the performance cost, the
complexity and nonstandard behavior you are advocating. (Admittedly
the last two are really small, but I believe they are positive).

Note that I've dropped Linus's email address for this thread,
as it does not appear to be arguing a real technical advantage to the
old do_fork() behavior. So, while it may be interesting and informative
and on topic for lkml, it is not seem to be an argument to Linus that
he should reject or modify my patch.

Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."

2001-04-14 02:51:43

by Alexander Viro

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first



On Fri, 13 Apr 2001, Linus Torvalds wrote:

>
>
> On 14 Apr 2001, John Fremlin wrote:
> >
> > . In fact, if you think
> > fork+exec is such a big performance hit why not go for spawn(2) and
> > have Linus and Al jump on you? ;-)
>
> spawn() is trivial to implement if you want to. I don't think it's all
> that much more interesting than vfork()+execve(), though.

Or faster, for that matter...

2001-04-14 02:52:55

by Ulrich Drepper

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

Linus Torvalds <[email protected]> writes:

> spawn() is trivial to implement if you want to. I don't think it's all
> that much more interesting than vfork()+execve(), though.

spawn() (actually posix_spawn) is currently implemented in the libc.
If anybody for whatever reason thinks it is necessary to implement
this in the kernel look at the interface. It is really only
interesting for systems with limited VMs but it would be trivial to
add another flag which allow different scheduling characteristics
which some people apparently want.

--
---------------. ,-. 1325 Chesapeake Terrace
Ulrich Drepper \ ,-------------------' \ Sunnyvale, CA 94089 USA
Red Hat `--' drepper at redhat.com `------------------------

2001-04-14 05:50:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first



On Sat, 14 Apr 2001, Rik van Riel wrote:
>
> Also, have you managed to find a real difference with this?

It actually makes a noticeable difference on lmbench, so I think adam is
100% right.

> If it turns out to be beneficial to run the child first (you
> can measure this), why not leave everything the same as it is
> now but have do_fork() "switch threads" internally ?

Probably doesn't much matter. We've invalidated the TLB anyway due to the
page table copy, so the cost of switching the MM is not all that
noticeable.

Linus

2001-04-14 05:50:13

by Rik van Riel

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

On Thu, 12 Apr 2001, Adam J. Richter wrote:

> I have attached the patch below. I have also adjusted the
> comment describing the code. Please let me know if this hand waving
> explanation is sufficient. I'm trying to be lazy on not do a
> measurement project to justify this relatively simple change.

It could upset programs which use threads to handle
relatively IO poor things (like, waiting on disk IO in a
thread, like glibc does to fake async file IO).

Also, have you managed to find a real difference with this?

If it turns out to be beneficial to run the child first (you
can measure this), why not leave everything the same as it is
now but have do_fork() "switch threads" internally ?

(since everything is COW-ed, it wouldn't even need to do a real
thread switch, this should be fairly easy)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-04-14 07:58:59

by Adam J. Richter

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

Rik van Riel <[email protected]> writes, regarding the idea
of having do_fork() give all of the parent's remaining time slice to
the newly created child:

>It could upset programs which use threads to handle
>relatively IO poor things (like, waiting on disk IO in a
>thread, like glibc does to fake async file IO).

Good point.

[...]
>If it turns out to be beneficial to run the child first (you
>can measure this), why not leave everything the same as it is
>now but have do_fork() "switch threads" internally ?

That is an elegant idea. Not only would you save a few cycles by
avoiding what's left of the context switch, but, more imporantly, you
would be sure that no intervening process could be selected to run
between the parent giving up the CPU and the child running (which
could otherwise waste an additional full context swtich). Also, you
then would not necessarily have to make the parent give up all of its
remaining time slice. These two characteristics means that future
tweaks to the scheduler would be much less likely to accidentally
defeat running of the child first.

As far code cleanliness goes, you get to delete a line
from do_fork ("current->need_resched = 1;"), but I think that's about
it. You might even be able to avoid adding "current = p;" to do_fork
by having newly allocating task_struct assume the identity of the
parent and making the changes to "current", although I wonder
if anything else points to the current task or if that might
screw up any interrupts that occur during that process.

Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."




2001-04-14 08:43:40

by Michael O'Reilly

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

"Adam J. Richter" <[email protected]> writes:
> Rik van Riel <[email protected]> writes, regarding the idea
> of having do_fork() give all of the parent's remaining time slice to
> the newly created child:
>
> >It could upset programs which use threads to handle
> >relatively IO poor things (like, waiting on disk IO in a
> >thread, like glibc does to fake async file IO).
>
> Good point.

Is it really? If a program is using thread to handle IO things,
then:

a) It's not going to create a thread for every IO! So I think
the argument is suprious anyway.

b) You _still_ want the child to run first. The child
will start the I/O and block, then switching back
to the parent. This maximises the I/O thruput without
costing you any CPU. (Reasoning: The child running
2nd will increase the latency which automatically
reduces the number of ops/second you can get).

Michael.

2001-04-14 09:01:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first



On Sat, 14 Apr 2001, Adam J. Richter wrote:
>
> [...]
> >If it turns out to be beneficial to run the child first (you
> >can measure this), why not leave everything the same as it is
> >now but have do_fork() "switch threads" internally ?
>
> That is an elegant idea.

I doubt it. It sounds like one of those "cool value" ideas that are
actually really stupid except they sound cool because you have to think
about the twists and turns.

So yes, you could "give" your TLB state to the child, and take the childs
state yourself (eventually, when you re-schedule back to the parent).
They're supposed to be the same, after all. And by doing so, you could do
a "switch_to()" to the child, without actually switching mm state at all.
Fine. Cool TLB optimization.

Except you don't actually _have_ any TLB state to optimize away, as you
just invalidated it anyway when you did the COW thing on the page tables.
So you would only optimize away a "mov xxx,%cr3" - which is the least
expensive part of switching TLB's. You would NOT optimize away any actual
TLB reloads.

And oh, btw, it also means that you'd better make sure that /proc knows
about the fact that the MM state is no longer yours, but your childs, so
that a concurrent "ps" doesn't mess us. Maybe it works as-is, and maybe it
doesn't.

And what if the guy who did the fork() had done a clone(CLONE_MM) before,
or was the child of a vfork'ing parent? We can't give the mm state to the
child, because we're sharing it with somebody else who expects to share it
with the _parent_. Oh, and the co-thread, btw, might be _using_ those page
tables on another CPU at any time.

And oh, there's the small special case of "init", which uses a fork() to
create the first user-mode mm state, so we'd have to special-case that one
too - we can't let "init_mm" go to a user process. So at the very least it
would have to be conditional on both that and the thread case.

There's a ton of reasons why you _really_ don't want to play games here.
Switching contexts is tricky enough as it is. Let's not try to be "clever"
about it.

So the best you could do is to do a full context switch to the child.
Which setting "current->need_resched = 1" will already end up doing. Plus
it does the right thing on SMP.

Linus

2001-04-14 13:36:13

by Rik van Riel

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

On Fri, 13 Apr 2001, Linus Torvalds wrote:
> On Sat, 14 Apr 2001, Rik van Riel wrote:
> >
> > Also, have you managed to find a real difference with this?
>
> It actually makes a noticeable difference on lmbench, so I think adam is
> 100% right.
>
> > If it turns out to be beneficial to run the child first (you
> > can measure this), why not leave everything the same as it is
> > now but have do_fork() "switch threads" internally ?
>
> Probably doesn't much matter. We've invalidated the TLB anyway due to
> the page table copy, so the cost of switching the MM is not all that
> noticeable.

And we don't even have to physically switch MM, we could simply
fake stuff by updating pointers in the parent MM instead of the
child so by the time we exit do_fork() we're in the child...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-04-14 15:07:21

by Rik van Riel

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

On Sat, 14 Apr 2001, Linus Torvalds wrote:
> On Sat, 14 Apr 2001, Adam J. Richter wrote:
> >
> > [...]
> > >If it turns out to be beneficial to run the child first (you
> > >can measure this), why not leave everything the same as it is
> > >now but have do_fork() "switch threads" internally ?
> >
> > That is an elegant idea.
>
> I doubt it. It sounds like one of those "cool value" ideas that
> are actually really stupid except they sound cool because you
> have to think about the twists and turns.

You're right. Time to put a "don't try to think of cool ideas
after going out at night" sign on the wall ;)

cheers,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com/

2001-04-14 16:12:05

by Adam J. Richter

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

>>> = Rik van Riel <[email protected]>
>> = Adam J. Richter <[email protected]>
> = Michael O'Reilly <[email protected]>

>> Rik van Riel <[email protected]> writes, regarding the idea
>> of having do_fork() give all of the parent's remaining time slice to
>> the newly created child:
>>
>>>It could upset programs which use threads to handle
>>>relatively IO poor things (like, waiting on disk IO in a
>>>thread, like glibc does to fake async file IO).
>>
>> Good point.

>Is it really? If a program is using thread to handle IO things,
>then:

>a) It's not going to create a thread for every IO! So I think
>the argument is suprious anyway.

Maybe not that often, but, from my incomplete understanding of
linux/kernel/sched.c, it looks like it can be a really long time
before the recalculate loop in schedule() gets called. Currently,
the time slice of a regular "nice 0" process in Linux is 50ms
(NICE_TO_TICKS(20) = 5, and each tick is 10ms). So, if you're
on a multiuser system or you're running some parallel algorithm
that uses a bunch of threads so that nobody has to rendezvous to
block on IO, then this could on the order of one second.

Tangential note: I think the 50ms process time slice makes
Linux boxes that have a lot of runnable threads or processes
unresponsive in ways that will not show up in most benchmarks, making
things like multi-user VNC servers much less scalable than they should
be. I wish the Linux "recalculate" loop would scale the time slices to
#cpu's/#runnable processes, such as by replacing changing the
"p->counter = ..." line in the "recalculate" loop in schedule() to
something like this:

int runnables;
...
runnables = 0;
list_for_each(tmp, &runqueue_head)
runnables++;
runnables /= smp_num_cpus;
runnables = runnables ? runnables : 1; /* prevent division by 0 */
for_each_task(p)
p->counter = (p->counter >> 1) +
(NICE_TO_TICKS(p->nice) / runnables) + 1;

(the "+ 1" at the end would ensure that the increment is never
zero, even if runnables is very high.)


Anyhow, getting back to the topic at hand...

>b) You _still_ want the child to run first. The child
>will start the I/O and block, then switching back
>to the parent. This maximises the I/O thruput without
>costing you any CPU. (Reasoning: The child running
>2nd will increase the latency which automatically
>reduces the number of ops/second you can get).

Absolutely. I never said that it would be a good idea run
the parent first in that case and Rik didn't either. Under Rik's
idea, the child would still run first, but the parent could retain
some CPU priority, so that it could get around to running again
before the next call to the "recalculate" loop in schedule(), which
might be 1 second if the system has 20 runnable runnable threads.


Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."

2001-04-17 09:16:05

by Éric Brunet

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

>"Adam J. Richter" <[email protected]> said:
>
>> I suppose that running the child first also has a minor
>> advantage for clone() in that it should make programs that spawn lots
>> of threads to do little bits of work behave better on machines with a

There is another issue with this proposition. I have begun to write (free
time, slow pace) an userland sandbox which allows me to prevent a process
and its childs to perform some given actions, like removing files or
writing in some directories. This works by ptrace-ing the process,
modifying system calls and faking return values. It also needs,
obviously, to ptrace-attach childs of the sandboxed process. When the
parent in a fork runs first, the sandbox program has time to
ptrace-attach the child before it does any system call. Obviously, if the
child runs first, this is no longer the case.

If it is decided that the child should run first in a fork, there should
be a way to reliably ptrace-attach it before it can do anything.

By the way, I tried to solve this problem in my sandbox program by
masqerading any fork call into a clone system call with the flag
CLONE_PTRACE. I had hoped that the child would in this way start already
ptraced. However, this didn't work as expected. The child did start in a
ptraced state, but the owner of the trace was its parent (which issued
the fork), and not my sandbox process which was ptracing this parent. I
find that this behaviour is really weird and useless. I could simulate
the current behaviour simply by calling ptrace(TRACE_ME,..) in the child.
What is the real use of the CLONE_PTRACE flag ?

?ric Brunet

2001-04-17 14:27:27

by Jesse Pollard

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

Brunet <[email protected]>:
> >"Adam J. Richter" <[email protected]> said:
> >
> >> I suppose that running the child first also has a minor
> >> advantage for clone() in that it should make programs that spawn lots
> >> of threads to do little bits of work behave better on machines with a
>
> There is another issue with this proposition. I have begun to write (free
> time, slow pace) an userland sandbox which allows me to prevent a process
> and its childs to perform some given actions, like removing files or
> writing in some directories. This works by ptrace-ing the process,
> modifying system calls and faking return values. It also needs,
> obviously, to ptrace-attach childs of the sandboxed process. When the
> parent in a fork runs first, the sandbox program has time to
> ptrace-attach the child before it does any system call. Obviously, if the
> child runs first, this is no longer the case.
>
> If it is decided that the child should run first in a fork, there should
> be a way to reliably ptrace-attach it before it can do anything.
>
> By the way, I tried to solve this problem in my sandbox program by
> masqerading any fork call into a clone system call with the flag
> CLONE_PTRACE. I had hoped that the child would in this way start already
> ptraced. However, this didn't work as expected. The child did start in a
> ptraced state, but the owner of the trace was its parent (which issued
> the fork), and not my sandbox process which was ptracing this parent. I
> find that this behaviour is really weird and useless. I could simulate
> the current behaviour simply by calling ptrace(TRACE_ME,..) in the child.
> What is the real use of the CLONE_PTRACE flag ?

I believe it allows the debugger to start the process to be debugged.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-04-17 15:33:10

by Éric Brunet

[permalink] [raw]
Subject: Re: PATCH(?): linux-2.4.4-pre2: fork should run child first

In ens.mailing-lists.linux-kernel, you wrote:
>
>I believe it allows the debugger to start the process to be debugged.
>
Well, the debugger simply needs to do something like

pid_t child = fork();
if (child == 0) {
ptrace(PTRACE_TRACEME,0,0,0);
execve(the_debugged_process,args,env);
}

It is more portable, more traditionnal and works very well.

?ric Brunet