2001-04-28 11:52:34

by Peter Osterlund

Subject: 2.4.4 sluggish under fork load

I have noticed that 2.4.4 feels a lot less responsive than 2.4.3 under
fork load. This is caused by the "run child first after fork" patch. I
have tested on two different UP x86 systems running redhat 7.0.

For example, when running the gcc configure script, the X mouse pointer is
very jerky. The configure script itself runs approximately as fast as in
2.4.3.

Another thing is that the bash loop "while true ; do /bin/true ; done" is
not possible to interrupt with ctrl-c.

A third thing I noticed is that starting a gnome session in redhat 7.0
takes longer. (It takes more time for the gnome splash screen to appear.)

Reverting the fork patch makes all these problems go away on my machine.
I'm not saying that this is necessarily a good idea; that patch might be
good for other reasons.


--- linux-2.4.4/kernel/fork.c~ Sat Apr 28 09:46:58 2001
+++ linux-2.4.4/kernel/fork.c Sat Apr 28 11:14:33 2001
@@ -674,9 +674,16 @@
* and then exec(). This is only important in the first timeslice.
* In the long run, the scheduling behavior is unchanged.
*/
+#if 0
p->counter = current->counter;
current->counter = 0;
current->need_resched = 1;
+#else
+ p->counter = (current->counter + 1) >> 1;
+ current->counter >>= 1;
+ if (!current->counter)
+ current->need_resched = 1;
+#endif

/*
* Ok, add it to the run-queues and make it

--
Peter Österlund [email protected]
Sköndalsvägen 35 http://home1.swipnet.se/~w-15919
S-128 66 Sköndal +46 8 942647
Sweden



2001-04-28 14:16:59

by J.A. Magallon

Subject: Re: 2.4.4 sluggish under fork load


On 04.28 Peter Osterlund wrote:
>
> Another thing is that the bash loop "while true ; do /bin/true ; done" is
> not possible to interrupt with ctrl-c.
>

Just tried that under 2.4.4 on two terminals at the same time, and the system
hardly even noticed it. Both CPUs were running at about 45% user + 55% sys, I
was still able to use balsa to read mail (disk access), and both loops stopped
immediately on Ctrl-C.

--
J.A. Magallon # Let the source
mailto:[email protected] # be with you, Luke...

Linux werewolf 2.4.4 #1 SMP Sat Apr 28 11:45:02 CEST 2001 i686

2001-04-28 14:26:51

by Mohammad A. Haque

Subject: Re: 2.4.4 sluggish under fork load

Peter Osterlund wrote:
>
> I have noticed that 2.4.4 feels a lot less responsive than 2.4.3 under
> fork load. This is caused by the "run child first after fork" patch. I
> have tested on two different UP x86 systems running redhat 7.0.
>
> For example, when running the gcc configure script, the X mouse pointer is
> very jerky. The configure script itself runs approximately as fast as in
> 2.4.3.
>
> Another thing is that the bash loop "while true ; do /bin/true ; done" is
> not possible to interrupt with ctrl-c.
>

Just as a data point, I'm experiencing this also.

> Reverting the fork patch makes all these problems go away on my machine.
> I'm not saying that this is necessarily a good idea, that patch might be
> good for other reasons.

I'll try out this patch soon.
--

=====================================================================
Mohammad A. Haque http://www.haque.net/
[email protected]

"Alcohol and calculus don't mix. Project Lead
Don't drink and derive." --Unknown http://wm.themes.org/
[email protected]
=====================================================================

2001-04-28 15:38:29

by Rene Puls

Subject: Re: 2.4.4 sluggish under fork load

Peter Osterlund wrote:
>
> Another thing is that the bash loop "while true ; do /bin/true ; done" is
> not possible to interrupt with ctrl-c.

Same thing here.

> A third thing I noticed is that starting a gnome session in redhat 7.0
> takes longer. (It takes more time for the gnome splash screen to
> appear.)

I had similar problems with Sawfish: Starting a program from
the root menu would take about one or two seconds under 2.4.4.

> Reverting the fork patch makes all these problems go away on my
> machine.

This patch worked for me as well.

bye,
Rene

--
Rene Puls <[email protected]> 0x8652FFE2
http://www.lionking.org/~kianga/ personal/pgp-key

2001-04-28 16:25:17

by John Kacur

Subject: Re: 2.4.4 sluggish under fork load

>Peter Osterlund wrote:
>>
>> Another thing is that the bash loop "while true ; do /bin/true ; done" is
>> not possible to interrupt with ctrl-c.

> Same thing here.

I'm not having any problems. Just a quick question, is everyone who is
having a problem running with more than one cpu?

John Kacur

2001-04-28 17:55:01

by Linus Torvalds

Subject: Re: 2.4.4 sluggish under fork load


On Sat, 28 Apr 2001, Peter Osterlund wrote:
>
> For example, when running the gcc configure script, the X mouse pointer is
> very jerky. The configure script itself runs approximately as fast as in
> 2.4.3.

Ok. Fair enough. The new "run the child first" approach has advantages,
but it is entirely possible that the advantages unfairly prioritize things
that do a lot of forking.

> Another thing is that the bash loop "while true ; do /bin/true ; done" is
> not possible to interrupt with ctrl-c.

This, however, is a bash bug, not a kernel issue. Bash does something
strange with the terminal and ignores ^C at times, and basically only
reacts correctly to the ^C under the right circumstances. Changing the
child to run first probably makes the pre-existing bug much easier to see.

> Reverting the fork patch makes all these problems go away on my machine.

Reverting it outright may be an acceptable approach. I'll think about
it: the arguments _for_ the patch are true and real, and it shows up as
real improvements on some things.

An alternative approach might be to not give the child the _whole_
timeslice, but give it more than half. Partition it out 66% - 33% or
something.

Linus


2001-04-28 18:02:00

by Peter Osterlund

Subject: Re: 2.4.4 sluggish under fork load

John Kacur <[email protected]> writes:

> >Peter Osterlund wrote:
> >>
> >> Another thing is that the bash loop "while true ; do /bin/true ; done" is
> >> not possible to interrupt with ctrl-c.
>
> > Same thing here.
>
> I'm not having any problems. Just a quick question, is everyone who is
> having a problem running with more than one cpu?

A clarification. The bash loop above doesn't cause any sluggishness on
my single cpu system. The non-working ctrl-c is probably just a bash
bug. The child process must eat some cpu time to provoke the
sluggishness, like in the following test program where the child busy
waits 100ms and then exits:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>

int main(int argc, char* argv[])
{
    /* How long each child busy-waits, in seconds (default 100ms). */
    double childTime = 0.10;
    if (argc > 1)
        childTime = atof(argv[1]);

    for (;;) {
        pid_t child = fork();
        if (child == -1) {
            perror("fork");
            exit(1);
        } else if (child > 0) {
            /* Parent: wait for the child, print a dot, then fork again. */
            while (waitpid(child, NULL, 0) != child)
                ;
            printf("."); fflush(stdout);
        } else {
            /* Child: burn CPU until childTime seconds have elapsed. */
            struct timeval tv1, tv2;
            double t;
            gettimeofday(&tv1, NULL);
            for (;;) {
                gettimeofday(&tv2, NULL);
                t = (tv2.tv_sec - tv1.tv_sec) +
                    (tv2.tv_usec - tv1.tv_usec) / 1000000.0;
                if (t > childTime)
                    break;
            }
            _exit(0);
        }
    }

    return 0;
}

--
Peter Österlund [email protected]
Sköndalsvägen 35 http://home1.swipnet.se/~w-15919
S-128 66 Sköndal +46 8 942647
Sweden

2001-04-28 19:16:13

by Peter Osterlund

Subject: Re: 2.4.4 sluggish under fork load

On Sat, 28 Apr 2001, Linus Torvalds wrote:

> > Reverting the fork patch makes all these problems go away on my machine.
>
> Reverting it outright may be an acceptable approach. I'll think about
> it: the arguments _for_ the patch are true and real, and it shows up as
> real improvements on some things..

I agree with the reasoning for running the child first. Maybe the real
problem is somewhere else. I wrote two test programs to quantify the
behaviour. If I run "./fork 0.2" and "./lat 0.15" at the same time, lat
shows regular 160ms scheduling delays. (With the old fork.c the scheduling
delay is 20ms + epsilon as expected.)

Maybe some code path just forgets to reschedule?

-------- fork.c --------

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>

int main(int argc, char* argv[])
{
    double childTime;

    /* Usage: ./fork SECONDS  (how long each child busy-waits) */
    if (argc < 2) {
        fprintf(stderr, "usage: %s seconds\n", argv[0]);
        return 1;
    }
    childTime = atof(argv[1]);

    for (;;) {
        pid_t child = fork();
        if (child == -1) {
            perror("fork");
            exit(1);
        } else if (child > 0) {
            /* Parent: wait for the child, print a dot, then fork again. */
            while (waitpid(child, NULL, 0) != child)
                ;
            printf("."); fflush(stdout);
        } else {
            /* Child: burn CPU until childTime seconds have elapsed. */
            struct timeval tv1, tv2;
            double t;
            gettimeofday(&tv1, NULL);
            for (;;) {
                gettimeofday(&tv2, NULL);
                t = (tv2.tv_sec - tv1.tv_sec) +
                    (tv2.tv_usec - tv1.tv_usec) / 1000000.0;
                if (t > childTime)
                    break;
            }
            _exit(0);
        }
    }

    return 0;
}


-------- lat.c --------

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>

int main(int argc, char* argv[])
{
    /* Report every 10ms sleep that takes longer than tLimit seconds
       (default 30ms). */
    double tLimit = 0.03;
    if (argc > 1)
        tLimit = atof(argv[1]);

    for (;;) {
        struct timeval tv1, tv2;
        double t;

        gettimeofday(&tv1, NULL);
        usleep(10000);              /* request a 10ms sleep */
        gettimeofday(&tv2, NULL);
        t = (tv2.tv_sec - tv1.tv_sec) +
            (tv2.tv_usec - tv1.tv_usec) / 1000000.0;
        if (t > tLimit)
            printf("t:%g\n", t);    /* wakeup came later than tLimit */
    }
    return 0;
}

--
Peter Österlund [email protected]
Sköndalsvägen 35 http://home1.swipnet.se/~w-15919
S-128 66 Sköndal +46 8 942647
Sweden


2001-04-28 20:01:11

by Harald Dunkel

Subject: Re: 2.4.4 sluggish under fork load

Peter Osterlund wrote:
>
> I have noticed that 2.4.4 feels a lot less responsive than 2.4.3 under
> fork load. This is caused by the "run child first after fork" patch. I
> have tested on two different UP x86 systems running redhat 7.0.
>
> For example, when running the gcc configure script, the X mouse pointer is
> very jerky. The configure script itself runs approximately as fast as in
> 2.4.3.
>

That explains why xtoolwait did not work anymore. After applying the
patch everything is OK again.


Many thanx

Harri

2001-04-29 07:14:44

by Adam J. Richter

Subject: Re: 2.4.4 sluggish under fork load

Peter Osterlund wrote:
> Another thing is that the bash loop "while true ; do /bin/true ; done" is
> not possible to interrupt with ctrl-c.

I have reproduced this on a uniprocessor machine and determined
that it is a bash bug. I will submit a bash bug report and sample
patch that fixes the problem (but may be incorrect in other ways), and
will cc it to linux-kernel. Look for the subject "Patch(?): bash-2.05/jobs.c
loses interrupts."

I have not yet investigated the other report of "sluggish" behavior.

Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."

2001-04-29 08:04:25

by Adam J. Richter

Subject: Re: 2.4.4 sluggish under fork load

On rereading Linus's message, I see that he indicated that
"while true ; do /bin/true ; done" was known to be a bash bug, not
just a suggested possibility. Sorry for acting as if this were
a new discovery. Anyhow, I hope that at least the proposed bash
patch that I submitted may be of some use.

Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 104
[email protected] \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."

2001-04-29 08:28:20

by Peter Osterlund

Subject: Re: 2.4.4 sluggish under fork load

On Sat, 28 Apr 2001, Linus Torvalds wrote:

> > could we leave it at half, but set the parent to SCHED_YIELD?
>
> Sounds like a good idea. Peter, how does that feel to you? I bet that I've
> never seen it simply because all my machines are (a) much too powerful
> for any reasonable use and (b) SMP.

That seems to work. The scheduling delays are back to 20ms and the
sluggishness feeling is gone. I wrote a simple test program to verify that
the child is still scheduled before the parent, so the performance
advantage should still be there. The only annoying thing is that it hides
the bash bug ;)

Patch below:

--- linux-2.4.4.orig/kernel/fork.c Sat Apr 28 10:17:00 2001
+++ linux-2.4.4/kernel/fork.c Sun Apr 29 10:06:42 2001
@@ -666,16 +666,18 @@
p->pdeath_signal = 0;

/*
- * Give the parent's dynamic priority entirely to the child. The
- * total amount of dynamic priorities in the system doesn't change
- * (more scheduling fairness), but the child will run first, which
- * is especially useful in avoiding a lot of copy-on-write faults
- * if the child for a fork() just wants to do a few simple things
- * and then exec(). This is only important in the first timeslice.
- * In the long run, the scheduling behavior is unchanged.
+ * "share" dynamic priority between parent and child, thus the
+ * total amount of dynamic priorities in the system doesn't change,
+ * more scheduling fairness. The parent yields to let the child run
+ * first, which is especially useful in avoiding a lot of
+ * copy-on-write faults if the child for a fork() just wants to do a
+ * few simple things and then exec(). This is only important in the
+ * first timeslice. In the long run, the scheduling behavior is
+ * unchanged.
*/
- p->counter = current->counter;
- current->counter = 0;
+ p->counter = (current->counter + 1) >> 1;
+ current->counter >>= 1;
+ current->policy |= SCHED_YIELD;
current->need_resched = 1;

/*

--
Peter Österlund [email protected]
Sköndalsvägen 35 http://home1.swipnet.se/~w-15919
S-128 66 Sköndal +46 8 942647
Sweden


2001-04-30 17:54:08

by Andrea Arcangeli

Subject: Re: 2.4.4 sluggish under fork load

On Sun, Apr 29, 2001 at 10:26:57AM +0200, Peter Osterlund wrote:
> On Sat, 28 Apr 2001, Linus Torvalds wrote:
>
> > > could we leave it at half, but set the parent to SCHED_YIELD?
> >
> > Sounds like a good idea. Peter, how does that feel to you? I bet that I've
> > never seen it simply because all my machines are (a) much too powerful
> > for any reasonable use and (b) SMP.
>
> That seems to work. The scheduling delays are back to 20ms and the
> sluggishness feeling is gone. I wrote a simple test program to verify that
> the child is still scheduled before the parent, so the performance
> advantage should still be there. The only annoying thing is that it hides
> the bash bug ;)
>
> Patch below:
>
> --- linux-2.4.4.orig/kernel/fork.c Sat Apr 28 10:17:00 2001
> +++ linux-2.4.4/kernel/fork.c Sun Apr 29 10:06:42 2001
> @@ -666,16 +666,18 @@
> p->pdeath_signal = 0;
>
> /*
> - * Give the parent's dynamic priority entirely to the child. The
> - * total amount of dynamic priorities in the system doesn't change
> - * (more scheduling fairness), but the child will run first, which
> - * is especially useful in avoiding a lot of copy-on-write faults
> - * if the child for a fork() just wants to do a few simple things
> - * and then exec(). This is only important in the first timeslice.
> - * In the long run, the scheduling behavior is unchanged.
> + * "share" dynamic priority between parent and child, thus the
> + * total amount of dynamic priorities in the system doesn't change,
> + * more scheduling fairness. The parent yields to let the child run
> + * first, which is especially useful in avoiding a lot of
> + * copy-on-write faults if the child for a fork() just wants to do a
> + * few simple things and then exec(). This is only important in the
> + * first timeslice. In the long run, the scheduling behavior is
> + * unchanged.
> */
> - p->counter = current->counter;
> - current->counter = 0;
> + p->counter = (current->counter + 1) >> 1;
> + current->counter >>= 1;
> + current->policy |= SCHED_YIELD;
> current->need_resched = 1;
>
> /*

please try to reproduce the bad behaviour with 2.4.4aa2. There's a bug
in the parent-timeslice patch in 2.4 that I fixed while backporting it
to 2.2aa, and I have now forward-ported the fix to 2.4aa. The fact that
2.4.4 gives the whole timeslice to the child just makes that bug more
visible. Unfortunately the fix doesn't apply cleanly to 2.4.4 (it's
incremental on top of the numa-scheduler patch), and I need to finish a
few more things before I can backport it myself.

Andrea

2001-04-30 21:51:12

by Peter Osterlund

Subject: Re: 2.4.4 sluggish under fork load

On Mon, 30 Apr 2001, Andrea Arcangeli wrote:

> please try to reproduce the bad behaviour with 2.4.4aa2. There's a bug
> in the parent-timeslice patch in 2.4 that I fixed while backporting it
> to 2.2aa and that I now forward ported the fix to 2.4aa. The fact
> 2.4.4 gives the whole timeslice to the child just gives more light to
> such bug. Unfortunately the fix doesn't apply cleanly to 2.4.4 (it's
> incremental with the numa-scheduler patch) and I need to finish a few
> more things before I can backport it myself.

I applied the 10_parent-timeslice-5 patch to 2.4.4 and tested it. (If I
understood correctly, the idea of that patch is to give the child's
remaining time-slice back to the parent when the child exits, but only if
there has been no time-slice recalculation since the child was created.)

It is somewhat better than plain 2.4.4, but not much. I still see
scheduling delays in the range 30-120ms when running "./fork 0.4". (fork
is a program that starts a child, the child busy waits some time (0.4s)
and then exits. The parent then immediately respawns another child, etc.
See one of my previous messages.)

--
Peter Österlund [email protected]
Sköndalsvägen 35 http://home1.swipnet.se/~w-15919
S-128 66 Sköndal +46 8 942647
Sweden


2001-05-01 02:39:07

by Rik van Riel

Subject: Re: 2.4.4 sluggish under fork load

On Mon, 30 Apr 2001, Andrea Arcangeli wrote:
> On Sun, Apr 29, 2001 at 10:26:57AM +0200, Peter Osterlund wrote:

> > - p->counter = current->counter;
> > - current->counter = 0;
> > + p->counter = (current->counter + 1) >> 1;
> > + current->counter >>= 1;
> > + current->policy |= SCHED_YIELD;
> > current->need_resched = 1;
>
> please try to reproduce the bad behaviour with 2.4.4aa2. There's a bug
> in the parent-timeslice patch in 2.4 that I fixed while backporting it
> to 2.2aa and that I now forward ported the fix to 2.4aa. The fact
> 2.4.4 gives the whole timeslice to the child just gives more light to
> such bug.

The fact that 2.4.4 gives the whole timeslice to the child
is just bogus to begin with.

The problem people tried to solve was "make sure the kernel
runs the child first after a fork"; this has just about
NOTHING to do with how the timeslice is distributed.

Now, since we are in a supposedly stable branch of the kernel,
why mess with the timeslice distribution between parent and
child? That distribution has worked very well for YEARS...

I'm all for fixing real problems, but I really don't
think 2.4 is the time to also "fix" non-problems.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [email protected] (spam digging piggy)