2005-09-18 11:37:59

by Bernardo Innocenti

Subject: RFA: Changing scheduler quantum (Was: REQUEST: OpenLDAP 2.3.7)

Arjan van de Ven wrote:

> On Sun, Sep 18, 2005 at 04:27:38AM +0200, Bernardo Innocenti wrote:
>
>>It's more meaningful to interpret sched_yield() as "give up the processor,
>>as if the scheduler quantum had expired".
>
> afaik this is *exactly* what the new sched_yield() does ;)

Oops :-)


>>The scheduler wouldn't normally allow a lower priority process to
>>preempt a high-priority ready process for 30+ ms. Unless I'm
>>mistaken about Linux's scheduling policy...
>
> if your quantum is up... all other tasks get theirs of course

I assumed dynamic priorities affected the length of the
quantum, but maybe they just change the number of times
the process is scheduled relative to other processes, with
the quantum being fixed at 20-30ms.

(...a few seconds later...)

Skimming through sched.c, it seems my first guess was
right: the quantum varies with the priority from 5ms
to 800ms.

The DEF_TIMESLICE of 400ms looks a bit too coarse for
most applications, and the maximum of 800ms is just
ridiculously high.

IIRC, the 7.14MHz 68000 in the Amiga 500 did task-switching
at 20ms intervals, with a negligible performance hit.
Couldn't we do much better on today's CPUs?

--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/


2005-09-18 11:45:17

by Con Kolivas

Subject: Re: RFA: Changing scheduler quantum (Was: REQUEST: OpenLDAP 2.3.7)

On Sun, 18 Sep 2005 21:37, Bernardo Innocenti wrote:
> Arjan van de Ven wrote:
> > On Sun, Sep 18, 2005 at 04:27:38AM +0200, Bernardo Innocenti wrote:
> >>It's more meaningful to interpret sched_yield() as "give up the
> >> processor, as if the scheduler quantum had expired".
> >
> > afaik this is *exactly* what the new sched_yield() does ;)
>
> Oops :-)
>
> >>The scheduler wouldn't normally allow a lower priority process to
> >>preempt a high-priority ready process for 30+ ms. Unless I'm
> >>mistaken about Linux's scheduling policy...
> >
> > if your quantum is up... all other tasks get theirs of course
>
> I assumed dynamic priorities affected the length of the
> quantum, but maybe they just change the number of times
> the process is scheduled relative to other processes, with
> the quantum being fixed at 20-30ms.
>
> (...a few seconds later...)
>
> Skimming through sched.c, it seems my first guess was
> right: the quantum varies with the priority from 5ms
> to 800ms.
>
> The DEF_TIMESLICE of 400ms looks a bit too coarse for
> most applications, and the maximum of 800ms is just
> ridiculously high.
>
> IIRC, the 7.14MHz 68000 in the Amiga 500 did task-switching
> at 20ms intervals, with a negligible performance hit.
> Couldn't we do much better on today's CPUs?

Not quite.

The default timeslice of nice 0 tasks is 100ms. The timeslice is not altered
in the way you read sched.c. It is altered thus:
1. For 'nice' levels, it varies from 5ms at nice 19 to 800ms at nice -20.
2. For interactive tasks, it is cut up into smaller pieces (down to 10ms) and
the task round-robins with other tasks at the same dynamic priority, but the
full length of CPU time it gets before expiration is still based on its nice
level overall.
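
For concreteness, here is a small userspace reconstruction of that scaling
(a paraphrase of 2.6-era kernel/sched.c; macro names and values are
approximations, not verbatim kernel source). With HZ=1000 it prints 800ms
at nice -20, 100ms at nice 0 and 5ms at nice 19:

/*
 * Userspace reconstruction of the timeslice scaling described above.
 * Approximate, not the actual kernel source.
 */
#include <stdio.h>

#define HZ              1000
#define MAX_USER_PRIO   40                      /* nice -20..19 */
#define MAX_PRIO        140                     /* 100 RT levels + 40 nice levels */
#define NICE_TO_PRIO(n) (120 + (n))
#define MIN_TIMESLICE   (5 * HZ / 1000)
#define DEF_TIMESLICE   (100 * HZ / 1000)

#define MAX(a, b)       ((a) > (b) ? (a) : (b))
#define SCALE_PRIO(x, prio) \
        MAX((x) * (MAX_PRIO - (prio)) / (MAX_USER_PRIO / 2), MIN_TIMESLICE)

static unsigned int task_timeslice(int static_prio)
{
        /* Negative nice levels scale up from a 4x larger base slice. */
        if (static_prio < NICE_TO_PRIO(0))
                return SCALE_PRIO(DEF_TIMESLICE * 4, static_prio);
        return SCALE_PRIO(DEF_TIMESLICE, static_prio);
}

int main(void)
{
        int nice;

        for (nice = -20; nice <= 19; nice++)
                printf("nice %3d -> %3u ms\n", nice,
                       task_timeslice(NICE_TO_PRIO(nice)));
        return 0;
}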

Cheers,
Con

2005-09-18 21:53:42

by Bernardo Innocenti

Subject: Re: RFA: Changing scheduler quantum (Was: REQUEST: OpenLDAP 2.3.7)

Con Kolivas wrote:
> On Sun, 18 Sep 2005 21:37, Bernardo Innocenti wrote:

>>The DEF_TIMESLICE of 400ms looks a bit too coarse for
>>most applications, and the maximum of 800ms is just
>>ridiculously high.
>
> Not quite.
>
> The default timeslice of nice 0 tasks is 100ms. The timeslice is not altered
> in the way you read sched.c. It is altered thus:
> 1. For 'nice' levels, it varies from 5ms at nice 19 to 800ms at nice -20.
> 2. For interactive tasks, it is cut up into smaller pieces (down to 10ms) and
> the task round-robins with other tasks at the same dynamic priority, but the
> full length of CPU time it gets before expiration is still based on its nice
> level overall.

I see. Then there must be something else to explain
the behavior I'm observing with slapd.

Each and every call to sched_yield() makes the process
sleep for over *50ms* while a "nice make bootstrap" is
running in the background:

[pid 8780] 0.000033 stat64("gidNumber.dbb", 0xb7b3ebcc) = -1 EACCES (Permission denied)
[pid 8780] 0.000059 pread(20, "\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\2\0\344\17\2\3"..., 4096, 4096) = 4096
[pid 8780] 0.000083 pread(20, "\0\0\0\0\1\0\0\0\4\0\0\0\3\0\0\0\0\0\0\0\222\0<\7\1\5\370"..., 4096, 16384) = 4096
[pid 8780] 0.000078 time(NULL) = 1124322520
[pid 8780] 0.000066 pread(11, "\0\0\0\0\1\0\0\0\250\0\0\0\231\0\0\0\235\0\0\0\16\0000"..., 4096, 688128) = 4096
[pid 8780] 0.000241 write(19, "0e\2\1\3d`\4$cn=bernie,ou=group,dc=d"..., 103) = 103
[pid 8780] 0.000137 sched_yield( <unfinished ...>
...zzzz...
[pid 8781] 0.050020 <... sched_yield resumed> ) = 0
[pid 8780] 0.000025 <... sched_yield resumed> ) = 0
[pid 8781] 0.000060 futex(0x925ab20, FUTEX_WAIT, 33, NULL <unfinished ...>
[pid 8780] 0.000026 write(19, "0\f\2\1\3e\7\n\1\0\4\0\4\0", 14) = 14
[pid 8774] 0.000774 <... select resumed> ) = 1 (in [19])


Actually, I'm now noticing that several slapd threads were
involved here. Depending on how strace handles relative
timestamps across multiple processes, it may mean that both
8780 and 8781 slept too long, or that only 8781 did and 8780
was quick.

Any idea? I'm planning to patch my kernel to print the
time_slice value in /proc/*/stat. This way I can check
it's being computed as intended for both slapd and gcc.
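
A minimal userspace check of the same effect (a sketch, assuming a plain
gettimeofday() pair around the call is precise enough for the ~50ms gaps
seen in the strace above, and that the build is running in the background)
could look like this:

/*
 * Sketch: time a single sched_yield() from userspace while the
 * "nice make bootstrap" runs in the background.
 */
#include <stdio.h>
#include <sched.h>
#include <sys/time.h>

int main(void)
{
        struct timeval before, after;
        long usec;

        gettimeofday(&before, NULL);
        sched_yield();                  /* voluntarily give up the CPU */
        gettimeofday(&after, NULL);

        usec = (after.tv_sec - before.tv_sec) * 1000000L
             + (after.tv_usec - before.tv_usec);
        printf("sched_yield() took %ld us\n", usec);
        return 0;
}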

--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/

2005-09-19 00:47:29

by Con Kolivas

Subject: Re: RFA: Changing scheduler quantum (Was: REQUEST: OpenLDAP 2.3.7)

On Mon, 19 Sep 2005 07:53, Bernardo Innocenti wrote:
> Con Kolivas wrote:
> > On Sun, 18 Sep 2005 21:37, Bernardo Innocenti wrote:
> >>The DEF_TIMESLICE of 400ms looks a bit too coarse for
> >>most applications, and the maximum of 800ms is just
> >>ridiculously high.
> >
> > Not quite.
> >
> > The default timeslice of nice 0 tasks is 100ms. The timeslice is not
> > altered in the way you read sched.c. It is altered thus:
> > 1. For 'nice' levels, it varies from 5ms at nice 19 to 800ms at nice -20.
> > 2. For interactive tasks, it is cut up into smaller pieces (down to 10ms)
> > and the task round-robins with other tasks at the same dynamic priority,
> > but the full length of CPU time it gets before expiration is still based
> > on its nice level overall.

When mailing lkml, please do not cc mailing lists that reply with a "your
email is awaiting moderator approval" notice.

> I see. Then there must be something else to explain
> the behavior I'm observing with slapd.
>
> Each and every call to sched_yield() makes the process
> sleep for over *50ms* while a "nice make bootstrap" is
> running in the background:

Why this preoccupation with how long sched_yield takes? We've already
established that it takes a variable, unpredictable (yet long) time for
SCHED_NORMAL tasks. No, cancel that question, or we'll start having people
tell us what the kernel should do all over again.

You're almost certainly seeing the effect of fork during 'make bootstrap':
multiple tasks are running, prior to expiration, on the active runqueue.
SCHED_NORMAL tasks that have called sched_yield will keep yielding until
nothing is left wanting CPU time on the active runqueue.
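
To illustrate, a toy userspace model of the two priority arrays (an
illustration only, not kernel source): a SCHED_NORMAL task that calls
sched_yield() is requeued on the expired array, so it runs again only
after every task on the active array has used up its slice.

/*
 * Toy model of the active/expired arrays: a SCHED_NORMAL yielder
 * waits until the active array is drained.
 */
#include <stdio.h>

#define MAXQ 16

struct queue { const char *task[MAXQ]; int len; };

static void enqueue(struct queue *q, const char *name)
{
        if (q->len < MAXQ)
                q->task[q->len++] = name;
}

int main(void)
{
        struct queue active = { { 0 }, 0 }, expired = { { 0 }, 0 };
        int i;

        /* 'make bootstrap' keeps forking compilers onto the active array. */
        enqueue(&active, "cc1 #1");
        enqueue(&active, "cc1 #2");
        enqueue(&active, "cc1 #3");

        /* slapd calls sched_yield(): it is requeued on the expired array. */
        enqueue(&expired, "slapd (yielded)");

        /* The scheduler drains the active array first... */
        for (i = 0; i < active.len; i++)
                printf("run %-16s for its remaining timeslice\n", active.task[i]);

        /* ...then swaps the arrays, and only now does the yielder run. */
        for (i = 0; i < expired.len; i++)
                printf("run %-16s only now\n", expired.task[i]);

        return 0;
}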

> Actually, I'm now noticing that several slapd threads were
> involved here. Depending how strace handles relative
> timestamps of multiple processes, it may mean both 8780 and
> 8781 slept too much or just 8781 did and 8780 was quick.
>
> Any idea? I'm planning to patch my kernel to print the
> time_slice value in /proc/*/stat. This way I can check
> it's being computed as intended for both slapd and gcc.

Feel free to do as much checking on kernel code as you like.

Cheers,
Con