2007-05-17 19:14:22

by Török Edwin

Subject: CFS Scheduler and real-time tasks

Hi,

I have tested your new CFS scheduler (v2.6.21.1-v12) on a Dell
Inspiron 6400 (Intel Core Duo @ 1.66 GHz, SMP).

I tested with the bash for-loop and the massive_intr.c workload you
suggested here: http://lkml.org/lkml/2007/5/14/227, and then with a
modified version of massive_intr.c.
My modifications were: add a new task that modifies work_msecs after
each iteration and also increases sleep_time, and run this task with
real-time static priority 1. The purpose isn't to test how much 'time'
the RT process gets (it should get all the CPU while it runs, right?),
but to test whether an RT-priority task can 'unbalance' non-RT tasks.
[The diff can be found at the end of my mail; it should be applied to
massive_intr.c from http://lkml.org/lkml/2007/3/26/319.]

In short, my results are: CFS is much more precise than the standard
scheduler (on my system, with my workload), but:
- the precision of CFS on my system is worse than on your system
- the precision of CFS decreases if real-time tasks are run

Ingo, you have obtained much more accurate results (massive_intr's
maximum output error is 1, versus 20-40 in my results).
Is there anything I can change in my kernel to increase the precision of CFS?
[I have tried running the test with both the ondemand and performance
governors; same results.]

Is there something I am doing wrong, or is the precision decrease
expected when real-time tasks are also running?

I have run the test on 'Linux localhost 2.6.21-gentoo', Gentoo 2007.0.

Results of the bash for-loop:
7730 edwin 20 0 2888 552 256 R 20 0.1 0:05.69 bash
7731 edwin 20 0 2888 552 256 R 19 0.1 0:05.68 bash
7732 edwin 20 0 2888 552 256 R 19 0.1 0:05.69 bash
7728 edwin 20 0 2888 552 256 R 19 0.1 0:05.70 bash
7729 edwin 20 0 2888 552 256 R 19 0.1 0:05.67 bash
7733 edwin 20 0 2888 552 256 R 19 0.1 0:05.58 bash
7734 edwin 20 0 2888 552 256 R 19 0.1 0:05.57 bash
7735 edwin 20 0 2888 552 256 R 19 0.1 0:05.58 bash
7736 edwin 20 0 2888 552 256 R 19 0.1 0:05.58 bash
7737 edwin 20 0 2888 552 256 R 19 0.1 0:05.58 bash

Very precise indeed; the default scheduler produces widely varying
timings here.

Running the original massive_intr.c, I can see that the maximum error is ~20.
[gcc massive_intr_orig.c -o mo -lrt]

edwin@localhost ~ $ ./mo 9 10
025633 00000182
025630 00000182
025627 00000185
025628 00000187
025634 00000194
025632 00000181
025626 00000195
025629 00000181
025631 00000194
edwin@localhost ~ $ ./mo 9 10
025840 00000175
025845 00000175
025842 00000172
025848 00000174
025841 00000205
025847 00000207
025846 00000208
025844 00000173
025843 00000207
edwin@localhost ~ $ ./mo 9 10
026318 00000216
026319 00000212
026324 00000221
026322 00000218
026320 00000213
026321 00000213
026323 00000212
026317 00000194
026316 00000220
edwin@localhost ~ $ ./mo 9 10
026646 00000230
026645 00000219
026653 00000227
026651 00000233
026650 00000228
026647 00000228
026648 00000217
026652 00000208
026649 00000230

Running my version of massive_intr.c:
[gcc massive_intr_new.c -o massiv_new -lrt]

./massiv_new 9 10
027216 00000205
027218 00000202
027220 00000227
027217 00000205
027219 00000228
027222 00000226
027215 00000227
027221 00000206

RT JOB:027223 00000061

edwin@localhost ~ $ ./massiv_new 9 10
027547 00000196
027542 00000205
027546 00000223
027544 00000198
027541 00000224
027548 00000218
027545 00000226
027543 00000202

RT JOB:027549 00000059


Everything looks fine. Now I run it with root privileges, so that the
real-time task actually gets to run with RT priority:
localhost edwin # ./massiv_new 9 10
027957 00000158
027954 00000172
027956 00000163
027955 00000146
027958 00000140
027960 00000184
027959 00000181
027961 00000157

RT JOB:027962 00000187

localhost edwin # ./massiv_new 9 10
028333 00000177
028335 00000159
028332 00000185
028337 00000136
028336 00000156
028334 00000145
028338 00000175
028339 00000162

RT JOB:028340 00000183

The error is ~40, roughly double the error when no RT task is running.


Best regards,
Edwin

--- massiv_orig.c 2007-05-17 21:20:21.000000000 +0000
+++ massive_intr.c 2007-05-17 21:39:31.000000000 +0000
@@ -43,6 +43,7 @@
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
+#include <sched.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
@@ -56,6 +57,8 @@

#define WORK_MSECS 8
#define SLEEP_MSECS 1
+#define RT_MSECS_INCREMENT 1
+#define RT_MSECS_MAX 50

#define MAX_PROC 1024
#define SAMPLE_COUNT 1000000000
@@ -113,6 +116,74 @@
after = tv[1].tv_sec*USECS_PER_SEC+tv[1].tv_usec;
return SAMPLE_COUNT/(after - before)*USECS_PER_MSEC;
}
+static void *test_job_rt(void *arg)
+{
+ int l = (int)arg;
+ int count = 0;
+ time_t current;
+ sigset_t sigset;
+ struct sigaction sa;
+ struct timespec ts0 = { 0, NSECS_PER_MSEC*SLEEP_MSECS};
+ struct timespec ts1 = { 0, NSECS_PER_MSEC*RT_MSECS_MAX};
+ struct timespec ts_delta = { 0, NSECS_PER_MSEC*RT_MSECS_INCREMENT};
+ struct timespec ts = ts0;
+ struct sched_param sp;
+ int work_msecs = WORK_MSECS;
+
+ sa.sa_handler = sighandler;
+ if (sigemptyset(&sa.sa_mask) < 0) {
+ warn("sigemptyset() failed");
+ abnormal_exit();
+ }
+ sa.sa_flags = 0;
+ if (sigaction(SIGUSR1, &sa, NULL) < 0) {
+ warn("sigaction() failed");
+ abnormal_exit();
+ }
+ if (sigemptyset(&sigset) < 0) {
+ warn("sigemptyset() failed");
+ abnormal_exit();
+ }
+ sigsuspend(&sigset);
+ if (errno != EINTR) {
+ warn("sigsuspend() failed");
+ abnormal_exit();
+ }
+ sp.sched_priority = 1;
+
+ sched_setscheduler(0, SCHED_FIFO, &sp);
+ /* main loop */
+ do {
+ loopfnc(work_msecs*l);
+ if (nanosleep(&ts, NULL) < 0) {
+ warn("nanosleep() failed");
+ abnormal_exit();
+ }
+ count++;
+ if (time(&current) == -1) {
+ warn("time() failed");
+ abnormal_exit();
+ }
+ ts.tv_nsec += ts_delta.tv_nsec;
+ if(ts.tv_nsec > ts1.tv_nsec)
+ ts = ts0;
+ work_msecs += RT_MSECS_INCREMENT;
+ if(work_msecs > RT_MSECS_MAX)
+ work_msecs = WORK_MSECS;
+ } while (difftime(current, *first) < runtime);
+ sp.sched_priority = 0;
+ sched_setscheduler(0, SCHED_OTHER, &sp);
+ if (sem_wait(printsem) < 0) {
+ warn("sem_wait() failed");
+ abnormal_exit();
+ }
+ printf("\nRT JOB:%06d\t%08d\n\n", getpid(), count);
+ if (sem_post(printsem) < 0) {
+ warn("sem_post() failed");
+ abnormal_exit();
+ }
+ exit(EXIT_SUCCESS);
+}
static void *test_job(void *arg)
{
int l = (int)arg;
@@ -237,8 +308,12 @@
warn("kill() failed");
goto err_sem;
}
- if (pid[i] == 0)
- test_job((void *)c);
+ if (pid[i] == 0) {
+ if(i == nproc-1)
+ test_job_rt((void *)c);
+ else
+ test_job((void*)c);
+ }
}
if (sigemptyset(&sigset) < 0) {
warn("sigemptyset() failed");


2007-05-17 19:24:01

by Ingo Molnar

Subject: Re: CFS Scheduler and real-time tasks


* Török Edvin <[email protected]> wrote:

> Is there something I am doing wrong, or is the precision decrease
> expected, when running real-time tasks too?

yeah: real-time tasks are ueber-tasks that just get all CPU time
immediately. I guess we could eventually make SCHED_RR covered by CFS
accounting.

> Very precise indeed, the default scheduler produces lots of different
> timings there.

note: you ran it on a dual-core system and the SMP load-balancer needs
time to gain precision in the non-RT test - so i'd suggest to test it
much longer than 10 seconds: 300 seconds at least.

Ingo

2007-05-17 20:10:18

by Török Edwin

Subject: Re: CFS Scheduler and real-time tasks

On 5/17/07, Ingo Molnar <[email protected]> wrote:
>
> note: you ran it on a dual-core system and the SMP load-balancer needs
> time to gain precision in the non-RT test - so i'd suggest to test it
> much longer than 10 seconds: 300 seconds at least.

Thanks, running for 300 seconds shows only a 2% error.
Even running an I/O workload while running the massive_intr test keeps
the same error.
BTW, the system is more 'responsive'. For example, starting
applications while running the tests, switching windows, and moving
windows are much "smoother" than with the default scheduler.

[while true;do cp /usr/portage/distfiles/linux-2.6.21.tar.bz2 . &&
sync && rm linux-2.6.21.tar.bz2 && sync; done]

./massiv_new 9 300
020860 00017782
020858 00017774

RT JOB:020865 00003785

020859 00017048
020862 00017387
020855 00017494
020861 00017245
020863 00017492
020864 00017122


Best regards,
Edwin

2007-05-17 20:13:54

by Ingo Molnar

Subject: Re: CFS Scheduler and real-time tasks


* Török Edvin <[email protected]> wrote:

> On 5/17/07, Ingo Molnar <[email protected]> wrote:
> >
> >note: you ran it on a dual-core system and the SMP load-balancer needs
> >time to gain precision in the non-RT test - so i'd suggest to test it
> >much longer than 10 seconds: 300 seconds at least.
>
> Thanks, running for 300 seconds shows only a 2% error. Even running an
> I/O workload while running the massive_intr test keeps the same error.

hey, cool! Also try -v13 - it should be even tighter.

> BTW, the system is more 'responsive'. For example starting
> applications while running the tests, switching windows, moving
> windows is much "smoother" than on default scheduler.

yeah :)

> [while true;do cp /usr/portage/distfiles/linux-2.6.21.tar.bz2 . &&
> sync && rm linux-2.6.21.tar.bz2 && sync; done]
>
> ./massiv_new 9 300
> 020860 00017782
> 020858 00017774
>
> RT JOB:020865 00003785
>
> 020859 00017048
> 020862 00017387
> 020855 00017494
> 020861 00017245
> 020863 00017492
> 020864 00017122

looks pretty good!

Ingo

2007-05-18 21:07:36

by Török Edwin

Subject: Re: CFS Scheduler and real-time tasks

On 5/17/07, Ingo Molnar <[email protected]> wrote:
> hey, cool! Also try -v13 - it should be even tighter.

I tried -v13. However the scheduling "error" is now 10% (vs 2% with -v12).

I also noticed strange behaviour with CPU hotplug. I offlined cpu1
(echo 0 > /sys/devices/system/cpu/cpu1/online), and the typing speed in
my terminal decreased noticeably. I could hardly type anything there.
The system was otherwise idle (running Xorg and gnome-terminal).
As soon as I started the massive_intr.c workload, the keyboard typing
speed became normal again. If I have a workload running while I
offline a CPU, the keyboard is fine after cpu1 goes offline.
Also, the entire system froze once when I offlined cpu1; I couldn't
reproduce this.

I don't know how -v12 behaved wrt CPU hotplug.

> > ./massiv_new 9 300
> > 020860 00017782
> > 020858 00017774
> >
> > RT JOB:020865 00003785
> >
> > 020859 00017048
> > 020862 00017387
> > 020855 00017494
> > 020861 00017245
> > 020863 00017492
> > 020864 00017122

Results with -v13:

edwin@localhost ~ $ ./mo 9 300
006203 00006071
006198 00005712
006201 00006088
006196 00006208
006200 00006166
006195 00006306
006199 00006403
006197 00006288
006202 00006237

./massiv_new 9 300
013803 00010073
013801 00009889
013804 00010303
013806 00009953
013800 00010010
013802 00010171
013807 00009956
013805 00010037

RT JOB:013808 00002230

Interesting: massiv_new shows better precision than the original massive_intr.

Just in case it helps, here's my /proc/sched_debug (first while the
system is idle, and then while running the test):

Sched Debug Version: v0.02
now at 1187357967816 nsecs

cpu: 0
.nr_running : 0
.raw_weighted_load : 0
.nr_switches : 555712
.nr_load_updates : 221754
.nr_uninterruptible : 4294967190
.jiffies : 221839
.next_balance : 221835
.curr->pid : 0
.clock : 755701854416
.prev_clock_raw : 871997978419
.clock_warps : 0
.clock_unstable_events : 97254
.clock_max_delta : 4749235
.fair_clock : 184031607224
.prev_fair_clock : 0
.exec_clock : 749010316890
.prev_exec_clock : 0
.wait_runtime : 59753136705
.wait_runtime_overruns : 2689
.wait_runtime_underruns: 1141
.cpu_load[0] : 0
.cpu_load[1] : 0
.cpu_load[2] : 0
.cpu_load[3] : 0
.cpu_load[4] : 0
.wait_runtime_rq_sum : 0

runnable tasks:
task  PID  tree-key  delta  waiting  switches  prio  sum-exec  sum-wait  sum-sleep  wait-overrun  wait-underrun
------------------------------------------------------------------------------------------------------------------

cpu: 1
.nr_running : 1
.raw_weighted_load : 1024
.nr_switches : 378755
.nr_load_updates : 202398
.nr_uninterruptible : 106
.jiffies : 221839
.next_balance : 221830
.curr->pid : 20421
.clock : 794219468967
.prev_clock_raw : 874002443396
.clock_warps : 0
.clock_unstable_events : 116368
.clock_max_delta : 4531475
.fair_clock : 175306014212
.prev_fair_clock : 0
.exec_clock : 718501418488
.prev_exec_clock : 0
.wait_runtime : -59753140134
.wait_runtime_overruns : 1245
.wait_runtime_underruns: 1604
.cpu_load[0] : 0
.cpu_load[1] : 0
.cpu_load[2] : 0
.cpu_load[3] : 0
.cpu_load[4] : 0
.wait_runtime_rq_sum : -34602

runnable tasks:
task  PID  tree-key  delta  waiting  switches  prio  sum-exec  sum-wait  sum-sleep  wait-overrun  wait-underrun
------------------------------------------------------------------------------------------------------------------
R cat  20421  175306048814  34602  -34602  2  120  69205  -34602  853  0  0


And sched_debug output while running the test:

Sched Debug Version: v0.02
now at 1261993795754 nsecs

cpu: 0
.nr_running : 5
.raw_weighted_load : 5120
.nr_switches : 588021
.nr_load_updates : 232164
.nr_uninterruptible : 4294967190
.jiffies : 240498
.next_balance : 240508
.curr->pid : 21096
.clock : 770190363479
.prev_clock_raw : 895084968848
.clock_warps : 0
.clock_unstable_events : 125214
.clock_max_delta : 0
.fair_clock : 190695995910
.prev_fair_clock : 0
.exec_clock : 763286081256
.prev_exec_clock : 0
.wait_runtime : 68703429598
.wait_runtime_overruns : 2978
.wait_runtime_underruns: 1273
.cpu_load[0] : 5120
.cpu_load[1] : 5119
.cpu_load[2] : 5110
.cpu_load[3] : 5077
.cpu_load[4] : 5037
.wait_runtime_rq_sum : -54771343

runnable tasks:
task  PID  tree-key  delta  waiting  switches  prio  sum-exec  sum-wait  sum-sleep  wait-overrun  wait-underrun
------------------------------------------------------------------------------------------------------------------
R mo  21096  190703231293  7235383  -7235383  445  120  1813432308  -8250501  211806979  0  1
  mo  21093  190704201003  8205093  -8472612  450  120  1818737228  -12394371  206140729  0  6
  mo  21094  190704732208  8736298  -9665761  464  120  1816895620  -12925451  204333757  0  4
  mo  21092  190704971831  8975921  -15574122  456  120  1817099155  -16680247  208648471  0  2
  mo  21091  190705631487  9635577  -13823465  457  120  1816983212  -16160000  204693203  0  4

cpu: 1
.nr_running : 5
.raw_weighted_load : 5120
.nr_switches : 415103
.nr_load_updates : 211674
.nr_uninterruptible : 106
.jiffies : 240498
.next_balance : 240516
.curr->pid : 21291
.clock : 806335506481
.prev_clock_raw : 897090740058
.clock_warps : 0
.clock_unstable_events : 150133
.clock_max_delta : 0
.fair_clock : 179777580248
.prev_fair_clock : 0
.exec_clock : 730473748729
.prev_exec_clock : 0
.wait_runtime : -68769955037
.wait_runtime_overruns : 1250
.wait_runtime_underruns: 1885
.cpu_load[0] : 4096
.cpu_load[1] : 4293
.cpu_load[2] : 5044
.cpu_load[3] : 7236
.cpu_load[4] : 8445
.wait_runtime_rq_sum : -11785269

runnable tasks:
task  PID  tree-key  delta  waiting  switches  prio  sum-exec  sum-wait  sum-sleep  wait-overrun  wait-underrun
------------------------------------------------------------------------------------------------------------------
R cat  21291  179777631414  51166  -51166  2  120  66773  -51166  53218  0  0
  mo   21095  179778195659  615411  -799325  737  120  2088825958  -799325  201402067  0  0
  mo   21090  179778267640  687392  -711781  760  120  2094378519  -711781  208607795  0  0
  mo   21097  179781627293  4047045  -5262328  789  120  2090034861  -5262328  202105919  0  0
  mo   21089  179782321359  4741111  -4960669  757  120  2094677840  -4960669  202830299  0  0

2007-06-16 07:21:19

by Török Edwin

Subject: Re: CFS Scheduler and real-time tasks

On 5/19/07, Török Edvin <[email protected]> wrote:
> I tried -v13. However the scheduling "error" is now 10% (vs 2% with -v12).
>
> I also noticed strange behaviour with CPU hotplug. I offlined cpu1
> (echo 0 > /sys/devices/system/cpu/cpu1/online), and the typing speed in
> my terminal decreased noticeably. I could hardly type anything there.

These have both been fixed in cfs-v17. SMP load balancing is accurate,
and I noticed no problems with CPU hotplug.

Thanks,
Edwin