2002-01-05 00:56:05

by Mikael Pettersson

[permalink] [raw]
Subject: 2.5.2-pre performance degradation on an old 486

When running 2.5.2-pre7 on my old for-testing-only 486(*),
file-system accesses seem to come in distinct bursts preceded
by lengthy pauses. Overall performance is down quite significantly
compared to 2.4.18pre1 and 2.2.20pre2. To measure it I ran two
simple tests:

Test 1: time to boot the kernel, from hitting enter at the LILO
prompt to getting a login prompt
Test 2: time to "rm -rf" a clean linux-2.4.17 source tree, using
the newly booted kernel (no other access to the tree before that,
so it wasn't cached in any way, and the machine was otherwise idle)

              Test 1    Test 2
2.2.21pre2:   71 sec    75 sec
2.4.18pre1:   64 sec    72 sec
2.5.2-pre7:   97 sec   251 sec

I haven't noticed any slowdowns on my other boxes, so I didn't
do any measurements on them. On the 486 it's very very obvious.

/Mikael

(*) 100MHz 486DX4, 28MB ram, no L2 cache, two old and slow IDE disks,
small custom no-nonsense RedHat 7.2, kernels compiled with gcc 2.95.3.


2002-01-05 08:32:56

by Matthias Hanisch

[permalink] [raw]
Subject: Re: 2.5.2-pre performance degradation on an old 486

On Sat, 5 Jan 2002, Mikael Pettersson wrote:

> When running 2.5.2-pre7 on my old for-testing-only 486(*),
> file-system accesses seem to come in distinct bursts preceded
> by lengthy pauses. Overall performance is down quite significantly
> compared to 2.4.18pre1 and 2.2.20pre2. To measure it I ran two
> simple tests:
>
> Test 1: time to boot the kernel, from hitting enter at the LILO
> prompt to getting a login prompt
> Test 2: time to "rm -rf" a clean linux-2.4.17 source tree, using
> the newly booted kernel (no other access to the tree before that,
> so it wasn't cached in any way, and the machine was otherwise idle)
>
> Test 1 Test 2
> 2.2.21pre2: 71 sec 75 sec
> 2.4.18pre1: 64 sec 72 sec
> 2.5.2-pre7: 97 sec 251 sec
>
> I haven't noticed any slowdowns on my other boxes, so I didn't
> do any measurements on them. On the 486 it's very very obvious.

This is exactly what I see with my old 486 box. It started with
2.5.2-pre3, which contained two major items:

- bio changes from Jens
- scheduler changes from Davide

Surprisingly, backing out the bio changes didn't help. Backing out the
scheduler changes from Davide did!!

Maybe the problem lies somewhere in between, because it is often I/O
related, e.g. the first call of ldconfig is horribly slow, as is e2fsck.

But I also see system hiccups from time to time, where console switching
does not work for 1 second on an idle box.


> (*) 100MHz 486DX4, 28MB ram, no L2 cache, two old and slow IDE disks,
> small custom no-nonsense RedHat 7.2, kernels compiled with gcc 2.95.3.

Is this ISA (maybe it has something to do with ISA bouncing)? Mine is:

486 DX/2 ISA, Adaptec 1542, two slow scsi disks and a self-made
slackware-based system.

Can you also back out the scheduler changes to verify this? I have a
backout patch for 2.5.2-pre6 if you don't want to do it yourself.

Regards,
Matze (trying 2.5.2-pre8 now)



2002-01-05 23:05:48

by Davide Libenzi

[permalink] [raw]
Subject: Re: 2.5.2-pre performance degradation on an old 486

On Sat, 5 Jan 2002, Matthias Hanisch wrote:

> On Sat, 5 Jan 2002, Mikael Pettersson wrote:
>
> > When running 2.5.2-pre7 on my old for-testing-only 486(*),
> > file-system accesses seem to come in distinct bursts preceded
> > by lengthy pauses. Overall performance is down quite significantly
> > compared to 2.4.18pre1 and 2.2.20pre2. To measure it I ran two
> > simple tests:
> >
> > Test 1: time to boot the kernel, from hitting enter at the LILO
> > prompt to getting a login prompt
> > Test 2: time to "rm -rf" a clean linux-2.4.17 source tree, using
> > the newly booted kernel (no other access to the tree before that,
> > so it wasn't cached in any way, and the machine was otherwise idle)
> >
> > Test 1 Test 2
> > 2.2.21pre2: 71 sec 75 sec
> > 2.4.18pre1: 64 sec 72 sec
> > 2.5.2-pre7: 97 sec 251 sec
> >
> > I haven't noticed any slowdowns on my other boxes, so I didn't
> > do any measurements on them. On the 486 it's very very obvious.
>
> This is exactly, what I see with my old 486 box. It started with
> 2.5.2-pre3, which contained two major items:
>
> - bio changes from Jens
> - scheduler changes from Davide
>
> Surprisingly, backing out the bio changes didn't help. Backing out the
> scheduler changes from Davide did!!
>
> Maybe the problem lies somewhere in between, because it is often I/O
> related, e.g. first call of ldconfig is horrible slow, as is e2fsck.
>
> But I also see system hiccups from time to time, where console switching
> does not work for 1 second on an idle box.
>
>
> > (*) 100MHz 486DX4, 28MB ram, no L2 cache, two old and slow IDE disks,
> > small custom no-nonsense RedHat 7.2, kernels compiled with gcc 2.95.3.
>
> Is this ISA (maybe it has something to do with ISA bouncing)? Mine is:
>
> 486 DX/2 ISA, Adaptec 1542, two slow scsi disks and a self-made
> slackware-based system.
>
> Can you also backout the scheduler changes to verify this? I have a
> backout patch for 2.5.2-pre6, if you don't want to do this for yourself.

There should be some part of the kernel that assumes a certain scheduler
behavior. There was a guy who reported bad hdparm performance and I
tried it. Running hdparm -t, my system shows a context switch rate of 20-30
and an irq load of about 100-110.
The scheduler itself, even if you coded it in Visual Basic, could not cause
this with such loads.
Did you try profiling the kernel?




- Davide


2002-01-06 10:22:46

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.5.2-pre performance degradation on an old 486

On Sat, Jan 05 2002, Davide Libenzi wrote:
> > > (*) 100MHz 486DX4, 28MB ram, no L2 cache, two old and slow IDE disks,
> > > small custom no-nonsense RedHat 7.2, kernels compiled with gcc 2.95.3.
> >
> > Is this ISA (maybe it has something to do with ISA bouncing)? Mine is:
> >
> > 486 DX/2 ISA, Adaptec 1542, two slow scsi disks and a self-made
> > slackware-based system.
> >
> > Can you also backout the scheduler changes to verify this? I have a
> > backout patch for 2.5.2-pre6, if you don't want to do this for yourself.
>
> There should be some part of the kernel that assume a certain scheduler
> behavior. There was a guy that reported a bad hdparm performance and i
> tried it. By running hdparm -t my system has a context switch of 20-30
> and an irq load of about 100-110.
> The scheduler itself, even if you code it in visual basic, cannot make
> this with such loads.
> Did you try to profile the kernel ?

Davide,

If this is caused by ISA bounce problems, then you should be able to
reproduce it by doing something like

[ drivers/ide/ide-dma.c ]

ide_toggle_bounce()
{
	...

+	addr = BLK_BOUNCE_ISA;
	blk_queue_bounce_limit(&drive->queue, addr);
}

This is a pseudo-diff; just add the addr = line. Now compare performance with
and without your scheduler changes.

--
Jens Axboe

2002-01-06 10:38:50

by Andre Hedrick

[permalink] [raw]
Subject: Re: 2.5.2-pre performance degradation on an old 486

On Sun, 6 Jan 2002, Jens Axboe wrote:

> On Sat, Jan 05 2002, Davide Libenzi wrote:
> > > > (*) 100MHz 486DX4, 28MB ram, no L2 cache, two old and slow IDE disks,
> > > > small custom no-nonsense RedHat 7.2, kernels compiled with gcc 2.95.3.
> > >
> > > Is this ISA (maybe it has something to do with ISA bouncing)? Mine is:
> > >
> > > 486 DX/2 ISA, Adaptec 1542, two slow scsi disks and a self-made
> > > slackware-based system.
> > >
> > > Can you also backout the scheduler changes to verify this? I have a
> > > backout patch for 2.5.2-pre6, if you don't want to do this for yourself.
> >
> > There should be some part of the kernel that assume a certain scheduler
> > behavior. There was a guy that reported a bad hdparm performance and i
> > tried it. By running hdparm -t my system has a context switch of 20-30
> > and an irq load of about 100-110.
> > The scheduler itself, even if you code it in visual basic, cannot make
> > this with such loads.
> > Did you try to profile the kernel ?
>
> Davide,
>
> If this is caused by ISA bounce problems, then you should be able to
> reproduce by doing something ala
>
> [ drivers/ide/ide-dma.c ]
>
> ide_toggle_bounce()
> {
> ...
>
> + addr = BLK_BOUNCE_ISA;
> blk_queue_bounce_limit(&drive->queue, addr);
> }

Jens, how about putting together a hardware list? I have prime2/3 ISA DMA
cards, I'm just not ready to test in 2.5.

Regards,


Andre Hedrick
CEO/President, LAD Storage Consulting Group
Linux ATA Development
Linux Disk Certification Project

2002-01-06 23:55:23

by Davide Libenzi

[permalink] [raw]
Subject: [patch] 2.5.2 scheduler code for 2.4.18-pre1 ( was 2.5.2-pre performance degradation on an old 486 )

On Sun, 6 Jan 2002, Jens Axboe wrote:

> On Sat, Jan 05 2002, Davide Libenzi wrote:
> > > > (*) 100MHz 486DX4, 28MB ram, no L2 cache, two old and slow IDE disks,
> > > > small custom no-nonsense RedHat 7.2, kernels compiled with gcc 2.95.3.
> > >
> > > Is this ISA (maybe it has something to do with ISA bouncing)? Mine is:
> > >
> > > 486 DX/2 ISA, Adaptec 1542, two slow scsi disks and a self-made
> > > slackware-based system.
> > >
> > > Can you also backout the scheduler changes to verify this? I have a
> > > backout patch for 2.5.2-pre6, if you don't want to do this for yourself.
> >
> > There should be some part of the kernel that assume a certain scheduler
> > behavior. There was a guy that reported a bad hdparm performance and i
> > tried it. By running hdparm -t my system has a context switch of 20-30
> > and an irq load of about 100-110.
> > The scheduler itself, even if you code it in visual basic, cannot make
> > this with such loads.
> > Did you try to profile the kernel ?
>
> Davide,
>
> If this is caused by ISA bounce problems, then you should be able to
> reproduce by doing something ala
>
> [ drivers/ide/ide-dma.c ]
>
> ide_toggle_bounce()
> {
> ...
>
> + addr = BLK_BOUNCE_ISA;
> blk_queue_bounce_limit(&drive->queue, addr);
> }
>
> pseudo-diff, just add the addr = line. Now compare performance with and
> without your scheduler changes.

I fail to understand how the scheduler code can influence this.
There's basically nothing inside blk_queue_bounce_limit().
I made this patch for Andrea; it's the 2.5.2 scheduler code for 2.4.18-pre1.
Could someone give it a try on old 486s?




- Davide





diff -Nru linux-2.4.18-pre1.vanilla/arch/alpha/kernel/process.c linux-2.4.18-pre1.tsss/arch/alpha/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/alpha/kernel/process.c Sun Sep 30 12:26:08 2001
+++ linux-2.4.18-pre1.tsss/arch/alpha/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -75,7 +75,6 @@
{
/* An endless idle loop with no priority at all. */
current->nice = 20;
- current->counter = -100;

while (1) {
/* FIXME -- EV6 and LCA45 know how to power down
diff -Nru linux-2.4.18-pre1.vanilla/arch/arm/kernel/process.c linux-2.4.18-pre1.tsss/arch/arm/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/arm/kernel/process.c Sun Sep 30 12:26:08 2001
+++ linux-2.4.18-pre1.tsss/arch/arm/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -85,7 +85,6 @@
/* endless idle loop with no priority at all */
init_idle();
current->nice = 20;
- current->counter = -100;

while (1) {
void (*idle)(void) = pm_idle;
diff -Nru linux-2.4.18-pre1.vanilla/arch/cris/kernel/process.c linux-2.4.18-pre1.tsss/arch/cris/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/cris/kernel/process.c Fri Nov 9 13:58:02 2001
+++ linux-2.4.18-pre1.tsss/arch/cris/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -119,7 +119,6 @@
int cpu_idle(void *unused)
{
while(1) {
- current->counter = -100;
schedule();
}
}
diff -Nru linux-2.4.18-pre1.vanilla/arch/i386/kernel/process.c linux-2.4.18-pre1.tsss/arch/i386/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/i386/kernel/process.c Thu Oct 4 18:42:54 2001
+++ linux-2.4.18-pre1.tsss/arch/i386/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -125,7 +125,6 @@
/* endless idle loop with no priority at all */
init_idle();
current->nice = 20;
- current->counter = -100;

while (1) {
void (*idle)(void) = pm_idle;
diff -Nru linux-2.4.18-pre1.vanilla/arch/ia64/kernel/process.c linux-2.4.18-pre1.tsss/arch/ia64/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/ia64/kernel/process.c Fri Nov 9 14:26:17 2001
+++ linux-2.4.18-pre1.tsss/arch/ia64/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -114,8 +114,6 @@
/* endless idle loop with no priority at all */
init_idle();
current->nice = 20;
- current->counter = -100;
-

while (1) {
#ifdef CONFIG_SMP
diff -Nru linux-2.4.18-pre1.vanilla/arch/m68k/kernel/process.c linux-2.4.18-pre1.tsss/arch/m68k/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/m68k/kernel/process.c Sun Sep 30 12:26:08 2001
+++ linux-2.4.18-pre1.tsss/arch/m68k/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -81,7 +81,6 @@
/* endless idle loop with no priority at all */
init_idle();
current->nice = 20;
- current->counter = -100;
idle();
}

diff -Nru linux-2.4.18-pre1.vanilla/arch/mips/kernel/process.c linux-2.4.18-pre1.tsss/arch/mips/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/mips/kernel/process.c Sun Sep 9 10:43:01 2001
+++ linux-2.4.18-pre1.tsss/arch/mips/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -36,7 +36,6 @@
{
/* endless idle loop with no priority at all */
current->nice = 20;
- current->counter = -100;
init_idle();

while (1) {
diff -Nru linux-2.4.18-pre1.vanilla/arch/mips64/kernel/process.c linux-2.4.18-pre1.tsss/arch/mips64/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/mips64/kernel/process.c Fri Feb 9 11:29:44 2001
+++ linux-2.4.18-pre1.tsss/arch/mips64/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -34,7 +34,6 @@
/* endless idle loop with no priority at all */
init_idle();
current->nice = 20;
- current->counter = -100;
while (1) {
while (!current->need_resched)
if (wait_available)
diff -Nru linux-2.4.18-pre1.vanilla/arch/parisc/kernel/process.c linux-2.4.18-pre1.tsss/arch/parisc/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/parisc/kernel/process.c Fri Feb 9 11:29:44 2001
+++ linux-2.4.18-pre1.tsss/arch/parisc/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -71,7 +71,6 @@
/* endless idle loop with no priority at all */
init_idle();
current->nice = 20;
- current->counter = -100;

while (1) {
while (!current->need_resched) {
diff -Nru linux-2.4.18-pre1.vanilla/arch/ppc/8260_io/uart.c linux-2.4.18-pre1.tsss/arch/ppc/8260_io/uart.c
--- linux-2.4.18-pre1.vanilla/arch/ppc/8260_io/uart.c Sat Jan 5 19:34:50 2002
+++ linux-2.4.18-pre1.tsss/arch/ppc/8260_io/uart.c Sat Jan 5 19:38:57 2002
@@ -1732,7 +1732,7 @@
printk("lsr = %d (jiff=%lu)...", lsr, jiffies);
#endif
current->state = TASK_INTERRUPTIBLE;
-/* current->counter = 0; make us low-priority */
+/* current->dyn_prio = 0; make us low-priority */
schedule_timeout(char_time);
if (signal_pending(current))
break;
diff -Nru linux-2.4.18-pre1.vanilla/arch/ppc/8xx_io/uart.c linux-2.4.18-pre1.tsss/arch/ppc/8xx_io/uart.c
--- linux-2.4.18-pre1.vanilla/arch/ppc/8xx_io/uart.c Sat Jan 5 19:34:50 2002
+++ linux-2.4.18-pre1.tsss/arch/ppc/8xx_io/uart.c Sat Jan 5 19:38:57 2002
@@ -1798,7 +1798,7 @@
printk("lsr = %d (jiff=%lu)...", lsr, jiffies);
#endif
current->state = TASK_INTERRUPTIBLE;
-/* current->counter = 0; make us low-priority */
+/* current->dyn_prio = 0; make us low-priority */
schedule_timeout(char_time);
if (signal_pending(current))
break;
diff -Nru linux-2.4.18-pre1.vanilla/arch/ppc/kernel/idle.c linux-2.4.18-pre1.tsss/arch/ppc/kernel/idle.c
--- linux-2.4.18-pre1.vanilla/arch/ppc/kernel/idle.c Sat Jan 5 19:34:50 2002
+++ linux-2.4.18-pre1.tsss/arch/ppc/kernel/idle.c Sat Jan 5 19:38:57 2002
@@ -54,7 +54,6 @@

/* endless loop with no priority at all */
current->nice = 20;
- current->counter = -100;
init_idle();
for (;;) {
#ifdef CONFIG_SMP
diff -Nru linux-2.4.18-pre1.vanilla/arch/s390/kernel/process.c linux-2.4.18-pre1.tsss/arch/s390/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/s390/kernel/process.c Sat Jan 5 19:34:51 2002
+++ linux-2.4.18-pre1.tsss/arch/s390/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -57,7 +57,6 @@
/* endless idle loop with no priority at all */
init_idle();
current->nice = 20;
- current->counter = -100;
wait_psw.mask = _WAIT_PSW_MASK;
wait_psw.addr = (unsigned long) &&idle_wakeup | 0x80000000L;
while(1) {
diff -Nru linux-2.4.18-pre1.vanilla/arch/s390x/kernel/process.c linux-2.4.18-pre1.tsss/arch/s390x/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/s390x/kernel/process.c Sat Jan 5 19:34:51 2002
+++ linux-2.4.18-pre1.tsss/arch/s390x/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -57,7 +57,6 @@
/* endless idle loop with no priority at all */
init_idle();
current->nice = 20;
- current->counter = -100;
wait_psw.mask = _WAIT_PSW_MASK;
wait_psw.addr = (unsigned long) &&idle_wakeup;
while(1) {
diff -Nru linux-2.4.18-pre1.vanilla/arch/sh/kernel/process.c linux-2.4.18-pre1.tsss/arch/sh/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/sh/kernel/process.c Mon Oct 15 13:36:48 2001
+++ linux-2.4.18-pre1.tsss/arch/sh/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -41,7 +41,6 @@
/* endless idle loop with no priority at all */
init_idle();
current->nice = 20;
- current->counter = -100;

while (1) {
if (hlt_counter) {
diff -Nru linux-2.4.18-pre1.vanilla/arch/sparc/kernel/process.c linux-2.4.18-pre1.tsss/arch/sparc/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/sparc/kernel/process.c Fri Dec 21 09:41:53 2001
+++ linux-2.4.18-pre1.tsss/arch/sparc/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -61,7 +61,6 @@

/* endless idle loop with no priority at all */
current->nice = 20;
- current->counter = -100;
init_idle();

for (;;) {
@@ -110,7 +109,6 @@
{
/* endless idle loop with no priority at all */
current->nice = 20;
- current->counter = -100;
init_idle();

while(1) {
diff -Nru linux-2.4.18-pre1.vanilla/arch/sparc64/kernel/process.c linux-2.4.18-pre1.tsss/arch/sparc64/kernel/process.c
--- linux-2.4.18-pre1.vanilla/arch/sparc64/kernel/process.c Fri Dec 21 09:41:53 2001
+++ linux-2.4.18-pre1.tsss/arch/sparc64/kernel/process.c Sat Jan 5 19:38:57 2002
@@ -54,7 +54,6 @@

/* endless idle loop with no priority at all */
current->nice = 20;
- current->counter = -100;
init_idle();

for (;;) {
@@ -84,7 +83,6 @@
int cpu_idle(void)
{
current->nice = 20;
- current->counter = -100;
init_idle();

while(1) {
diff -Nru linux-2.4.18-pre1.vanilla/drivers/net/slip.c linux-2.4.18-pre1.tsss/drivers/net/slip.c
--- linux-2.4.18-pre1.vanilla/drivers/net/slip.c Sun Sep 30 12:26:07 2001
+++ linux-2.4.18-pre1.tsss/drivers/net/slip.c Sat Jan 5 19:38:57 2002
@@ -1394,7 +1394,7 @@
*/
do {
if (busy) {
- current->counter = 0;
+ current->time_slice = 0;
schedule();
}

diff -Nru linux-2.4.18-pre1.vanilla/fs/proc/array.c linux-2.4.18-pre1.tsss/fs/proc/array.c
--- linux-2.4.18-pre1.vanilla/fs/proc/array.c Thu Oct 11 09:00:01 2001
+++ linux-2.4.18-pre1.tsss/fs/proc/array.c Sat Jan 5 19:38:57 2002
@@ -335,8 +335,7 @@

/* scale priority and nice values from timeslices to -20..20 */
/* to make it look like a "normal" Unix priority/nice value */
- priority = task->counter;
- priority = 20 - (priority * 10 + DEF_COUNTER / 2) / DEF_COUNTER;
+ priority = task->dyn_prio;
nice = task->nice;

read_lock(&tasklist_lock);
diff -Nru linux-2.4.18-pre1.vanilla/include/linux/sched.h linux-2.4.18-pre1.tsss/include/linux/sched.h
--- linux-2.4.18-pre1.vanilla/include/linux/sched.h Fri Dec 21 09:42:03 2001
+++ linux-2.4.18-pre1.tsss/include/linux/sched.h Sat Jan 5 19:56:14 2002
@@ -150,6 +150,7 @@
extern void update_process_times(int user);
extern void update_one_process(struct task_struct *p, unsigned long user,
unsigned long system, int cpu);
+extern void expire_task(struct task_struct *p);

#define MAX_SCHEDULE_TIMEOUT LONG_MAX
extern signed long FASTCALL(schedule_timeout(signed long timeout));
@@ -300,7 +301,7 @@
* all fields in a single cacheline that are needed for
* the goodness() loop in schedule().
*/
- long counter;
+ unsigned long dyn_prio;
long nice;
unsigned long policy;
struct mm_struct *mm;
@@ -319,7 +320,9 @@
* that's just fine.)
*/
struct list_head run_list;
- unsigned long sleep_time;
+ long time_slice;
+ /* recalculation loop checkpoint */
+ unsigned long rcl_last;

struct task_struct *next_task, *prev_task;
struct mm_struct *active_mm;
@@ -446,8 +449,9 @@
*/
#define _STK_LIM (8*1024*1024)

-#define DEF_COUNTER (10*HZ/100) /* 100 ms time slice */
-#define MAX_COUNTER (20*HZ/100)
+#define MAX_DYNPRIO 40
+#define DEF_TSLICE (5 * HZ / 100)
+#define MAX_TSLICE (20 * HZ / 100)
#define DEF_NICE (0)


@@ -468,14 +472,16 @@
addr_limit: KERNEL_DS, \
exec_domain: &default_exec_domain, \
lock_depth: -1, \
- counter: DEF_COUNTER, \
+ dyn_prio: 0, \
nice: DEF_NICE, \
policy: SCHED_OTHER, \
mm: NULL, \
active_mm: &init_mm, \
cpus_runnable: -1, \
cpus_allowed: -1, \
- run_list: LIST_HEAD_INIT(tsk.run_list), \
+ run_list: { NULL, NULL }, \
+ rcl_last: 0, \
+ time_slice: DEF_TSLICE, \
next_task: &tsk, \
prev_task: &tsk, \
p_opptr: &tsk, \
@@ -876,7 +882,6 @@
static inline void del_from_runqueue(struct task_struct * p)
{
nr_running--;
- p->sleep_time = jiffies;
list_del(&p->run_list);
p->run_list.next = NULL;
}
diff -Nru linux-2.4.18-pre1.vanilla/kernel/exit.c linux-2.4.18-pre1.tsss/kernel/exit.c
--- linux-2.4.18-pre1.vanilla/kernel/exit.c Sat Jan 5 19:34:51 2002
+++ linux-2.4.18-pre1.tsss/kernel/exit.c Sat Jan 5 19:38:57 2002
@@ -62,9 +62,9 @@
* timeslices, because any timeslice recovered here
* was given away by the parent in the first place.)
*/
- current->counter += p->counter;
- if (current->counter >= MAX_COUNTER)
- current->counter = MAX_COUNTER;
+ current->time_slice += p->time_slice;
+ if (current->time_slice > MAX_TSLICE)
+ current->time_slice = MAX_TSLICE;
p->pid = 0;
free_task_struct(p);
} else {
diff -Nru linux-2.4.18-pre1.vanilla/kernel/fork.c linux-2.4.18-pre1.tsss/kernel/fork.c
--- linux-2.4.18-pre1.vanilla/kernel/fork.c Wed Nov 21 10:18:42 2001
+++ linux-2.4.18-pre1.tsss/kernel/fork.c Sat Jan 5 19:38:57 2002
@@ -682,9 +682,9 @@
* more scheduling fairness. This is only important in the first
* timeslice, on the long run the scheduling behaviour is unchanged.
*/
- p->counter = (current->counter + 1) >> 1;
- current->counter >>= 1;
- if (!current->counter)
+ p->time_slice = (current->time_slice + 1) >> 1;
+ current->time_slice >>= 1;
+ if (!current->time_slice)
current->need_resched = 1;

/*
diff -Nru linux-2.4.18-pre1.vanilla/kernel/sched.c linux-2.4.18-pre1.tsss/kernel/sched.c
--- linux-2.4.18-pre1.vanilla/kernel/sched.c Fri Dec 21 09:42:04 2001
+++ linux-2.4.18-pre1.tsss/kernel/sched.c Sat Jan 5 19:52:29 2002
@@ -51,24 +51,16 @@
* NOTE! The unix "nice" value influences how long a process
* gets. The nice value ranges from -20 to +19, where a -20
* is a "high-priority" task, and a "+10" is a low-priority
- * task.
- *
- * We want the time-slice to be around 50ms or so, so this
- * calculation depends on the value of HZ.
+ * task. The default time slice for zero-nice tasks will be 37ms.
*/
-#if HZ < 200
-#define TICK_SCALE(x) ((x) >> 2)
-#elif HZ < 400
-#define TICK_SCALE(x) ((x) >> 1)
-#elif HZ < 800
-#define TICK_SCALE(x) (x)
-#elif HZ < 1600
-#define TICK_SCALE(x) ((x) << 1)
-#else
-#define TICK_SCALE(x) ((x) << 2)
-#endif
+#define NICE_RANGE 40
+#define MIN_NICE_TSLICE 10000
+#define MAX_NICE_TSLICE 90000
+#define TASK_TIMESLICE(p) ((int) ts_table[19 - (p)->nice])
+
+static unsigned char ts_table[NICE_RANGE];

-#define NICE_TO_TICKS(nice) (TICK_SCALE(20-(nice))+1)
+#define MM_AFFINITY_BONUS 1


/*
@@ -94,6 +86,8 @@

static LIST_HEAD(runqueue_head);

+static unsigned long rcl_curr = 0;
+
/*
* We align per-CPU scheduling data on cacheline boundaries,
* to prevent cacheline ping-pong.
@@ -165,10 +159,11 @@
* Don't do any other calculations if the time slice is
* over..
*/
- weight = p->counter;
- if (!weight)
- goto out;
-
+ if (!p->time_slice)
+ return 0;
+
+ weight = p->dyn_prio + 1;
+
#ifdef CONFIG_SMP
/* Give a largish advantage to the same processor... */
/* (this is equivalent to penalizing other processors) */
@@ -178,7 +173,7 @@

/* .. and a slight advantage to the current MM */
if (p->mm == this_mm || !p->mm)
- weight += 1;
+ weight += MM_AFFINITY_BONUS;
weight += 20 - p->nice;
goto out;
}
@@ -324,6 +319,9 @@
*/
static inline void add_to_runqueue(struct task_struct * p)
{
+ p->dyn_prio += rcl_curr - p->rcl_last;
+ p->rcl_last = rcl_curr;
+ if (p->dyn_prio > MAX_DYNPRIO) p->dyn_prio = MAX_DYNPRIO;
list_add(&p->run_list, &runqueue_head);
nr_running++;
}
@@ -536,6 +534,19 @@
__schedule_tail(prev);
}

+void expire_task(struct task_struct *p)
+{
+ if (unlikely(!p->time_slice))
+ goto need_resched;
+
+ if (!--p->time_slice) {
+ if (p->dyn_prio)
+ p->dyn_prio--;
+ need_resched:
+ p->need_resched = 1;
+ }
+}
+
/*
* 'schedule()' is the scheduler function. It's a very simple and nice
* scheduler: it's not perfect, but certainly works for most things.
@@ -578,20 +589,20 @@

/* move an exhausted RR process to be last.. */
if (unlikely(prev->policy == SCHED_RR))
- if (!prev->counter) {
- prev->counter = NICE_TO_TICKS(prev->nice);
+ if (!prev->time_slice) {
+ prev->time_slice = TASK_TIMESLICE(prev);
move_last_runqueue(prev);
}

switch (prev->state) {
- case TASK_INTERRUPTIBLE:
- if (signal_pending(prev)) {
- prev->state = TASK_RUNNING;
- break;
- }
- default:
- del_from_runqueue(prev);
- case TASK_RUNNING:;
+ case TASK_INTERRUPTIBLE:
+ if (signal_pending(prev)) {
+ prev->state = TASK_RUNNING;
+ break;
+ }
+ default:
+ del_from_runqueue(prev);
+ case TASK_RUNNING:;
}
prev->need_resched = 0;

@@ -616,14 +627,12 @@

/* Do we need to re-calculate counters? */
if (unlikely(!c)) {
- struct task_struct *p;
-
- spin_unlock_irq(&runqueue_lock);
- read_lock(&tasklist_lock);
- for_each_task(p)
- p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
- read_unlock(&tasklist_lock);
- spin_lock_irq(&runqueue_lock);
+ ++rcl_curr;
+ list_for_each(tmp, &runqueue_head) {
+ p = list_entry(tmp, struct task_struct, run_list);
+ p->time_slice = TASK_TIMESLICE(p);
+ p->rcl_last = rcl_curr;
+ }
goto repeat_schedule;
}

@@ -1056,17 +1065,17 @@
nr_pending--;
#endif
if (nr_pending) {
+ struct task_struct *ctsk = current;
/*
* This process can only be rescheduled by us,
* so this is safe without any locking.
*/
- if (current->policy == SCHED_OTHER)
- current->policy |= SCHED_YIELD;
- current->need_resched = 1;
-
- spin_lock_irq(&runqueue_lock);
- move_last_runqueue(current);
- spin_unlock_irq(&runqueue_lock);
+ if (ctsk->policy == SCHED_OTHER)
+ ctsk->policy |= SCHED_YIELD;
+ ctsk->need_resched = 1;
+
+ ctsk->time_slice = 0;
+ ++ctsk->dyn_prio;
}
return 0;
}
@@ -1115,7 +1124,7 @@
read_lock(&tasklist_lock);
p = find_process_by_pid(pid);
if (p)
- jiffies_to_timespec(p->policy & SCHED_FIFO ? 0 : NICE_TO_TICKS(p->nice),
+ jiffies_to_timespec(p->policy & SCHED_FIFO ? 0 : TASK_TIMESLICE(p),
&t);
read_unlock(&tasklist_lock);
if (p)
@@ -1306,9 +1315,10 @@

if (current != &init_task && task_on_runqueue(current)) {
printk("UGH! (%d:%d) was on the runqueue, removing.\n",
- smp_processor_id(), current->pid);
+ smp_processor_id(), current->pid);
del_from_runqueue(current);
}
+ current->dyn_prio = 0;
sched_data->curr = current;
sched_data->last_schedule = get_cycles();
clear_bit(current->processor, &wait_init_idle);
@@ -1316,6 +1326,18 @@

extern void init_timervecs (void);

+static void fill_tslice_map(void)
+{
+ int i;
+
+ for (i = 0; i < NICE_RANGE; i++) {
+ ts_table[i] = ((MIN_NICE_TSLICE +
+ ((MAX_NICE_TSLICE -
+ MIN_NICE_TSLICE) / (NICE_RANGE - 1)) * i) * HZ) / 1000000;
+ if (!ts_table[i]) ts_table[i] = 1;
+ }
+}
+
void __init sched_init(void)
{
/*
@@ -1329,6 +1351,8 @@

for(nr = 0; nr < PIDHASH_SZ; nr++)
pidhash[nr] = NULL;
+
+ fill_tslice_map();

init_timervecs();

diff -Nru linux-2.4.18-pre1.vanilla/kernel/timer.c linux-2.4.18-pre1.tsss/kernel/timer.c
--- linux-2.4.18-pre1.vanilla/kernel/timer.c Mon Oct 8 10:41:41 2001
+++ linux-2.4.18-pre1.tsss/kernel/timer.c Sat Jan 5 19:38:57 2002
@@ -583,10 +583,7 @@

update_one_process(p, user_tick, system, cpu);
if (p->pid) {
- if (--p->counter <= 0) {
- p->counter = 0;
- p->need_resched = 1;
- }
+ expire_task(p);
if (p->nice > 0)
kstat.per_cpu_nice[cpu] += user_tick;
else
diff -Nru linux-2.4.18-pre1.vanilla/mm/oom_kill.c linux-2.4.18-pre1.tsss/mm/oom_kill.c
--- linux-2.4.18-pre1.vanilla/mm/oom_kill.c Sat Nov 3 17:05:25 2001
+++ linux-2.4.18-pre1.tsss/mm/oom_kill.c Sat Jan 5 19:38:57 2002
@@ -149,7 +149,8 @@
* all the memory it needs. That way it should be able to
* exit() and clear out its resources quickly...
*/
- p->counter = 5 * HZ;
+ p->time_slice = 2 * MAX_TSLICE;
+ p->dyn_prio = MAX_DYNPRIO + 1;
p->flags |= PF_MEMALLOC | PF_MEMDIE;

/* This process has hardware access, be more careful. */


2002-01-07 01:33:45

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [patch] 2.5.2 scheduler code for 2.4.18-pre1 ( was 2.5.2-pre performance degradation on an old 486 )

On Sun, 6 Jan 2002 15:59:05 -0800 (PST), Davide Libenzi wrote:
>I made this patch for Andrea and it's the scheduler code for 2.4.18-pre1
>Could someone give it a try on old 486s

Done. On my '93 vintage 486, 2.4.18p1 + your scheduler results in very
bursty I/O and poor performance, just like I reported for 2.5.2-pre7.

/Mikael

2002-01-07 01:40:05

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [patch] 2.5.2 scheduler code for 2.4.18-pre1 ( was 2.5.2-pre performance degradation on an old 486 )

On Sun, Jan 06, 2002 at 03:59:05PM -0800, Davide Libenzi wrote:
> On Sun, 6 Jan 2002, Jens Axboe wrote:
>
> > On Sat, Jan 05 2002, Davide Libenzi wrote:
> > > > > (*) 100MHz 486DX4, 28MB ram, no L2 cache, two old and slow IDE disks,
> > > > > small custom no-nonsense RedHat 7.2, kernels compiled with gcc 2.95.3.
> > > >
> > > > Is this ISA (maybe it has something to do with ISA bouncing)? Mine is:
> > > >
> > > > 486 DX/2 ISA, Adaptec 1542, two slow scsi disks and a self-made
> > > > slackware-based system.
> > > >
> > > > Can you also backout the scheduler changes to verify this? I have a
> > > > backout patch for 2.5.2-pre6, if you don't want to do this for yourself.
> > >
> > > There should be some part of the kernel that assume a certain scheduler
> > > behavior. There was a guy that reported a bad hdparm performance and i
> > > tried it. By running hdparm -t my system has a context switch of 20-30
> > > and an irq load of about 100-110.
> > > The scheduler itself, even if you code it in visual basic, cannot make
> > > this with such loads.
> > > Did you try to profile the kernel ?
> >
> > Davide,
> >
> > If this is caused by ISA bounce problems, then you should be able to
> > reproduce by doing something ala
> >
> > [ drivers/ide/ide-dma.c ]
> >
> > ide_toggle_bounce()
> > {
> > ...
> >
> > + addr = BLK_BOUNCE_ISA;
> > blk_queue_bounce_limit(&drive->queue, addr);
> > }
> >
> > pseudo-diff, just add the addr = line. Now compare performance with and
> > without your scheduler changes.
>
> I fail to understand where the scheduler code can influence this.
> There's basically nothing inside blk_queue_bounce_limit()
> I made this patch for Andrea and it's the scheduler code for 2.4.18-pre1
> Could someone give it a try on old 486s

Yes please (feel free to CC me on the answers). I'd really like to
reduce the scheduler's O(N) overhead so that N is the number of running tasks,
rather than doing the recalculation over all the processes in the machine.
An O(1) scheduler would be even better of course, but this patch would
ensure we don't hurt the single-running-task case, and it's way simpler to
check for correctness (so it's easier to include as a start).
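
For reference, the recalculation hunk in kernel/sched.c from Davide's patch
above is exactly this change; condensed here (context trimmed, names as in the
patch, so read it as a sketch rather than the literal source):

	/* Old 2.4 recalculation: when every runnable task has exhausted its
	 * counter, walk *all* tasks in the system - O(N) in the total number
	 * of processes, not just the runnable ones. */
	spin_unlock_irq(&runqueue_lock);
	read_lock(&tasklist_lock);
	for_each_task(p)
		p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
	read_unlock(&tasklist_lock);
	spin_lock_irq(&runqueue_lock);

	/* Patched version: only the tasks currently on the runqueue are
	 * touched; sleeping tasks catch up via rcl_curr/rcl_last when they
	 * are re-added by add_to_runqueue(). */
	++rcl_curr;
	list_for_each(tmp, &runqueue_head) {
		p = list_entry(tmp, struct task_struct, run_list);
		p->time_slice = TASK_TIMESLICE(p);
		p->rcl_last = rcl_curr;
	}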

Andrea

2002-01-07 02:33:35

by Davide Libenzi

[permalink] [raw]
Subject: Re: [patch] 2.5.2 scheduler code for 2.4.18-pre1 ( was 2.5.2-pre performance degradation on an old 486 )

On Mon, 7 Jan 2002, Mikael Pettersson wrote:

> On Sun, 6 Jan 2002 15:59:05 -0800 (PST), Davide Libenzi wrote:
> >I made this patch for Andrea and it's the scheduler code for 2.4.18-pre1
> >Could someone give it a try on old 486s
>
> Done. On my '93 vintage 486, 2.4.18p1 + your scheduler results in very
> bursty I/O and poor performance, just like I reported for 2.5.2-pre7.

Can you try some changes that I'll suggest?




- Davide


2002-01-07 07:30:09

by Matthias Hanisch

[permalink] [raw]
Subject: Re: 2.5.2-pre performance degradation on an old 486

On Sat, 5 Jan 2002, Davide Libenzi wrote:

> There should be some part of the kernel that assume a certain scheduler
> behavior. There was a guy that reported a bad hdparm performance and i
> tried it. By running hdparm -t my system has a context switch of 20-30
> and an irq load of about 100-110.

This guy was me, IMHO (just with my office email address :).


> The scheduler itself, even if you code it in visual basic, cannot make
> this with such loads.
> Did you try to profile the kernel ?

To answer your question, I wanted to profile 2.5.2-pre8 against
2.5.2-pre8-old-scheduler. _Fortunately_ I made a mistake and forgot to
back out the following chunk of the patch.

--- v2.5.1/linux/arch/i386/kernel/process.c Thu Oct 4 18:42:54 2001
+++ linux/arch/i386/kernel/process.c Thu Dec 27 08:21:28 2001
@@ -125,7 +125,6 @@
/* endless idle loop with no priority at all */
init_idle();
current->nice = 20;
- current->counter = -100;

while (1) {
void (*idle)(void) = pm_idle;

So it seems that removing this line from the kernel sources, while keeping the
old scheduler, causes this unresponsive behavior. This chunk also looks a
little bit strange: in most (all?) of the other chunks "counter" got replaced
with "dyn_prio", not removed completely.

I'll verify this tonight (have to earn some money first :). I'll also do
some profiling.

Mikael, if you have time, maybe you can apply only this chunk of the
patch (or just remove the line) to a clean 2.4.18-pre1 and report the
behavior.


Davide, regarding your question in the other mail:

> Can you try some changes that i'll tell you ?

Please forward them to me as well. Sometimes it takes me a little longer,
because there is also life outside LKML, but I want to get this understood
and fixed, so I'll try to help you as much as I can.


Regards,
Matze


2002-01-07 07:33:29

by Jens Axboe

[permalink] [raw]
Subject: Re: [patch] 2.5.2 scheduler code for 2.4.18-pre1 ( was 2.5.2-pre performance degradation on an old 486 )

On Sun, Jan 06 2002, Davide Libenzi wrote:
> > Davide,
> >
> > If this is caused by ISA bounce problems, then you should be able to
> > reproduce by doing something ala
> >
> > [ drivers/ide/ide-dma.c ]
> >
> > ide_toggle_bounce()
> > {
> > ...
> >
> > + addr = BLK_BOUNCE_ISA;
> > blk_queue_bounce_limit(&drive->queue, addr);
> > }
> >
> > pseudo-diff, just add the addr = line. Now compare performance with and
> > without your scheduler changes.
>
> I fail to understand where the scheduler code can influence this.
> There's basically nothing inside blk_queue_bounce_limit()

Eh, of course not, no time will be spent inside blk_queue_bounce_limit(). I
don't think you looked very long at this :-)

The point is that ISA bouncing will spend some time scheduling while waiting
for available memory in the __GFP_DMA zone.

--
Jens Axboe

2002-01-07 07:34:09

by Jens Axboe

[permalink] [raw]
Subject: Re: [patch] 2.5.2 scheduler code for 2.4.18-pre1 ( was 2.5.2-pre performance degradation on an old 486 )

On Sun, Jan 06 2002, Davide Libenzi wrote:
> On Mon, 7 Jan 2002, Mikael Pettersson wrote:
>
> > On Sun, 6 Jan 2002 15:59:05 -0800 (PST), Davide Libenzi wrote:
> > >I made this patch for Andrea and it's the scheduler code for 2.4.18-pre1
> > >Could someone give it a try on old 486s
> >
> > Done. On my '93 vintage 486, 2.4.18p1 + your scheduler results in very
> > bursty I/O and poor performance, just like I reported for 2.5.2-pre7.
>
> Can you try some changes that i'll tell you ?

Did you _try_ the ISA bounce trick to reproduce locally??

--
Jens Axboe

2002-01-07 14:32:00

by J.A. Magallon

[permalink] [raw]
Subject: Re: [patch] 2.5.2 scheduler code for 2.4.18-pre1 ( was 2.5.2-pre performance degradation on an old 486 )


On 20020107 Andrea Arcangeli wrote:
>
>yes please (feel free to CC me on the answers), I'd really like to
>reduce the scheduler O(N) overhead to the number of the running tasks,
>rather than doing the recalculate all over the processes in the machine.
>O(1) scheduler would be even better of course, but the below would
>ensure not to hurt the 1 task running case, and it's way simpler to
>check for correctness (so it's easier to include it as a start).
>

It looks like you all are going to turn the scheduler upside down.
Hmm, as a non-kernel-hacker observer from the outside world, could I
make a suggestion?
Would it be easy to split the work into steps:
- Move from a single queue to per-CPU queues, keeping the same algorithm
that is running now for per-queue scheduling.
- Get that running for 2.4.18 and 2.5.2.
- Then start to play with the per-queue scheduling algorithm:
* better O(n)
* O(1)
* O(1) with different queues for RT and non-RT
etc...

Is that easy enough, or are the steps so related that they cannot be split?

Thanks.

(a linux user that tries experimental kernels and is seeing them grow
like mushrooms in latest weeks...)

--
J.A. Magallon # Let the source be with you...
mailto:[email protected]
Mandrake Linux release 8.2 (Cooker) for i586
Linux werewolf 2.4.18-pre1-beo #1 SMP Fri Jan 4 02:25:59 CET 2002 i686

2002-01-07 14:39:00

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [patch] 2.5.2 scheduler code for 2.4.18-pre1 ( was 2.5.2-pre performance degradation on an old 486 )

On Mon, Jan 07, 2002 at 03:35:33PM +0100, J.A. Magallon wrote:
>
> On 20020107 Andrea Arcangeli wrote:
> >
> >yes please (feel free to CC me on the answers), I'd really like to
> >reduce the scheduler O(N) overhead to the number of the running tasks,
> >rather than doing the recalculate all over the processes in the machine.
> >O(1) scheduler would be even better of course, but the below would
> >ensure not to hurt the 1 task running case, and it's way simpler to
> >check for correctness (so it's easier to include it as a start).
> >
>
> It looks like you all are going to turn the scheduler upside-down.
> Hmm, as a non-kernel-hacker observer from the world outside, could I
> make a suggestion ?
> Is it easy to split the thing in steps:
> - Move from single-queue to per-cpu-queue, with just the same algorithm
> that is running now for per-queue scheduling.

I don't care much about SMP (I don't think SMP scalability of the scheduler
is so bad as to require this change in 2.4); I'd only like a UP (or SMP
as well, of course) box not to walk a linked list of 2k tasks during a
reschedule when only 1 is running all the time.

> - Get that running for 2.18.18 and 2.5.2
> - Then start to play with the per-queue scheduling algorithm:
> * better O(n)
> * O(1)
> * O(1) with different queues for RT and non RT
> etc...
>
> Is it easy enough or are both steps so related that can not be split ?
>
> Thanks.
>
> (a linux user that tries experimental kernels and is seeing them grow
> like mushrooms in latest weeks...)
>
> --
> J.A. Magallon # Let the source be with you...
> mailto:[email protected]
> Mandrake Linux release 8.2 (Cooker) for i586
> Linux werewolf 2.4.18-pre1-beo #1 SMP Fri Jan 4 02:25:59 CET 2002 i686


Andrea

2002-01-07 16:44:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.5.2-pre performance degradation on an old 486


On Mon, 7 Jan 2002, Matthias Hanisch wrote:
>
> To answer your question, I wanted to profile 2.5.2-pre8 against
> 2.5.2-pre8-old-scheduler. _Fortunately_ I made some mistake and forgot to
> back out the following chunk of memory.
>
> --- v2.5.1/linux/arch/i386/kernel/process.c Thu Oct 4 18:42:54 2001
> +++ linux/arch/i386/kernel/process.c Thu Dec 27 08:21:28 2001
> @@ -125,7 +125,6 @@
> /* endless idle loop with no priority at all */
> init_idle();
> current->nice = 20;
> - current->counter = -100;
>
> while (1) {
> void (*idle)(void) = pm_idle;

Hey, that would do it. It looks like the idle task ends up being a
_normal_ process (just nice'd down), so it will get real CPU time instead
of only getting scheduled when nothing else is runnable.

Davide, I think the bounce-buffer is a red herring, it's simply that we're
wasting time in idle..

Linus

2002-01-07 18:01:48

by Mikael Pettersson

[permalink] [raw]
Subject: Re: 2.5.2-pre performance degradation on an old 486

On Mon, 7 Jan 2002 08:43:04 -0800 (PST), Linus Torvalds wrote:
>Hey, that would do it. It looks like the idle task ends up being a
>_normal_ process (just nice'd down), so it will get real CPU time instead
>of only getting scheduled when nothing else is runnable.
>
>Davide, I think the bounce-buffer is a red herring, it's simply that we're
>wasting time in idle..

This does seem to be the case. As a quick hack I added

if (p == &init_task) return -50;

at the start of kernel/sched.c:goodness() [to approximate the old
scheduler's behaviour], and this immediately restored performance
on my 486 to the old scheduler's levels.
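
For context, the hack sits at the top of the patched goodness(); below is a
condensed reconstruction from the hunks in Davide's patch earlier in the
thread. The surrounding details are elided or assumed; only the first test is
the new line.

static inline int goodness(struct task_struct *p, int this_cpu,
			   struct mm_struct *this_mm)
{
	int weight;

	if (p == &init_task)		/* quick hack: the idle task never competes */
		return -50;

	/* ... SCHED_YIELD and realtime handling elided ... */

	/* SCHED_OTHER path, as in the patch above: */
	if (!p->time_slice)		/* slice used up: wait for recalculation */
		return 0;
	weight = p->dyn_prio + 1;
	if (p->mm == this_mm || !p->mm)	/* slight advantage to the current MM */
		weight += MM_AFFINITY_BONUS;
	weight += 20 - p->nice;
	return weight;
}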

/Mikael

2002-01-07 18:07:08

by Davide Libenzi

[permalink] [raw]
Subject: Re: [patch] 2.5.2 scheduler code for 2.4.18-pre1 ( was 2.5.2-pre performance degradation on an old 486 )

On Mon, 7 Jan 2002, Jens Axboe wrote:

> On Sun, Jan 06 2002, Davide Libenzi wrote:
> > > Davide,
> > >
> > > If this is caused by ISA bounce problems, then you should be able to
> > > reproduce by doing something ala
> > >
> > > [ drivers/ide/ide-dma.c ]
> > >
> > > ide_toggle_bounce()
> > > {
> > > ...
> > >
> > > + addr = BLK_BOUNCE_ISA;
> > > blk_queue_bounce_limit(&drive->queue, addr);
> > > }
> > >
> > > pseudo-diff, just add the addr = line. Now compare performance with and
> > > without your scheduler changes.
> >
> > I fail to understand where the scheduler code can influence this.
> > There's basically nothing inside blk_queue_bounce_limit()
>
> Eh of course not, no time will be spent inside blk_queue_bounce_limit. I
> don't think you looked very long at this :-)
>
> The point is that ISA bouncing will spend some time scheduling waiting
> for available memory in the __GFP_DMA zone.

I looked, and I already pointed this out to Linus.
The memory pool creation ends up calling alloc_pages(), and there could
be a race there.
I haven't had the time for experiments.



- Davide


2002-01-07 18:07:18

by Davide Libenzi

[permalink] [raw]
Subject: Re: [patch] 2.5.2 scheduler code for 2.4.18-pre1 ( was 2.5.2-pre performance degradation on an old 486 )

On Mon, 7 Jan 2002, Jens Axboe wrote:

> On Sun, Jan 06 2002, Davide Libenzi wrote:
> > On Mon, 7 Jan 2002, Mikael Pettersson wrote:
> >
> > > On Sun, 6 Jan 2002 15:59:05 -0800 (PST), Davide Libenzi wrote:
> > > >I made this patch for Andrea and it's the scheduler code for 2.4.18-pre1
> > > >Could someone give it a try on old 486s
> > >
> > > Done. On my '93 vintage 486, 2.4.18p1 + your scheduler results in very
> > > bursty I/O and poor performance, just like I reported for 2.5.2-pre7.
> >
> > Can you try some changes that i'll tell you ?
>
> Did you _try_ the ISA bounce trick to reproduce locally??

I'll try it today, even though I think one of the guys who had problems
already pointed it out.




- Davide


2002-01-07 18:27:18

by Davide Libenzi

[permalink] [raw]
Subject: Re: 2.5.2-pre performance degradation on an old 486

On Mon, 7 Jan 2002, Matthias Hanisch wrote:

> On Sat, 5 Jan 2002, Davide Libenzi wrote:
>
> > There should be some part of the kernel that assume a certain scheduler
> > behavior. There was a guy that reported a bad hdparm performance and i
> > tried it. By running hdparm -t my system has a context switch of 20-30
> > and an irq load of about 100-110.
>
> This guy was me, IMHO (just with my office email address :).
>
>
> > The scheduler itself, even if you code it in visual basic, cannot make
> > this with such loads.
> > Did you try to profile the kernel ?
>
> To answer your question, I wanted to profile 2.5.2-pre8 against
> 2.5.2-pre8-old-scheduler. _Fortunately_ I made some mistake and forgot to
> back out the following chunk of memory.
>
> --- v2.5.1/linux/arch/i386/kernel/process.c Thu Oct 4 18:42:54 2001
> +++ linux/arch/i386/kernel/process.c Thu Dec 27 08:21:28 2001
> @@ -125,7 +125,6 @@
> /* endless idle loop with no priority at all */
> init_idle();
> current->nice = 20;
> - current->counter = -100;

In sched.c::init_idle() :

current->dyn_prio = -100;

Let me know.




- Davide


2002-01-07 18:31:38

by Davide Libenzi

[permalink] [raw]
Subject: Re: 2.5.2-pre performance degradation on an old 486

On Mon, 7 Jan 2002, Mikael Pettersson wrote:

> On Mon, 7 Jan 2002 08:43:04 -0800 (PST), Linus Torvalds wrote:
> >Hey, that would do it. It looks like the idle task ends up being a
> >_normal_ process (just nice'd down), so it will get real CPU time instead
> >of only getting scheduled when nothing else is runnable.
> >
> >Davide, I think the bounce-buffer is a red herring, it's simply that we're
> >wasting time in idle..
>
> This does seem to be the case. As a quick hack I added
>
> if (p == &init_task) return -50;
>
> at the start of kernel/sched.c:goodness() [to approximate the old
> scheduler's behaviour], and this immediately restored performance
> on my 486 to the old scheduler's levels.

I'll post a patch to Linus in 20 minutes; otherwise, Linus, simply add in

sched.c::init_idle()

	current->dyn_prio = -100;




- Davide


2002-01-07 21:51:22

by Matthias Hanisch

[permalink] [raw]
Subject: Re: 2.5.2-pre performance degradation on an old 486

On Mon, 7 Jan 2002, Davide Libenzi wrote:

> In sched.c::init_idle() :
>
> current->dyn_prio = -100;
>
> Let me know.

Ahem. I already added the same line at the beginning of cpu_idle() in
arch/i386/kernel/process.c, which brought back the old performance. Your patch
should be analogous, but cleaner.

So: Bingo!!!!

I just wonder why only two people with slow machines saw this behavior...

Now 2.5.2 can come :)

Regards,
Matze


2002-01-07 22:13:02

by Davide Libenzi

[permalink] [raw]
Subject: Re: 2.5.2-pre performance degradation on an old 486

On Mon, 7 Jan 2002, Matthias Hanisch wrote:

> On Mon, 7 Jan 2002, Davide Libenzi wrote:
>
> > In sched.c::init_idle() :
> >
> > current->dyn_prio = -100;
> >
> > Let me know.
>
> Aehm. I already added the same line at the beginning of cpu_idle() in
> arch/i386/process.c, which brought back the old performance. Your patch
> should be analogous, but cleaner.
>
> So: Bingo!!!!
>
> I just wonder, why only two people with slow machines saw this behavior...
>
> Now 2.5.2 can come :)

The problem is that slow machines show a different dyn_prio distribution.
What happened was that if a process with dyn_prio == 0 was woken up while the
idle task was running, preemption_goodness() failed to kick out the idle task
(also with dyn_prio == 0) because of the strict > 0 comparison.
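
The fix the thread converges on is the dyn_prio analogue of the old idle
special case; roughly (a sketch of the idea, not the literal patch that was
posted):

	/* kernel/sched.c::init_idle(): park the idle task's dynamic priority
	 * far below any real task (the dyn_prio counterpart of the old
	 * "current->counter = -100" line), so a freshly woken task always
	 * wins the goodness comparison and kicks the idle loop out at once. */
	current->dyn_prio = -100;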




- Davide