2002-09-15 10:55:25

by Con Kolivas

Subject: Revealing benchmarks and new version of contest.



After my first incarnation of the responsiveness benchmark (contest), Rik helped
me get the memory load working for 2.5.x testing. Now Andrew Morton has helped
me improve the IO load. The previous IO load was "nice" to VM systems. The IO
load now comprises two separate tests: the first continually rewrites a file
half the size of the machine's physical memory, and the second rewrites a file
the same size as physical memory. Below are the new benchmarks with these loads:

Noload:
Kernel           Time      CPU
2.4.19-ck7       1:07.74   99%
2.4.19           1:14.00   99%
2.5.34(-mm4)     1:09.61   99%
2.4.19-ck7-rmap  1:08.50   99%
2.4.20-pre7      1:08.00   99%
2.5.34           1:09.70   99%

CPU Load:
Kernel           Time      CPU
2.4.19-ck7       1:10.39   93%
2.4.19           1:27.94   80%
2.5.34(-mm4)     1:11.42   94%
2.4.19-ck7-rmap  1:11.32   92%
2.4.20-pre7      1:21.91   80%
2.5.34           1:11.46   94%

Mem Load:
Kernel           Time      CPU
2.4.19-ck7       1:11.10   93%
2.4.19           1:33.69   77%
2.5.34(-mm4)     1:24.03   83%
2.4.19-ck7-rmap  1:35.30   71%
2.4.20-pre7      1:26.39   78%
2.5.34           1:25.54   81%

IO Load Half:
Kernel           Time      CPU
2.4.19-ck7       1:26.22   78%
2.4.19           2:16.66   56%
2.5.34(-mm4)     4:30.70   28%
2.4.19-ck7-rmap  1:22.90   84%
2.4.20-pre7      2:25.78   48%
2.5.34           1:23.67   82%

IO Load Full:
Kernel           Time      CPU
2.4.19-ck7       2:34.04   43%
2.4.19           3:14.52   40%
2.5.34(-mm4)     14:59.79  8%
2.4.19-ck7-rmap  1:32.34   74%
2.4.20-pre7      3:37.75   32%
2.5.34           1:49.62   63%

A quick reminder: faster times are better, and a higher CPU% is also better
(the time is how long the kernel compile took under the given load, and the
CPU% is the share of the CPU the compile received).

As you can see, there are stark differences between these kernels, particularly
with the -mm4 changes. This time the -rmap VM shows a significant improvement
under very heavy IO load. Repeat tests show similar results.
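
In case it helps picture what the IO loads are doing, they are essentially
continual file rewrites. Roughly speaking, each rewrite load looks like the
following (a simplified C sketch rather than the exact contest code; the file
name and the way the size is passed in are placeholders):

#include <stdio.h>
#include <stdlib.h>

#define CHUNK (1024 * 1024)	/* write in 1 MB chunks */

int main(int argc, char **argv)
{
	/* size in MB: half of physical RAM for the "half" test,
	 * all of it for the "full" test */
	long mb = (argc > 1) ? atol(argv[1]) : 128;
	char *buf = calloc(1, CHUNK);
	long i;

	if (!buf)
		return 1;
	for (;;) {
		FILE *f = fopen("io.load", "w");

		if (!f) {
			perror("fopen");
			return 1;
		}
		for (i = 0; i < mb; i++)
			fwrite(buf, 1, CHUNK, f);
		fclose(f);	/* then rewrite it from scratch */
	}
}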

The updated version of contest (v0.20) can be downloaded from my site:
http://kernel.kolivas.net (look under the FAQ).

Comments (and please cc me)?
Con.



2002-09-15 17:01:37

by Paolo Ciarrocchi

Subject: Re: Revealing benchmarks and new version of contest.

From: Con Kolivas <[email protected]>
[...]
> Below are the new benchmarks with these loads:
Con,
I have different results:

_NOLOAD_
Kernel           Time     CPU
2.4.19           2:04.34  99%
2.4.19-ck7       2:03.70  99%
2.4.19-0.24pre4  2:03.81  99%
2.5.34           2:07.24  99%

_CPULOAD_
Kernel           Time     CPU
2.4.19           2:27.98  81%
2.4.19-ck7       2:19.14  87%
2.4.19-0.24pre4  2:27.56  81%
2.5.34           2:22.09  88%

_MEMLOAD_
Kernel           Time     CPU
2.4.19           2:50.46  74%
2.4.19-ck7       2:34.80  80%
2.4.19-0.24pre4  2:59.07  77%
2.5.34           3:11.77  67%

_IOLOADHALF_ (the compressed cache kernel is the winner)
Kernel           Time     CPU
2.4.19           6:12.45  33%
2.4.19-ck7       9:35.92  21%
2.4.19-0.24pre4  3:55.21  53%
2.5.34           8:08.52  26%

_IOLOADFULL_ (the compressed cache kernel is the winner)
(I stopped 2.5.34 after 2 hours!!! A hard reboot was needed.)
Kernel           Time      CPU
2.4.19           6:45.87   31%
2.4.19-ck7       16:45.95  12%
2.4.19-0.24pre4  3:16.63   63%

2.5.34 was run with preemption ON.
HW is an HP Omnibook 6000, 256 MiB RAM, PIII@800.

Ciao,
Paolo


2002-09-15 20:30:47

by Andrew Morton

Subject: Re: Revealing benchmarks and new version of contest.

Con Kolivas wrote:
>
> ...
> IO Load Full:
> Kernel           Time      CPU
> 2.4.19-ck7       2:34.04   43%
> 2.4.19           3:14.52   40%
> 2.5.34(-mm4)     14:59.79  8%
> 2.4.19-ck7-rmap  1:32.34   74%
> 2.4.20-pre7      3:37.75   32%
> 2.5.34           1:49.62   63%
>

OK, I can reproduce this. It's the elevator problem.

$ dd if=/dev/zero of=foo bs=1M count=8000 &
$ sleep 10
$ time cat kernel/*.c > /dev/null
cat kernel/*.c > /dev/null 0.01s user 0.51s system 0% cpu 1:41.76 total

Nearly two minutes to read three megabytes.

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 1 3 6172 3396 464 164988 0 0 12 30156 1163 1411 2 96 2
0 2 1 6172 2832 472 165596 0 0 4 27048 1146 352 0 100 0
1 1 2 6172 2496 492 165928 0 0 0 34144 1161 984 0 100 0
1 1 2 6184 2448 500 166348 0 0 12 26076 1287 1318 2 98 0
0 2 2 6184 2620 480 166132 0 0 0 42900 1118 402 0 100 0
0 2 2 6184 3560 492 165256 0 0 4 15124 1100 187 8 92 0
0 2 2 6184 2424 524 166380 0 0 4 31988 1082 379 0 100 0
0 2 3 6184 2460 544 166324 0 0 0 20152 1082 276 0 100 0
1 1 4 6184 2468 400 166312 0 0 0 31440 1135 473 3 97 0
0 2 3 6188 2804 420 166272 0 0 8 35204 1093 422 3 97 0
0 2 2 6188 2580 428 166548 0 0 4 18860 1104 191 0 90 10
0 2 3 6228 3348 468 165892 0 0 0 40232 1109 542 0 100 0
1 1 4 6228 2436 496 166800 0 0 0 18788 1097 355 0 100 0
0 2 4 6228 3168 520 166028 0 0 8 29764 1093 376 3 97 0
0 2 4 6228 2860 536 166324 0 0 0 18748 1068 303 0 100 0
0 2 1 6228 2764 548 166396 0 0 4 33124 1148 259 7 93 0

No read activity: the `bi' column stays near zero while `bo' shows continuous
heavy writeout.

(gdb) info threads
88 Thread 21769 io_schedule () at /usr/src/25/include/asm/atomic.h:122
87 Thread 21768 schedule_timeout (timeout=-937779520) at timer.c:866
86 Thread 21767 schedule_timeout (timeout=-1032683840) at timer.c:866

(gdb) thread 88
[Switching to thread 88 (Thread 21769)]#0 io_schedule () at /usr/src/25/include/asm/atomic.h:122
122 __asm__ __volatile__(

(gdb) comm25
$7 = "cat\000\000og\000\000\000\000\000\000\000\000"

(gdb) bt
#0 io_schedule () at /usr/src/25/include/asm/atomic.h:122
#1 0xc01305b4 in __lock_page (page=0xc1120780) at filemap.c:370
#2 0xc0130a7f in do_generic_file_read (filp=0xc8db33c0, ppos=0xc8db33e0, desc=0xc07c3ecc, actor=0xc0130c10 <file_read_actor>)
at /usr/src/25/include/linux/pagemap.h:86
#3 0xc0130f7f in __generic_file_aio_read (iocb=0xc07c3f04, iov=0xc07c3efc, nr_segs=1, ppos=0xc8db33e0) at filemap.c:867
#4 0xc0131032 in generic_file_read (filp=0xc8db33c0, buf=0x804dcc0 "sync", count=4096, ppos=0xc8db33e0) at filemap.c:895
#5 0xc0143040 in vfs_read (file=0xc8db33c0, buf=0x804dcc0 "sync", count=4096, pos=0xc8db33e0) at read_write.c:193
#6 0xc01431ee in sys_read (fd=3, buf=0x804dcc0 "sync", count=4096) at read_write.c:232
#7 0xc01090e3 in syscall_call () at stats.c:204

`cat' has issued a read and is waiting for IO to complete.

The disk elevator is supposed to stop servicing the streaming write
at some point and give the disk head to the reads, but that isn't
working.

The same happens with reads-versus-reads in some situations. It's
complex, and I've been putting it off. Jens has a new IO scheduler
in the works, so perhaps time spent investigating this would be wasted.
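
Roughly, the deadline idea is to give every queued request an expiry
time, with reads expiring much sooner than writes, so a stream of
writes can't hold the disk head forever. A toy sketch of that selection
logic (illustrative only, not the actual scheduler code; the names and
numbers here are invented for the example):

#define READ_EXPIRE_MS	 500	/* reads must be serviced soon */
#define WRITE_EXPIRE_MS	5000	/* writes may wait much longer */

struct request {
	unsigned long expires;	/* this request's deadline, in ms */
	struct request *next;	/* next request in sorted order */
};

/*
 * Normally follow sorted (elevator) order to keep the disk head
 * streaming, but if the oldest queued read has passed its deadline,
 * service it first so writes cannot starve readers indefinitely.
 */
static struct request *pick_next(struct request *oldest_read,
				 struct request *next_sorted,
				 unsigned long now)
{
	if (oldest_read && now > oldest_read->expires)
		return oldest_read;
	return next_sorted;
}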

Why is it worse with -mm4? With other kernels, `dd' ends up waiting
on one of its own locked pages and it goes for a big snooze, allowing the
queue to empty. -mm4 won't do that. It will keep the queue filled all
the time.

This is all happening because you're running io_fullmem against the
same disk as the one on which the kernel build is being performed.
That is a valid and interesting test, but it's more an IO scheduler
test than a VM test. I'd suggest that you also test with the
heavy write against a different disk.

I'll go see if Jens' deadline-iosched-5 patch fixes it.

2002-09-15 21:28:17

by Andrew Morton

Subject: Re: Revealing benchmarks and new version of contest.

Andrew Morton wrote:
>
> ..
> I'll go see if Jens' deadline-iosched-5 patch fixes it.

Can't tell. It triggers the "IDE-fails-to-deliver-IO-completion"
lockup which has been lurking around the tree for a couple of months.

#0 io_schedule () at /usr/src/25/include/asm/atomic.h:122
#1 0xc01305b4 in __lock_page (page=0xc10b6df0) at filemap.c:370
#2 0xc013164e in read_cache_page (mapping=0xc3e0e244, index=0, filler=0xc01495e0 <blkdev_readpage>, data=0x0)
at /usr/src/25/include/linux/pagemap.h:86
#3 0xc016c352 in read_dev_sector (bdev=0xc3cf7f60, n=0, p=0xc3fcbec4) at check.c:447
#4 0xc016c764 in msdos_partition (state=0xc3cf6000, bdev=0xc3cf7f60) at msdos.c:397
#5 0xc016c016 in check_partition (hd=0xc3da6800, bdev=0xc3cf7f60) at check.c:241
#6 0xc016c134 in register_disk (disk=0xc3da6800, dev={value = 5632}, minors=64, ops=0xc0340cf0, size=120064896) at check.c:381
#7 0xc0221bb0 in idedisk_attach (drive=0xc03b0230) at ide-disk.c:1710
#8 0xc021df60 in ata_attach (drive=0xc03b0230) at ide.c:2449
#9 0xc021ed15 in ide_register_driver (driver=0xc0340de0) at ide.c:3427
#10 0xc0221bd1 in idedisk_init () at ide-disk.c:1725
#11 0xc034c8d4 in do_initcalls () at main.c:483
#12 0xc034c903 in do_basic_setup () at main.c:515

No quick fix here; this is all going to take some time to
work through :(

2002-09-15 22:31:57

by Con Kolivas

Subject: Re: Revealing benchmarks and new version of contest.

Quoting Paolo Ciarrocchi <[email protected]>:

> From: Con Kolivas <[email protected]>
> [...]
> > Below are the new benchmarks with these loads:
> Con,
> I have different results:
>

> 2.5.34 is preemption ON
> HW is a HP omnibook6000, 256 MiB RAM, PIII@800

It is clear that different hardware and hardware settings will expose different
areas of weakness. Furthermore, different filesystems will affect the load very
differently. This test is still in its infancy and has already undergone five
revisions in the two days it has existed. It will be interesting to see these
hardware effects, and how the different subsystems affect different areas.

My test machine is also a laptop: 256 MiB RAM, PIII@1133, running ReiserFS.

The test is now at version 0.23 with a different mem_load, so it may show up
more memory differences. v0.30 will mainly be a code cleanup by Rik.

It now has a homepage:
http://contest.kolivas.net

Con.