2002-10-05 05:56:45

by Con Kolivas

Subject: [BENCHMARK] contest 0.50 results to date

Here are the updated contest (http://contest.kolivas.net) benchmarks with
version 0.50

noload:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [3] 67.7 98 0 0 1.01
2.4.19-cc [3] 67.9 97 0 0 1.01
2.5.38 [3] 72.0 93 0 0 1.07
2.5.38-mm3 [2] 71.8 93 0 0 1.07
2.5.39 [2] 72.2 93 0 0 1.07
2.5.39-mm1 [2] 72.3 93 0 0 1.08
2.5.40 [1] 72.5 93 0 0 1.08
2.5.40-mm1 [1] 72.9 93 0 0 1.09

process_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [3] 106.5 59 112 43 1.59
2.4.19-cc [3] 105.0 59 110 42 1.56
2.5.38 [3] 89.5 74 34 28 1.33
2.5.38-mm3 [1] 86.0 78 29 25 1.28
2.5.39 [2] 91.2 73 36 28 1.36
2.5.39-mm1 [2] 92.0 73 37 29 1.37
2.5.40 [2] 82.8 80 25 23 1.23
2.5.40-mm1 [2] 86.9 77 30 25 1.29

io_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [3] 492.6 14 38 10 7.33
2.4.19-cc [3] 156.0 48 12 10 2.32
2.5.38 [1] 4000.0 1 500 1 59.55
2.5.38-mm3 [1] 303.5 25 23 11 4.52
2.5.39 [2] 423.9 18 30 11 6.31
2.5.39-mm1 [2] 550.7 14 44 12 8.20
2.5.40 [1] 315.7 25 22 10 4.70
2.5.40-mm1 [1] 326.2 24 23 11 4.86

mem_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [3] 100.0 72 33 3 1.49
2.4.19-cc [3] 92.7 76 146 21 1.38
2.5.38 [3] 107.3 70 34 3 1.60
2.5.38-mm3 [1] 100.3 72 27 2 1.49
2.5.39 [2] 103.1 72 31 3 1.53
2.5.39-mm1 [2] 103.3 72 32 3 1.54
2.5.40 [2] 102.5 72 31 3 1.53
2.5.40-mm1 [2] 107.7 68 29 2 1.60

Note that the io_load value for 2.5.38 is an estimate: every time I tried to
run it, it took too long and I stopped it (the longest I waited was 4000
seconds). This shows the write-starves-read problem very clearly.

Of most interest is the performance of 2.4.19 with the latest version of
compressed cache under mem_load (2.4.19-cc). Note that although the compile
time is only slightly better, the difference in the actual work done by the
background load during that time is _enormous_. This demonstrates very
clearly the limitations of previous versions of contest.

Comments?
Con


2002-10-05 18:24:27

by Paolo Ciarrocchi

Subject: Re: [BENCHMARK] contest 0.50 results to date

And here are my results:
noload:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [3] 128.8 97 0 0 1.01
2.4.19-0.24pre4 [3] 127.4 98 0 0 0.99
2.5.40 [3] 134.4 96 0 0 1.05
2.5.40-nopree [3] 133.7 96 0 0 1.04

process_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [3] 194.1 60 134 40 1.52
2.4.19-0.24pre4 [3] 193.2 60 133 40 1.51
2.5.40 [3] 184.5 70 53 31 1.44
2.5.40-nopree [3] 286.4 45 163 55 2.24

io_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [3] 461.0 28 46 8 3.60
2.4.19-0.24pre4 [3] 235.4 55 26 10 1.84
2.5.40 [3] 293.6 45 25 8 2.29
2.5.40-nopree [3] 269.4 50 20 7 2.10

mem_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [3] 161.1 80 38 2 1.26
2.4.19-0.24pre4 [3] 181.2 76 253 19 1.41
2.5.40 [3] 163.0 80 34 2 1.27
2.5.40-nopree [3] 161.7 80 34 2 1.26

Comments?

Paolo

2002-10-05 19:10:07

by Andrew Morton

Subject: Re: [BENCHMARK] contest 0.50 results to date

Paolo Ciarrocchi wrote:
>
> And here are my results:
> noload:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.19 [3] 128.8 97 0 0 1.01
> 2.4.19-0.24pre4 [3] 127.4 98 0 0 0.99
> 2.5.40 [3] 134.4 96 0 0 1.05
> 2.5.40-nopree [3] 133.7 96 0 0 1.04
>
> process_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.19 [3] 194.1 60 134 40 1.52
> 2.4.19-0.24pre4 [3] 193.2 60 133 40 1.51
> 2.5.40 [3] 184.5 70 53 31 1.44
> 2.5.40-nopree [3] 286.4 45 163 55 2.24
>
> io_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.19 [3] 461.0 28 46 8 3.60
> 2.4.19-0.24pre4 [3] 235.4 55 26 10 1.84
> 2.5.40 [3] 293.6 45 25 8 2.29
> 2.5.40-nopree [3] 269.4 50 20 7 2.10
>
> mem_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.19 [3] 161.1 80 38 2 1.26
> 2.4.19-0.24pre4 [3] 181.2 76 253 19 1.41
> 2.5.40 [3] 163.0 80 34 2 1.27
> 2.5.40-nopree [3] 161.7 80 34 2 1.26
>

I think I'm going to have to be reminded what "Loads" and "LCPU"
mean, please.

For these sorts of tests, I think system-wide CPU% is an interesting
thing to track - both user and system. If it is high then we're doing
well - doing real work.

The same isn't necessarily true of the compressed-cache kernel, because
it's doing extra work in-kernel, so CPU load comparisons there need
to be made with some caution.

Apart from observing overall CPU occupancy, we also need to monitor
fairness - one way of doing that is to measure the total kernel build
elapsed time. Another way would be to observe how much actual progress
the streaming IO makes during the kernel build.

What is "2.4.19-0.24pre4"?

I'd suggest that more tests be added. Perhaps

- one competing streaming read

- several competing streaming reads

- competing "tar cf foo ./linux"

- competing "tar xf foo"

- competing "ls -lR > /dev/null"

It would be interesting to test -aa kernels as well - Andrea's kernels
tend to be well tuned.

2002-10-05 19:23:39

by Paolo Ciarrocchi

Subject: Re: [BENCHMARK] contest 0.50 results to date

From: Andrew Morton <[email protected]>
> I think I'm going to have to be reminded what "Loads" and "LCPU"
> mean, please.
From an email of Con:
"The "loads" variable presented is an internal number (the absolute value is
not important) and makes comparisons easier. The LCPU% is the cpu% the load
used while running.
Note that if you look at process_load, for example, CPU% + LCPU% can be >100
because the load runs for longer than the kernel compile. However, this has
been accounted for in the "loads" result, to take into account the variable
extra time the load runs relative to the kernel compile."


> What is "2.4.19-0.24pre4"?
My fault ;-(
2.4.19-0.24pre4 is a compressed cache kernel.

Ciao,
Paolo

2002-10-05 20:50:56

by Rodrigo Souza de Castro

Subject: Re: [BENCHMARK] contest 0.50 results to date

On Sat, Oct 05, 2002 at 12:15:30PM -0700, Andrew Morton wrote:
> Paolo Ciarrocchi wrote:
> >
[snip]
> > mem_load:
> > Kernel [runs] Time CPU% Loads LCPU% Ratio
> > 2.4.19 [3] 161.1 80 38 2 1.26
> > 2.4.19-0.24pre4 [3] 181.2 76 253 19 1.41
> > 2.5.40 [3] 163.0 80 34 2 1.27
> > 2.5.40-nopree [3] 161.7 80 34 2 1.26
> >
>
> I think I'm going to have to be reminded what "Loads" and "LCPU"
> mean, please.
>
> For these sorts of tests, I think system-wide CPU% is an interesting
> thing to track - both user and system. If it is high then we're doing
> well - doing real work.
>
> The same isn't necessarily true of the compressed-cache kernel, because
> it's doing extra work in-kernel, so CPU load comparisons there need
> to be made with some caution.

Agreed.

I guess scheduling is another important point here. Firstly, this extra
work, though usually not substantial, may change the scheduling in the
system a little.

Secondly, given that compressed cache usually reduces the IO performed by
the system, it may change the scheduling depending on how much IO it saves
and on what the applications do. For example, mem_load doesn't swap any
pages when compressed cache is enabled (its data are highly compressible),
so it ends up using most of its CPU time slice. On a vanilla kernel,
mem_load is scheduled all the time just to service page faults.

--
Rodrigo


2002-10-06 01:00:40

by Con Kolivas

Subject: Re: [BENCHMARK] contest 0.50 results to date

On Sunday 06 Oct 2002 5:15 am, Andrew Morton wrote:
> Paolo Ciarrocchi wrote:
> > And here are my results:
> I think I'm going to have to be reminded what "Loads" and "LCPU"
> mean, please.

Loads for process_load is the number of iterations the load manages to
complete, divided by 10000. Loads for mem_load is the number of times
mem_load manages to access the ram, divided by 1000. Loads for io_load is
the approximate number of megabytes it writes during the kernel compile,
divided by 100. The denominators were chosen to represent the data easily,
and they also correlate well with the significant digits.

LCPU% is the load's cpu% usage while it is running. The load is started 3
seconds before the kernel compile and takes a variable length of time to
finish, so CPU% + LCPU% can add up to more than 100.
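
To make the normalisation concrete, a load could count its work roughly as
follows - this is only an illustrative sketch, not the contest code itself,
and "do_one_unit_of_work" and the flag file are placeholders:

  #!/bin/sh
  # hypothetical load wrapper: count completed work units while the
  # kernel compile runs, then report the count scaled as for process_load
  count=0
  while [ -e /tmp/compile_running ]; do   # assumed flag file
          do_one_unit_of_work             # stands in for the real load
          count=$((count + 1))
  done
  echo $((count / 10000))                 # process_load scales by 10000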

> For these sorts of tests, I think system-wide CPU% is an interesting
> thing to track - both user and system. If it is high then we're doing
> well - doing real work.

So the total cpu% in use during the kernel compile? CPU% + LCPU% should be
very close to this. However, I'm not sure of the most accurate way to
measure the average total cpu% used during just the kernel compile -
suggestions?
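
One rough way to get that number - a sketch only, not part of contest; the
make command and the jiffy fields are assumptions based on the usual
/proc/stat layout - would be to sample /proc/stat either side of the
compile and difference the counters:

  #!/bin/sh
  # estimate system-wide cpu% (user+nice+system vs total) over the
  # compile by differencing the jiffy counters in /proc/stat
  read cpu u1 n1 s1 i1 rest < /proc/stat
  make -j4 bzImage > /dev/null 2>&1       # stands in for the timed compile
  read cpu u2 n2 s2 i2 rest < /proc/stat
  busy=$(( (u2 - u1) + (n2 - n1) + (s2 - s1) ))
  total=$(( busy + (i2 - i1) ))
  echo "system-wide cpu%: $(( 100 * busy / total ))"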

> The same isn't necessarily true of the compressed-cache kernel, because
> it's doing extra work in-kernel, so CPU load comparisons there need
> to be made with some caution.

That is clear, and also the reason I have a measure of work done by the load
as well as just the lcpu% (which by itself is not very helpful).

> Apart from observing overall CPU occupancy, we also need to monitor
> fairness - one way of doing that is to measure the total kernel build
> elapsed time. Another way would be to observe how much actual progress
> the streaming IO makes during the kernel build.

I believe that is what I'm already showing: the time for each load is the
kernel build time, and loads is the IO work done.

> What is "2.4.19-0.24pre4"?

Latest version of compressed cache. Note that in my testing of cc I used the
optional LZO compression.

> I'd suggest that more tests be added. Perhaps
>
> - one competing streaming read
>
> - several competing streaming reads
>
> - competing "tar cf foo ./linux"
>
> - competing "tar xf foo"
>
> - competing "ls -lR > /dev/null"
>

Sure, adding loads is easy enough; I just wasn't sure exactly what to add.
I'll give those a shot soon.

> It would be interesting to test -aa kernels as well - Andrea's kernels
> tend to be well tuned.

Where time permits, sure.

Regards,
Con.

2002-10-06 05:35:45

by Con Kolivas

Subject: load additions to contest

I've added some load conditions to an experimental version of contest
(http://contest.kolivas.net) and here are some of the results I've obtained
so far:

First, an explanation:

- Time shows how long the kernel compile took in the presence of the load.
- CPU% shows what percentage of the cpu the kernel compile managed to use
  during compilation.
- Loads shows how many times the load managed to run while the kernel
  compile was happening.
- LCPU% shows the percentage of the cpu the load used while running.
- Ratio is the ratio of the kernel compile time to the reference (2.4.19).

Use a fixed width font to see the tables correctly.

tarc_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [2] 88.0 74 50 25 1.31
2.4.19-cc [1] 86.1 78 51 26 1.28
2.5.38 [1] 91.8 74 46 22 1.37
2.5.39 [1] 94.4 71 58 27 1.41
2.5.40 [1] 95.0 71 59 27 1.41
2.5.40-mm1 [1] 93.8 72 56 26 1.40

This load repeatedly creates a tar of the include directory of the linux
kernel. A decrease in performance is visible at 2.5.38 without a
concomitant increase in loads, but this had improved by 2.5.39.
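
For illustration only (not the actual contest source; the paths are
assumptions), such a load could be approximated by a small shell loop that
keeps tarring a fixed tree and counts completed passes:

  #!/bin/sh
  # hypothetical tarc-style load: tar the same tree repeatedly until a
  # stop file appears, reporting how many tars completed
  loads=0
  while [ ! -e /tmp/stop_load ]; do
          tar cf /tmp/tarc_load.tar /usr/src/linux/include 2>/dev/null
          loads=$((loads + 1))
  done
  echo "$loads"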


tarx_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [2] 87.6 74 13 24 1.30
2.4.19-cc [1] 81.5 80 12 24 1.21
2.5.38 [1] 296.5 23 54 28 4.41
2.5.39 [1] 108.2 64 9 12 1.61
2.5.40 [1] 107.0 64 8 11 1.59
2.5.40-mm1 [1] 120.5 58 12 16 1.79

This load repeatedly extracts a tar of the include directory of the linux
kernel. The compressed cache kernel shows a performance boost, consistent
with this data being cached better (less IO). 2.5.38 shows very heavy
writing and pays a performance penalty for it. All the 2.5 kernels perform
worse than the 2.4 kernels: the kernel compile takes longer even though the
amount of work done by the load has decreased.
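
Again only an illustrative sketch (assuming a pre-built tarball and a
scratch directory; each pass overwrites the previously extracted files):

  #!/bin/sh
  # hypothetical tarx-style load: repeatedly extract the same tarball
  # into one place, overwriting the previous copy, until told to stop
  mkdir -p /tmp/tarx_dir
  loads=0
  while [ ! -e /tmp/stop_load ]; do
          tar xf /tmp/tarc_load.tar -C /tmp/tarx_dir 2>/dev/null
          loads=$((loads + 1))
  done
  echo "$loads"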


read_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [2] 134.1 54 14 5 2.00
2.4.19-cc [2] 92.5 72 22 20 1.38
2.5.38 [2] 100.5 76 9 5 1.50
2.5.39 [2] 101.3 74 14 6 1.51
2.5.40 [1] 101.5 73 13 5 1.51
2.5.40-mm1 [1] 104.5 74 9 5 1.56

This load repeatedly copies a file the size of physical memory to
/dev/null. Compressed caching shows the performance boost of keeping more
of this data in physical ram - the caveat is that this data would be simple
to compress, so the advantage is overstated. The 2.5 kernels show
equivalent performance at 2.5.38 (time down at the expense of less load
work) but better performance at 2.5.39-40 (time down with an equivalent
amount of load work being performed). 2.5.40-mm1 seems to exhibit the same
performance as 2.5.38.
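
Sketch only, with made-up numbers - the file should match the machine's
physical ram, and a zero-filled file is of course trivially compressible,
which is exactly the caveat above:

  #!/bin/sh
  # hypothetical read_load-style loop: stream a ram-sized file to
  # /dev/null over and over, counting completed passes
  MEMSIZE_MB=256                          # assumption: 256MB of ram
  dd if=/dev/zero of=/tmp/readfile bs=1048576 count=$MEMSIZE_MB 2>/dev/null
  loads=0
  while [ ! -e /tmp/stop_load ]; do
          cat /tmp/readfile > /dev/null
          loads=$((loads + 1))
  done
  echo "$loads"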


lslr_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [2] 83.1 77 34 24 1.24
2.4.19-cc [1] 82.8 79 34 24 1.23
2.5.38 [1] 74.8 89 16 13 1.11
2.5.39 [1] 76.7 88 18 14 1.14
2.5.40 [1] 74.9 89 15 12 1.12
2.5.40-mm1 [1] 76.0 89 15 12 1.13

This load repeatedly does an `ls -lR >/dev/null`. Overall performance seems
similar across kernels, with the balance biased towards completing the
kernel compile sooner.

These were very interesting loads to run, as AKPM suggested, and depending
on the feedback I get I will probably incorporate them into contest.

Comments?
Con

2002-10-06 06:05:53

by Andrew Morton

Subject: Re: load additions to contest

Con Kolivas wrote:
>
> ...
>
> tarc_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.19 [2] 88.0 74 50 25 1.31
> 2.4.19-cc [1] 86.1 78 51 26 1.28
> 2.5.38 [1] 91.8 74 46 22 1.37
> 2.5.39 [1] 94.4 71 58 27 1.41
> 2.5.40 [1] 95.0 71 59 27 1.41
> 2.5.40-mm1 [1] 93.8 72 56 26 1.40
>
> This load repeatedly creates a tar of the include directory of the linux
> kernel. You can see a decrease in performance was visible at 2.5.38 without a
> concomitant increase in loads, but this improved by 2.5.39.

Well the kernel compile took 7% longer, but the tar got 10% more
work done. I expect this is a CPU scheduler artifact. The scheduler
has changed so much, it's hard to draw any conclusions.

Everything there will be in cache. I'd suggest that you increase the
size of the tarball a *lot*, so the two activities are competing for
disk.

> tarx_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.19 [2] 87.6 74 13 24 1.30
> 2.4.19-cc [1] 81.5 80 12 24 1.21
> 2.5.38 [1] 296.5 23 54 28 4.41
> 2.5.39 [1] 108.2 64 9 12 1.61
> 2.5.40 [1] 107.0 64 8 11 1.59
> 2.5.40-mm1 [1] 120.5 58 12 16 1.79
>
> This load repeatedly extracts a tar of the include directory of the linux
> kernel. A performance boost is noted by the compressed cache kernel
> consistent with this data being cached better (less IO). 2.5.38 shows very
> heavy writing and a performance penalty with that. All the 2.5 kernels show
> worse performance than the 2.4 kernels as the time taken to compile the
> kernel is longer even though the amount of work done by the load has
> decreased.

hm, that's interesting. I assume the tar file is being extracted
into the same place each time? Is tar overwriting the old version,
or are you unlinking the destination first?

It would be most interesting to rename the untarred tree, so nothing
is getting deleted.

Which filesystem are you using here?

> read_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.19 [2] 134.1 54 14 5 2.00
> 2.4.19-cc [2] 92.5 72 22 20 1.38
> 2.5.38 [2] 100.5 76 9 5 1.50
> 2.5.39 [2] 101.3 74 14 6 1.51
> 2.5.40 [1] 101.5 73 13 5 1.51
> 2.5.40-mm1 [1] 104.5 74 9 5 1.56
>
> This load repeatedly copies a file the size of the physical memory to
> /dev/null. Compressed caching shows the performance boost of caching more of
> this data in physical ram - caveat is that this data would be simple to
> compress so the advantage is overstated. The 2.5 kernels show equivalent
> performance at 2.5.38 (time down at the expense of load down) but have better
> performance at 2.5.39-40 (time down with equivalent load being performed).
> 2.5.40-mm1 seems to exhibit the same performance as 2.5.38.

That's complex. I expect there's a lot of eviction of executable
text happening here. I'm working on tuning that up a bit.

> lslr_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.19 [2] 83.1 77 34 24 1.24
> 2.4.19-cc [1] 82.8 79 34 24 1.23
> 2.5.38 [1] 74.8 89 16 13 1.11
> 2.5.39 [1] 76.7 88 18 14 1.14
> 2.5.40 [1] 74.9 89 15 12 1.12
> 2.5.40-mm1 [1] 76.0 89 15 12 1.13
>
> This load repeatedly does a `ls -lR >/dev/null`. The performance seems to be
> overall similar, with the bias towards the kernel compilation being performed
> sooner.

How many files were under the `ls -lR'? I'd suggest "zillions", so
we get heavily into slab reclaim, and lots of inode and directory
cache thrashing and seeking...

2002-10-06 06:53:51

by Con Kolivas

Subject: Re: load additions to contest

On Sunday 06 Oct 2002 4:11 pm, Andrew Morton wrote:
> Con Kolivas wrote:
> > ...
> >
> > tarc_load:
> > Kernel [runs] Time CPU% Loads LCPU% Ratio
> > 2.4.19 [2] 88.0 74 50 25 1.31
> > 2.4.19-cc [1] 86.1 78 51 26 1.28
> > 2.5.38 [1] 91.8 74 46 22 1.37
> > 2.5.39 [1] 94.4 71 58 27 1.41
> > 2.5.40 [1] 95.0 71 59 27 1.41
> > 2.5.40-mm1 [1] 93.8 72 56 26 1.40
> >
> > This load repeatedly creates a tar of the include directory of the linux
> > kernel. You can see a decrease in performance was visible at 2.5.38
> > without a concomitant increase in loads, but this improved by 2.5.39.
>
> Well the kernel compile took 7% longer, but the tar got 10% more
> work done. I expect this is a CPU scheduler artifact. The scheduler
> has changed so much, it's hard to draw any conclusions.
>
> Everything there will be in cache. I'd suggest that you increase the
> size of the tarball a *lot*, so the two activities are competing for
> disk.

OK, I'll go back to the original idea of tarring the whole kernel
directory. It obviously needs to be something of constant size.

>
> > tarx_load:
> > Kernel [runs] Time CPU% Loads LCPU% Ratio
> > 2.4.19 [2] 87.6 74 13 24 1.30
> > 2.4.19-cc [1] 81.5 80 12 24 1.21
> > 2.5.38 [1] 296.5 23 54 28 4.41
> > 2.5.39 [1] 108.2 64 9 12 1.61
> > 2.5.40 [1] 107.0 64 8 11 1.59
> > 2.5.40-mm1 [1] 120.5 58 12 16 1.79
> >
> > This load repeatedly extracts a tar of the include directory of the
> > linux kernel. A performance boost is noted by the compressed cache kernel
> > consistent with this data being cached better (less IO). 2.5.38 shows
> > very heavy writing and a performance penalty with that. All the 2.5
> > kernels show worse performance than the 2.4 kernels as the time taken to
> > compile the kernel is longer even though the amount of work done by the
> > load has decreased.
>
> hm, that's interesting. I assume the tar file is being extracted
> into the same place each time? Is tar overwriting the old version,
> or are you unlinking the destination first?

Into the same place and overwriting the original.

>
> It would be most interesting to rename the untarred tree, so nothing
> is getting deleted.

Ok, this is going to take up a lot of space though.

>
> Which filesystem are you using here?

ReiserFS (sorry, I don't have any other hardware/fs to test on)

>
> > read_load:
> > Kernel [runs] Time CPU% Loads LCPU% Ratio
> > 2.4.19 [2] 134.1 54 14 5 2.00
> > 2.4.19-cc [2] 92.5 72 22 20 1.38
> > 2.5.38 [2] 100.5 76 9 5 1.50
> > 2.5.39 [2] 101.3 74 14 6 1.51
> > 2.5.40 [1] 101.5 73 13 5 1.51
> > 2.5.40-mm1 [1] 104.5 74 9 5 1.56
> >
> > This load repeatedly copies a file the size of the physical memory to
> > /dev/null. Compressed caching shows the performance boost of caching more
> > of this data in physical ram - caveat is that this data would be simple
> > to compress so the advantage is overstated. The 2.5 kernels show
> > equivalent performance at 2.5.38 (time down at the expense of load down)
> > but have better performance at 2.5.39-40 (time down with equivalent load
> > being performed). 2.5.40-mm1 seems to exhibit the same performance as
> > 2.5.38.
>
> That's complex. I expect there's a lot of eviction of executable
> text happening here. I'm working on tuning that up a bit.
>
> > lslr_load:
> > Kernel [runs] Time CPU% Loads LCPU% Ratio
> > 2.4.19 [2] 83.1 77 34 24 1.24
> > 2.4.19-cc [1] 82.8 79 34 24 1.23
> > 2.5.38 [1] 74.8 89 16 13 1.11
> > 2.5.39 [1] 76.7 88 18 14 1.14
> > 2.5.40 [1] 74.9 89 15 12 1.12
> > 2.5.40-mm1 [1] 76.0 89 15 12 1.13
> >
> > This load repeatedly does a `ls -lR >/dev/null`. The performance seems to
> > be overall similar, with the bias towards the kernel compilation being
> > performed sooner.
>
> How many files were under the `ls -lR'? I'd suggest "zillions", so
> we get heavily into slab reclaim, and lots of inode and directory
> cache thrashing and seeking...

The ls -lR was over an entire kernel tree (to remain constant between
runs). I don't think I can keep it constant and make it much bigger without
creating some sort of fake dir tree, unless you can suggest a different
approach. I guess `ls -lR /` will not differ much in size between runs, if
you think that would be satisfactory.

Con

2002-10-06 12:06:18

by Con Kolivas

Subject: Re: load additions to contest

Here are the results with the modifications you suggested.

On Sunday 06 Oct 2002 4:11 pm, Andrew Morton wrote:
> Con Kolivas wrote:
> > ...
> >
> > tarc_load:
> > Kernel [runs] Time CPU% Loads LCPU% Ratio
> > 2.4.19 [2] 88.0 74 50 25 1.31
> > 2.4.19-cc [1] 86.1 78 51 26 1.28
> > 2.5.38 [1] 91.8 74 46 22 1.37
> > 2.5.39 [1] 94.4 71 58 27 1.41
> > 2.5.40 [1] 95.0 71 59 27 1.41
> > 2.5.40-mm1 [1] 93.8 72 56 26 1.40
> >
> > This load repeatedly creates a tar of the include directory of the linux
> > kernel. You can see a decrease in performance was visible at 2.5.38
> > without a concomitant increase in loads, but this improved by 2.5.39.

tarc_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [2] 106.5 70 1 8 1.59
2.5.38 [1] 97.2 79 1 6 1.45
2.5.39 [1] 91.8 83 1 6 1.37
2.5.40 [1] 96.9 80 1 6 1.44
2.5.40-mm1 [1] 94.4 81 1 6 1.41

This version tars the whole kernel tree. No files are overwritten this
time. The results are definitely different, but the coarse resolution of
the loads column makes that value completely unhelpful.

>
> Well the kernel compile took 7% longer, but the tar got 10% more
> work done. I expect this is a CPU scheduler artifact. The scheduler
> has changed so much, it's hard to draw any conclusions.
>
> Everything there will be in cache. I'd suggest that you increase the
> size of the tarball a *lot*, so the two activities are competing for
> disk.
>
> > tarx_load:
> > Kernel [runs] Time CPU% Loads LCPU% Ratio
> > 2.4.19 [2] 87.6 74 13 24 1.30
> > 2.4.19-cc [1] 81.5 80 12 24 1.21
> > 2.5.38 [1] 296.5 23 54 28 4.41
> > 2.5.39 [1] 108.2 64 9 12 1.61
> > 2.5.40 [1] 107.0 64 8 11 1.59
> > 2.5.40-mm1 [1] 120.5 58 12 16 1.79
> >
> > This load repeatedly extracts a tar of the include directory of the
> > linux kernel. A performance boost is noted by the compressed cache kernel
> > consistent with this data being cached better (less IO). 2.5.38 shows
> > very heavy writing and a performance penalty with that. All the 2.5
> > kernels show worse performance than the 2.4 kernels as the time taken to
> > compile the kernel is longer even though the amount of work done by the
> > load has decreased.

tarx_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [1] 132.4 55 2 9 1.97
2.5.38 [1] 120.5 63 2 8 1.79
2.5.39 [1] 108.3 69 1 6 1.61
2.5.40 [1] 110.7 68 1 6 1.65
2.5.40-mm1 [1] 191.5 39 3 7 2.85

This version extracts a tar of the whole kernel tree. No files are overwritten
this time. Once again the results are very different, and the loads'
resolution is almost too low for comparison.

>
> hm, that's interesting. I assume the tar file is being extracted
> into the same place each time? Is tar overwriting the old version,
> or are you unlinking the destination first?
>
> It would be most interesting to rename the untarred tree, so nothing
> is getting deleted.
>
> Which filesystem are you using here?
>
> > read_load:
> > Kernel [runs] Time CPU% Loads LCPU% Ratio
> > 2.4.19 [2] 134.1 54 14 5 2.00
> > 2.4.19-cc [2] 92.5 72 22 20 1.38
> > 2.5.38 [2] 100.5 76 9 5 1.50
> > 2.5.39 [2] 101.3 74 14 6 1.51
> > 2.5.40 [1] 101.5 73 13 5 1.51
> > 2.5.40-mm1 [1] 104.5 74 9 5 1.56
> >
> > This load repeatedly copies a file the size of the physical memory to
> > /dev/null. Compressed caching shows the performance boost of caching more
> > of this data in physical ram - caveat is that this data would be simple
> > to compress so the advantage is overstated. The 2.5 kernels show
> > equivalent performance at 2.5.38 (time down at the expense of load down)
> > but have better performance at 2.5.39-40 (time down with equivalent load
> > being performed). 2.5.40-mm1 seems to exhibit the same performance as
> > 2.5.38.
>
> That's complex. I expect there's a lot of eviction of executable
> text happening here. I'm working on tuning that up a bit.
>
> > lslr_load:
> > Kernel [runs] Time CPU% Loads LCPU% Ratio
> > 2.4.19 [2] 83.1 77 34 24 1.24
> > 2.4.19-cc [1] 82.8 79 34 24 1.23
> > 2.5.38 [1] 74.8 89 16 13 1.11
> > 2.5.39 [1] 76.7 88 18 14 1.14
> > 2.5.40 [1] 74.9 89 15 12 1.12
> > 2.5.40-mm1 [1] 76.0 89 15 12 1.13
> >
> > This load repeatedly does a `ls -lR >/dev/null`. The performance seems to
> > be overall similar, with the bias towards the kernel compilation being
> > performed sooner.

lslr_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.19 [1] 89.8 77 1 20 1.34
2.5.38 [1] 99.1 71 1 20 1.48
2.5.39 [1] 101.3 70 2 24 1.51
2.5.40 [1] 97.0 72 1 21 1.44
2.5.40-mm1 [1] 96.6 73 1 22 1.44

This version does `ls -lR /`. Note that the balance is swayed towards the
kernel compile taking longer here, and the resolution of the loads column
is lost.

>
> How many files were under the `ls -lR'? I'd suggest "zillions", so
> we get heavily into slab reclaim, and lots of inode and directory
> cache thrashing and seeking...

I hope this is helpful in some way. In this form it would be difficult to
release these as a standard part of contest (fast hardware could
theoretically use up all the disk space with this). I'm more than happy to
conduct personalised tests for linux kernel development as needed, though.

Con