2001-12-01 19:17:52

by Jason Holmes

[permalink] [raw]
Subject: IO degradation in 2.4.17-pre2 vs. 2.4.16

I saw in a previous thread that the interactivity improvements in
2.4.17-pre2 had some adverse effect on IO throughput and since I was
already evaluating 2.4.16 for a somewhat large fileserving project, I
threw 2.4.17-pre2 on to see what has changed. Throughput while serving
a large number of clients is important to me, so my tests have included
using dbench to try to see how things scale as clients increase.
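For reference, a scan like this can be driven by a small loop over client counts; the counts below match the tables that follow, but the exact dbench invocation and working directory are assumptions, so this sketch only prints the commands rather than running them:

```shell
# Hedged sketch of a dbench scaling sweep; prints each command instead
# of executing it, since dbench may not be installed on this machine.
for procs in 1 2 4 8 16 32 64 128; do
    printf 'dbench %d\n' "$procs"   # a real run would capture the Throughput line
done
```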

2.4.16:

Throughput 116.098 MB/sec (NB=145.123 MB/sec 1160.98 MBit/sec) 1 procs
Throughput 206.604 MB/sec (NB=258.255 MB/sec 2066.04 MBit/sec) 2 procs
Throughput 210.364 MB/sec (NB=262.955 MB/sec 2103.64 MBit/sec) 4 procs
Throughput 213.397 MB/sec (NB=266.747 MB/sec 2133.97 MBit/sec) 8 procs
Throughput 210.989 MB/sec (NB=263.736 MB/sec 2109.89 MBit/sec) 16 procs
Throughput 138.713 MB/sec (NB=173.391 MB/sec 1387.13 MBit/sec) 32 procs
Throughput 117.729 MB/sec (NB=147.162 MB/sec 1177.29 MBit/sec) 64 procs
Throughput 66.7354 MB/sec (NB=83.4193 MB/sec 667.354 MBit/sec) 128 procs

2.4.17-pre2:

Throughput 96.2302 MB/sec (NB=120.288 MB/sec 962.302 MBit/sec) 1 procs
Throughput 226.679 MB/sec (NB=283.349 MB/sec 2266.79 MBit/sec) 2 procs
Throughput 223.955 MB/sec (NB=279.944 MB/sec 2239.55 MBit/sec) 4 procs
Throughput 224.533 MB/sec (NB=280.666 MB/sec 2245.33 MBit/sec) 8 procs
Throughput 153.672 MB/sec (NB=192.09 MB/sec 1536.72 MBit/sec) 16 procs
Throughput 91.3464 MB/sec (NB=114.183 MB/sec 913.464 MBit/sec) 32 procs
Throughput 64.876 MB/sec (NB=81.095 MB/sec 648.76 MBit/sec) 64 procs
Throughput 54.9774 MB/sec (NB=68.7217 MB/sec 549.774 MBit/sec) 128 procs

Throughput 136.522 MB/sec (NB=170.652 MB/sec 1365.22 MBit/sec) 1 procs
Throughput 223.682 MB/sec (NB=279.603 MB/sec 2236.82 MBit/sec) 2 procs
Throughput 222.806 MB/sec (NB=278.507 MB/sec 2228.06 MBit/sec) 4 procs
Throughput 224.427 MB/sec (NB=280.534 MB/sec 2244.27 MBit/sec) 8 procs
Throughput 152.286 MB/sec (NB=190.358 MB/sec 1522.86 MBit/sec) 16 procs
Throughput 92.044 MB/sec (NB=115.055 MB/sec 920.44 MBit/sec) 32 procs
Throughput 78.0881 MB/sec (NB=97.6101 MB/sec 780.881 MBit/sec) 64 procs
Throughput 66.1573 MB/sec (NB=82.6966 MB/sec 661.573 MBit/sec) 128 procs

Throughput 117.95 MB/sec (NB=147.438 MB/sec 1179.5 MBit/sec) 1 procs
Throughput 212.469 MB/sec (NB=265.586 MB/sec 2124.69 MBit/sec) 2 procs
Throughput 214.763 MB/sec (NB=268.453 MB/sec 2147.63 MBit/sec) 4 procs
Throughput 214.007 MB/sec (NB=267.509 MB/sec 2140.07 MBit/sec) 8 procs
Throughput 96.6572 MB/sec (NB=120.821 MB/sec 966.572 MBit/sec) 16 procs
Throughput 48.1342 MB/sec (NB=60.1677 MB/sec 481.342 MBit/sec) 32 procs
Throughput 71.3444 MB/sec (NB=89.1806 MB/sec 713.444 MBit/sec) 64 procs
Throughput 59.258 MB/sec (NB=74.0724 MB/sec 592.58 MBit/sec) 128 procs

I have included three runs for 2.4.17-pre2 to show how inconsistent its
results are; 2.4.16 didn't have this problem to this extent. bonnie++
numbers seem largely unchanged between kernels, coming in around:

------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
2512M 14348 81 49495 26 24438 16 16040 96 55006 15 373.7 1
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 3087 99 +++++ +++ +++++ +++ 3175 100 +++++ +++ 11042 100

The test machine is an IBM 342 with two 1.26 GHz P3 processors and 1.25 GB
of RAM. The above numbers were generated off of a single 10K RPM SCSI disk
hanging off of an Adaptec aic7899 controller.

--
Jason Holmes


2001-12-01 21:35:12

by Andrew Morton

[permalink] [raw]
Subject: Re: IO degradation in 2.4.17-pre2 vs. 2.4.16

Jason Holmes wrote:
>
> I saw in a previous thread that the interactivity improvements in
> 2.4.17-pre2 had some adverse effect on IO throughput and since I was
> already evaluating 2.4.16 for a somewhat large fileserving project, I
> threw 2.4.17-pre2 on to see what has changed. Throughput while serving
> a large number of clients is important to me, so my tests have included
> using dbench to try to see how things scale as clients increase.
>
> 2.4.16:
>
> ...
> Throughput 210.989 MB/sec (NB=263.736 MB/sec 2109.89 MBit/sec) 16 procs
> ...
>
> 2.4.17-pre2:
>
> ...
> Throughput 153.672 MB/sec (NB=192.09 MB/sec 1536.72 MBit/sec) 16 procs
> ...

This is expected, and tunable.

The thing about dbench is this: it creates files and then it
quickly deletes them. It is really, really important to understand
this!

If the kernel allows processes to fill all of memory with dirty
data and to *not* start IO on that data, then this really helps
dbench, because when the delete comes along, that data gets tossed
away and is never written.

If you have enough memory, an entire dbench run can be performed
and it will do no disk IO at all.

The 2.4.17-pre2 change meant that the kernel starts writeout of
dirty data earlier, and will cause the writer to block, to
prevent it from filling all memory with write() data. This is
how the kernel is actually supposed to work, but it wasn't working
right, and the mistake benefitted dbench. The net effect is that
a dbench run does a lot more IO.
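The create-then-delete lifecycle described here can be seen with a toy version (a sketch only, not dbench itself; the file size and use of mktemp are arbitrary choices):

```shell
# Toy version of dbench's file lifecycle: dirty some page cache, then
# delete the file before writeout is forced. On a lightly loaded box
# the data may never reach the disk at all.
tmp=$(mktemp) || exit 1
dd if=/dev/zero of="$tmp" bs=1M count=8 2>/dev/null   # 8 MB of dirty data
rm -f "$tmp"                                          # deleted before any sync
[ ! -e "$tmp" ] && echo "file gone; its dirty pages can simply be discarded"
```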

If your normal operating workload creates files and does *not*
delete them within a few seconds, then the -pre2 change won't
make much difference at all, as your bonnie++ figures show.

If your normal operating workload _does_ involve very short-lived
files, then you can optimise for that load by increasing the
kernel's dirty buffer thresholds:

mnm:/home/akpm> cat /proc/sys/vm/bdflush
40 0 0 0 500 3000 60 0 0
^^                ^^
nfract            nfract_sync

These two numbers are percentages.

nfract: percentage of physical memory at which a write()r will
start writeout.

nfract_sync: percentage of physical memory at which a write()r
will block on some writeout (writer throttling).

You'll find that running

echo 80 0 0 0 500 3000 90 0 0 > /proc/sys/vm/bdflush

will boost your dbench throughput muchly.

dbench is a good stability and stress tester. It is not a good
benchmark, and it is not representative of most real-world
workloads.
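A small helper along these lines can report the two thresholds by field position (a sketch; it falls back to the default values quoted above when /proc/sys/vm/bdflush is absent, as it is on anything other than a 2.4-era kernel):

```shell
# Print nfract (field 1) and nfract_sync (field 7) from bdflush.
vals="40 0 0 0 500 3000 60 0 0"                 # 2.4 defaults, as shown above
[ -r /proc/sys/vm/bdflush ] && vals=$(cat /proc/sys/vm/bdflush)
set -- $vals
echo "nfract=$1%  nfract_sync=$7%"
```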

-

2001-12-01 22:36:37

by Jason Holmes

[permalink] [raw]
Subject: Re: IO degradation in 2.4.17-pre2 vs. 2.4.16

Andrew Morton wrote:
>
> Jason Holmes wrote:
> >
> > I saw in a previous thread that the interactivity improvements in
> > 2.4.17-pre2 had some adverse effect on IO throughput and since I was
> > already evaluating 2.4.16 for a somewhat large fileserving project, I
> > threw 2.4.17-pre2 on to see what has changed. Throughput while serving
> > a large number of clients is important to me, so my tests have included
> > using dbench to try to see how things scale as clients increase.
> >
> > 2.4.16:
> >
> > ...
> > Throughput 210.989 MB/sec (NB=263.736 MB/sec 2109.89 MBit/sec) 16 procs
> > ...
> >
> > 2.4.17-pre2:
> >
> > ...
> > Throughput 153.672 MB/sec (NB=192.09 MB/sec 1536.72 MBit/sec) 16 procs
> > ...
>
> This is expected, and tunable.
>
> The thing about dbench is this: it creates files and then it
> quickly deletes them. It is really, really important to understand
> this!
>
> If the kernel allows processes to fill all of memory with dirty
> data and to *not* start IO on that data, then this really helps
> dbench, because when the delete comes along, that data gets tossed
> away and is never written.
>
> If you have enough memory, an entire dbench run can be performed
> and it will do no disk IO at all.

Yeah, I was basically treating the lower process runs (<64) as in-memory
performance and the higher process runs as a mix (since, for example,
the 128 run deals with ~8 GB of data and I only have 1.25 GB of RAM).

> ...
>
> You'll find that running
>
> echo 80 0 0 0 500 3000 90 0 0 > /proc/sys/vm/bdflush
>
> will boost your dbench throughput muchly.

Yeah, actually, I've been sorta "brute-forcing" the bdflush and
max-readahead space (or the part of it that I chose for a start) over
the past few days for bonnie++ and dbench. The idea was to use these
quicker-running benchmarks to get a general idea of good values to use
and then zero in on the final values with a longer, more real-world load.
I was thinking that bonnie++ would at least give me an idea of
sequential read/write performance for files larger than RAM (one part of
the typical workload I see is moving large files out to multiple [32-64
or so] machines at the same time) and that dbench would give me an idea
of performance for many small read/write operations, both for cached and
on-disk data (another aspect of the workload I see is reading/writing
many small files from multiple machines, such as postprocessing the
results of some large computational run). Oh, I don't think I actually
mentioned that I'm looking to tune fileservers here for medium-sized
(100-200 node) computational clusters and that in the end there will be
something much more powerful than a single SCSI disk in the backend.

FWIW, the top 10 bdflush/max-readahead combinations for dbench (sorted
by 128 processes) that I've seen so far are:

                          16       32       64      128
                    -------- -------- -------- --------
70-900-1000-90-2047  208.056  159.598  144.721  122.514
30-100-1000-50-127   113.829  101.820  110.699  120.017
70-500-1000-90-2047  209.547  150.172  142.556  115.979
30-300-1000-90-63    108.862  118.443  109.060  112.901
30-100-1000-50-63    113.904   96.648  113.969  112.021
50-700-1000-90-63    208.062  137.579  134.504  111.656
30-500-1000-50-255   111.955   97.373  115.360  111.004
30-100-1000-70-1023  115.110   99.823  122.720  110.016
70-300-1000-90-1023  220.096  169.194  160.025  109.753
70-700-1000-90-255   208.468  146.202  140.098  109.618

(with the numbers on the left being
nfract-interval-age_buffer-nfract_sync-max_readahead, the column entries
being the non-adjusted MB/s that dbench reports, and the columns being
the number of processes). Unfortunately, these numbers are a bit bunk
because I haven't run each test enough times to average out the
run-to-run variance.
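A brute-force sweep of that parameter space can be generated mechanically; this sketch only prints the commands it would run, since writing to these files needs root and a 2.4 kernel (the fixed interval/age_buffer values and the particular combinations chosen here are assumptions):

```shell
# Enumerate some nfract / nfract_sync / max-readahead combinations and
# print the corresponding tuning commands for a sweep script.
for nfract in 30 50 70; do
  for nfract_sync in 50 70 90; do
    for ra in 63 255 1023; do
      echo "echo $nfract 0 0 0 500 3000 $nfract_sync 0 0 > /proc/sys/vm/bdflush"
      echo "echo $ra > /proc/sys/vm/max-readahead"
      # ... a real sweep would run dbench/bonnie++ here and log results ...
    done
  done
done
```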

If you have any suggestions on better ways than dbench to somewhat
quickly simulate performance for many clients hitting a fileserver at
the same time, I'd love to hear them.

Thanks,

--
Jason Holmes

2001-12-04 00:20:44

by Jason Holmes

[permalink] [raw]
Subject: Re: IO degradation in 2.4.17-pre2 vs. 2.4.16

Sure, I wasn't protesting the patch or anything; I was just passing my
observations along. I also couldn't care less about dbench numbers for
the sake of dbench numbers; I was just using it and other simple
benchmarks as stepping stones to try to figure out what effect bdflush
and max_readahead settings actually have on the way the system
performs. After the simple benchmarks narrowed things down, I would've
run more exhaustive benchmarks, then some large MM5 runs (including
setup, takedown, post-processing into graphs, etc), enough Gaussian jobs
to create 200 GB or so of scratch files, a hundred or so BLAST jobs
against a centralized database, all or part of these at the same time,
etc, the typical stuff that I see running. If I were to start out with
the real workload it'd take years.

The thing is, everywhere I read about tweaking filesystem performance
someone has some magic number to throw into bdflush. There's never any
justification for it and it's 9 times out of 10 for a "server" system,
whatever that is. Some recommendations are for values larger than
fs/buffer.c allows; some are wacko, recommending 100/100 for
nfract/nfract_sync; some want 5000 or 6000 for nfract_sync, which seems
somehow wrong for a percentage (perhaps older kernels didn't have a
percentage there or something).
numbers between 2.4.13-pre2, 2.4.17, and 2.4.17-pre1aa1. I was just
looking for a way to profile the way the different settings affect
system performance under a variety of conditions and dbench seemed like
a way to get the 'many clients / many small files' aspect of it all.
Who knows, maybe the default numbers are the best compromise or maybe
the continuing vm tweaks will make any results from a previous kernel
invalid for a current kernel or maybe the bdflush tweaking isn't really
worth it at all and I'm better off getting on with mucking about with
larger hardware and parallel filesystems. At least I learned that I
really do want a larger max_readahead number.

As for interactivity, if the changes have any effect on the number of
"NFS server blah not responding" messages I get, I'll be more than
happy.

Thanks,

--
Jason Holmes

Marcelo Tosatti wrote:
>
> Jason,
>
> Yes, throughput-only tests will have their numbers degraded by the
> change applied in 2.4.17-pre2.
>
> The whole thing is just about tradeoffs: interactivity vs. throughput.
>
> I'm not going to destroy interactivity for end users to get beautiful
> dbench numbers.
>
> And about your clients: don't you think they want some kind of
> decent latency on their side?
>
> Anyway, thanks for your report!
>
> On Sat, 1 Dec 2001, Jason Holmes wrote:
>
> > ...

2001-12-04 02:55:58

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: IO degradation in 2.4.17-pre2 vs. 2.4.16



Jason,

Yes, throughput-only tests will have their numbers degraded by the
change applied in 2.4.17-pre2.

The whole thing is just about tradeoffs: interactivity vs. throughput.

I'm not going to destroy interactivity for end users to get beautiful
dbench numbers.

And about your clients: don't you think they want some kind of
decent latency on their side?

Anyway, thanks for your report!

On Sat, 1 Dec 2001, Jason Holmes wrote:

> ...

2001-12-11 22:44:59

by Bill Davidsen

[permalink] [raw]
Subject: Re: IO degradation in 2.4.17-pre2 vs. 2.4.16

On Tue, 4 Dec 2001, Marcelo Tosatti wrote:

> Yes, throughput-only tests will have their numbers degraded by the
> change applied in 2.4.17-pre2.
>
> The whole thing is just about tradeoffs: interactivity vs. throughput.
>
> I'm not going to destroy interactivity for end users to get beautiful
> dbench numbers.
>
> And about your clients: don't you think they want some kind of
> decent latency on their side?

It depends on the machine. For a server, the thing you need in order to
feed clients is throughput. I don't see how feeding the data more slowly
is going to be GOOD for latency. Servers which push a lot of data, like
mail and news or certain web sites, particularly need to push it now.

Latency is more of an issue for end user machines.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2001-12-11 22:52:29

by Dan Maas

[permalink] [raw]
Subject: Re: IO degradation in 2.4.17-pre2 vs. 2.4.16

> > Yes, throughput-only tests will have their numbers degraded by the
> > change applied in 2.4.17-pre2.
> >
> > The whole thing is just about tradeoffs: interactivity vs. throughput.
> >
> > I'm not going to destroy interactivity for end users to get beautiful
> > dbench numbers.
>
> Latency is more of an issue for end user machines.

Time for CONFIG_OPTIMIZE_THROUGHPUT / CONFIG_OPTIMIZE_LATENCY ?

Dan

2001-12-11 23:01:20

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: IO degradation in 2.4.17-pre2 vs. 2.4.16



On Tue, 11 Dec 2001, Dan Maas wrote:

> > > Yes, throughput-only tests will have their numbers degraded by the
> > > change applied in 2.4.17-pre2.
> > >
> > > The whole thing is just about tradeoffs: interactivity vs. throughput.
> > >
> > > I'm not going to destroy interactivity for end users to get beautiful
> > > dbench numbers.
> >
> > Latency is more of an issue for end user machines.
>
> Time for CONFIG_OPTIMIZE_THROUGHPUT / CONFIG_OPTIMIZE_LATENCY ?

That would be the best thing to do, yes.

2001-12-11 23:08:21

by Alan

[permalink] [raw]
Subject: Re: IO degradation in 2.4.17-pre2 vs. 2.4.16

> > Time for CONFIG_OPTIMIZE_THROUGHPUT / CONFIG_OPTIMIZE_LATENCY ?
> That would be the best thing to do, yes.

/proc/sys not CONFIG_..

2001-12-12 00:27:50

by Andrew Morton

[permalink] [raw]
Subject: Re: IO degradation in 2.4.17-pre2 vs. 2.4.16

Alan Cox wrote:
>
> > > Time for CONFIG_OPTIMIZE_THROUGHPUT / CONFIG_OPTIMIZE_LATENCY ?
> > That would be the best thing to do, yes.
>
> /proc/sys not CONFIG_..

/proc/sys/vm/bdflush, to be precise.

I thought we discussed all this?

-

2001-12-12 00:54:17

by J Sloan

[permalink] [raw]
Subject: Re: IO degradation in 2.4.17-pre2 vs. 2.4.16

Alan Cox wrote:

> > > Time for CONFIG_OPTIMIZE_THROUGHPUT / CONFIG_OPTIMIZE_LATENCY ?
> > That would be the best thing to do, yes.
>
> /proc/sys not CONFIG_..

YES!

Much much preferable...

cu

jjs