On Fri, 19 Oct 2001, Robert Cohen wrote:
>
> However, look what happens if I run 5 copies at once.
>
> writing to file of size 60 Megs with buffers of 5000 bytes
> writing to file of size 60 Megs with buffers of 5000 bytes
> writing to file of size 60 Megs with buffers of 5000 bytes
> writing to file of size 60 Megs with buffers of 5000 bytes
> writing to file of size 60 Megs with buffers of 5000 bytes
> write elapsed time=33.96 seconds, write_speed=1.77
> write elapsed time=37.43 seconds, write_speed=1.60
> write elapsed time=37.74 seconds, write_speed=1.59
> write elapsed time=37.93 seconds, write_speed=1.58
> write elapsed time=40.74 seconds, write_speed=1.47
> rewrite elapsed time=512.44 seconds, rewrite_speed=0.12
> rewrite elapsed time=518.59 seconds, rewrite_speed=0.12
> rewrite elapsed time=518.05 seconds, rewrite_speed=0.12
> rewrite elapsed time=518.96 seconds, rewrite_speed=0.12
> rewrite elapsed time=517.08 seconds, rewrite_speed=0.12
>
> Here we see a factor of about 15 between write speed and rewrite speed.
> That seems a little extreme.
The cumulative plain write speed (non-rewrite) doesn't change much,
because plain writes can be re-ordered, and running five copies at once
isn't THAT big of a deal (i.e. it still ends up seeking much more than a
single file would, but it's not catastrophic).
> From the amount of seeking happening, I believe that all the reads are
> being done as separate single-page reads. Surely there should be some
> readahead happening.
Well, we _could_ do read-ahead even for the read-modify-write, but the
fact is that it doesn't tend to be worth it for real-life loads.
That's because read-modify-write is _usually_ of the type where you lseek
and modify a smallish thing in-place: anybody who cares about performance
and does consecutive modifications should have coalesced them anyway, so
only made-up benchmarks actually show the problem.
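To make the "coalescing" concrete, here is a hedged sketch (helper names,
record size and chunk size are made up for illustration, and tail handling
is omitted): instead of rewriting consecutive data in small unaligned
pieces, where nearly every write leaves a partially covered page that the
kernel must read back first, the application batches the same data into
page-aligned chunks so that every page is overwritten completely.

#include <sys/types.h>
#include <unistd.h>

#define REC_SIZE   5000			/* small, not page-aligned */
#define CHUNK_SIZE (64 * 1024)		/* a multiple of the page size */

/* naive: rewrite the region record by record; almost every write leaves a
   partially covered page, which the kernel has to read back from disk */
static void rewrite_naive(int fd, const char *data, size_t len)
{
	size_t off;

	for (off = 0; off + REC_SIZE <= len; off += REC_SIZE)
		pwrite(fd, data + off, REC_SIZE, off);
}

/* coalesced: the same data, written in page-aligned chunks; every page is
   completely overwritten, so nothing needs to be read back first */
static void rewrite_coalesced(int fd, const char *data, size_t len)
{
	size_t off;

	for (off = 0; off + CHUNK_SIZE <= len; off += CHUNK_SIZE)
		pwrite(fd, data + off, CHUNK_SIZE, off);
}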
> I tested the same program under Solaris, and I get about a factor of 2
> difference regardless of whether it's one copy or 5 copies.
I wouldn't be surprised if Solaris just always reads ahead. Most "big
iron" unixes try very, very hard to do all IO in big chunks, whether it
is necessary or not.
> I believe that this is an odd situation, and sure, it only happens for
> badly written programs. I can see that it would be stupid to optimise for
> this situation.
> But do we really need to do this badly in this case?
Well, if you find a real application that cares, I might change my mind,
but right now read-ahead looks like a waste of time to me. Does anybody
really do re-write in practice?
(Yes, I know about databases, but they tend to want to be cached anyway,
and tend to do direct block IO, not sub-block read-modify-write).
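For what it's worth, the "direct block IO" pattern looks roughly like the
sketch below (the file name and block size are illustrative, and this shows
the general technique rather than any particular database): the file is
opened with O_DIRECT and only whole, aligned blocks are ever read or
written, so there is never a partial page for the kernel to fill in first.

#define _GNU_SOURCE		/* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

int main(void)
{
	void *block;
	int fd = open("dbfile", O_RDWR | O_CREAT | O_DIRECT, 0644);

	if (fd < 0)
		return 1;

	/* O_DIRECT needs an aligned buffer as well as aligned offsets and
	   lengths */
	if (posix_memalign(&block, BLOCK_SIZE, BLOCK_SIZE))
		return 1;
	memset(block, 0, BLOCK_SIZE);

	/* update block 7 in place: one aligned write, no page-cache
	   read-modify-write involved */
	pwrite(fd, block, BLOCK_SIZE, 7L * BLOCK_SIZE);

	free(block);
	close(fd);
	return 0;
}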
Linus
Linus Torvalds wrote:
>
> > I believe that this is an odd situation, and sure, it only happens for
> > badly written programs. I can see that it would be stupid to optimise for
> > this situation.
> > But do we really need to do this badly in this case?
>
> Well, if you find a real application that cares, I might change my mind,
> but right now read-ahead looks like a waste of time to me. Does anybody
> really do re-write in practice?
You're right. I don't have a real-world program that cares.
The program I have is a thing named lantest, which is the standard
benchmark for Macintosh file server performance. Actually, as far as I can
tell, it is the only benchmark available.
The reason I am bothered by this case is that I feel it's bad practice to
leave degenerate situations where kernel performance really sucks when
they can easily be avoided. If you leave a hole in kernel performance
where people can hit a degenerate case, sooner or later someone is going
to fall into it, and then they're going to blame the kernel. Sure, this
may only happen to badly written programs, but it's not as if the world
is exactly short of them.
Take my situation as a case in point: I was asked to evaluate Linux and
Solaris as potential file-serving platforms for Mac clients. So, as part
of my testing, I fired up lantest as a benchmark. Solaris was fine, Linux
sucked. My conclusion was that Linux was an immature operating system
that still needed some work, and we went with Solaris.
It's only because I devoted time and effort to understanding why Linux
did so badly that I now understand that this is not a Linux problem per
se. And I only did that because I wanted to report the problem and help
Linux become a better operating system.
Anyway, as far as I can tell, it would barely affect Linux performance
to do some read-ahead in this case. The disk head is already there; it
costs almost nothing to read an extra few tens or hundreds of kilobytes
of data. And it would help eliminate this potentially degenerate case.
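To illustrate what that would buy, here is a sketch of the same effect done
from user space (the chunk size and helper name are made up): before
rewriting a region with small unaligned buffers, pull it into the page
cache with a few large sequential reads, so the later partial-page writes
find their pages already there instead of faulting them in one at a time.

#include <sys/types.h>
#include <unistd.h>

#define PREFETCH_CHUNK (256 * 1024)	/* "a few extra 100's of K's" per request */

static void prefetch_region(int fd, off_t start, size_t len)
{
	static char buf[PREFETCH_CHUNK];
	size_t done;

	/* the data itself is thrown away; the only point of these reads is
	   to populate the page cache in big sequential requests */
	for (done = 0; done < len; done += PREFETCH_CHUNK)
		pread(fd, buf, PREFETCH_CHUNK, start + done);
}

On kernels that provide them, the readahead(2) system call or posix_fadvise()
with POSIX_FADV_WILLNEED give the same hint without copying the data out to
user space.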
--
Robert Cohen
Unix Support
TLTSU
Australian National University
Ph: 612 58389