2002-09-27 18:54:31

by Andrew Morton

[permalink] [raw]
Subject: Re: Warning - running *really* short on DMA buffers while doingfile transfers

"Justin T. Gibbs" wrote:
> ...
> The OS elevator will never know all of the device characteristics that
> the device knows. This is why the device's elevator will always out
> perform the OSes assuming the OS isn't stupid about overcommitting writes.
> That's what the argument is here. Linux is agressively committing writes
> when it shouldn't.

The VM really doesn't want to strangle itself because it might be
talking to a braindead SCSI drive.

I have a Fujitsu disk which allows newly submitted writes to
bypass already-submitted reads. A read went in, and did not
come back for three seconds. Ninety megabytes of writes went
into the disk during those three seconds.

>From a whole-system performance viewpoint that is completely
broken behaviour. It's unmanageable from the VM point of view.
At least, I don't want to have to manage it at that level.

We may be able to work around it by adding kludges to the IO
scheduler but it's easier to just set the tag depth to zero and
mutter rude words about clueless firmware developers.

> > I guess, however, that this issue will evaporate substantially once the
> > aic7xxx driver uses ordered tags to represent the transaction integrity
> > since the barriers will force the drive seek algorithm to follow the tag
> > transmission order much more closely.
> Hooks for sending ordered tags have been in the aic7xxx driver, at least
> in FreeBSD's version, since '97. As soon as the Linux cmd blocks have
> such information it will be trivial to have the aic7xxx driver issue
> the appropriate tag types. But this misses the point. Andrew's original
> speculation was that writes were "passing reads" once the read was
> submitted to the drive. I would like to understand the evidence behind
> that assertion since all drive's I've worked with automatically give
> a higher priority to read traffic than writes since writes can be buffered
> but reads cannot.

Could be that the Fujitsu is especially broken. I observed the three
second read latency with 253 tags (OK, that's 128 megabytes worth).
But with the driver limited to four tags, latency was two seconds.
Hence my speculation.

> Ordered tags only help if the driver is already not
> doing what you want or if your writes must have a specific order for
> data integrity.

Is it possible to add a tag to a read which says "may not be bypassed
by writes"? That would be OK, as long as the driver is only set up
to use a tag depth of four or so.

To use larger tag depths, we'd need to be able to tag newly incoming
reads with a "do this before servicing already-submitted writes"
attribute. Is anything like that available?