Hi folks, some time ago I saw a post on lkml regarding the
implementation of O_DIRECT. The thing to note is that *nobody*
reacted to that post. It seems to me that nobody was happy enough
about it to say "oh yes! at last!"
This is interesting, because one real advantage of O_DIRECT is
those greased-weasel-fast 15-20 MB/s file copies, which make
Windows users look on us as lesser beings.
I understand, though, that this approach scales badly under
multithreaded loads, which are especially important in server
environments, the place Linux originally grew from, and that is
why it hasn't already been implemented.
One more problem I see here, and I think it is an *extremely*
important one, is that open(..., BLA_BLA_BLA | O_DIRECT) is
something people may get carried away with. I mean that
implementing O_DIRECT in cp(1) wins the prize, but in the case
of, say, find(1) it is definitely not a wise move. The problem
could be described as "poisoning" software with this blessed
O_DIRECT, to the point where 70% of the code on an average machine
uses it, thus *completely* killing the advantages of buffered
access, until suddenly, *bang!*, the overall performance is dead.
But the worst thing is that this poisoning process is completely
uncontrollable: any careless coder can decide that His shoddy
piece of Code is Especially Important and that in his case
O_DIRECT is perfectly suitable. And if his code really is
performance critical in some way, then O_DIRECT will most likely
improve his benchmarks, and that is what makes things really
awful: it leads to a hellishly large crowd of pig-happy dudes
thinking their useless code is life critical, and thus dooms
Linux.
Maybe I am as stupid as those potential dudes and am painting
things in too dark colors, but O_DIRECT, I think, is a dangerous
thing to play with. That is why, as far as I can properly recall,
Linus wasn't happy with it at all.
Maybe I am missing the whole point, so I want to hear what other
people have to say about it.
Cheers,
Samium Gromoff
On Wed, 4 Jul 2001, Samium Gromoff wrote:
> Maybe I am missing the whole point, so I want to hear what other
> people have to say about it.
Several of us are working on it.
-ben
At 21:34 03/07/2001, Samium Gromoff wrote:
[snip]
> One more problem I see here, and I think it is an *extremely*
> important one, is that open(..., BLA_BLA_BLA | O_DIRECT) is
> something people may get carried away with. I mean that
> implementing O_DIRECT in cp(1) wins the prize, but in the case
> of, say,
Why should it? It is entirely possible that the file(s) being copied
have been accessed beforehand and hence are already in the page/buffer
cache.
Using O_DIRECT would not only completely bypass the page/buffer cache but
it would also cause the cache to be flushed (if dirty) and the cache
buffers/pages invalidated (otherwise you lose coherency). This is going to
be _slower_ than not using O_DIRECT.
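
(For illustration, a cp that did use O_DIRECT would have to look
roughly like the sketch below. This is only a sketch, not a patch to
cp(1); the 512-byte alignment and the 64k buffer size are assumptions,
as the real alignment requirement depends on the filesystem and the
device.)

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Copy src to dst, bypassing the page cache on the read side.
 * O_DIRECT requires the user buffer, the transfer size and the file
 * offset to be suitably aligned; 512 bytes is assumed here.
 * (Error-path cleanup is omitted for brevity.) */
int copy_direct(const char *src, const char *dst)
{
        char *buf;
        int in, out;
        ssize_t n = 0;

        if (posix_memalign((void **)&buf, 512, 64 * 1024))
                return -1;

        in = open(src, O_RDONLY | O_DIRECT);
        out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (in < 0 || out < 0)
                return -1;

        while ((n = read(in, buf, 64 * 1024)) > 0)
                if (write(out, buf, n) != n)
                        return -1;

        close(in);
        close(out);
        free(buf);
        return n < 0 ? -1 : 0;
}

The posix_memalign() dance alone shows this is not a flag you can
just sprinkle onto existing code.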
> find(1) it is definitely not a wise move. The problem
> could be described as "poisoning" software with this blessed
> O_DIRECT, to the point where 70% of the code on an average machine
> uses it, thus *completely* killing the advantages of buffered
> access, until suddenly, *bang!*, the overall performance is dead.
Er. Using O_DIRECT means you are doing _unbuffered_ access. - Maybe I am
misunderstanding your comments, but it seems to me you have the whole
concept of O_DIRECT the wrong way round.
> But the worst thing is that this poisoning process is completely
> uncontrollable: any careless coder can decide that His shoddy
> piece of Code is Especially Important and that in his case
> O_DIRECT is perfectly suitable. And if his code really is
> performance critical in some way, then O_DIRECT will most likely
> improve his benchmarks, and that is what makes things really
> awful: it leads to a hellishly large crowd of pig-happy dudes
> thinking their useless code is life critical, and thus dooms
> Linux.
O_DIRECT _decreases_ performance drastically in most cases, so nobody in
their right mind would use it for normal applications. - The people who
would use it and would actually see a speed _increase_ are the
programmers of large databases which perform their own caching in user
space (making the normal fs level caching unnecessary and, in fact,
worse than the unbuffered case), and the programmers of multimedia
streaming applications (e.g. video/audio streaming, including DVD
playback[1]) which know that A) the data is not in the cache and B) the
data will not be accessed again in the near future, so caching it is
not only pointless but displaces actually useful (other, unrelated)
data from the cache.
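
(To make the database case concrete, here is a toy sketch of the idea:
a trivial user-space block cache sitting on top of an O_DIRECT file
descriptor. The function names, the 4k block size and the table size
are made up for illustration; a real database buffer cache is far more
elaborate.)

#define _GNU_SOURCE             /* for O_DIRECT and pread() */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLKSZ  4096             /* assumed block size / alignment    */
#define NSLOTS 1024             /* 4 MB of application-managed cache */

struct slot { off_t blk; char *buf; int valid; };
static struct slot cache[NSLOTS];

/* Read logical block 'blk' of an O_DIRECT fd through a trivial
 * direct-mapped user-space cache: the kernel keeps no copy of the
 * data, the application does its own caching instead. */
static int read_block(int fd, off_t blk, char *out)
{
        struct slot *s = &cache[blk % NSLOTS];

        if (!s->valid || s->blk != blk) {
                if (!s->buf &&
                    posix_memalign((void **)&s->buf, BLKSZ, BLKSZ))
                        return -1;
                /* cache miss: exactly one real disk read */
                if (pread(fd, s->buf, BLKSZ, blk * BLKSZ) != BLKSZ)
                        return -1;
                s->blk = blk;
                s->valid = 1;
        }
        memcpy(out, s->buf, BLKSZ);     /* cache hit: no disk I/O */
        return 0;
}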
> Maybe I am as stupid as those potential dudes and am painting
> things in too dark colors, but O_DIRECT, I think, is a dangerous
> thing to play with.
It is indeed. It is only useful in the very special circumstances
described above. Using it in "normal" applications is stupid and will
only degrade the performance of the application using it.
> Maybe I am missing the whole point, so I want to hear what other
> people have to say about it.
I think you do... I hope I managed to explain what O_DIRECT actually is above.
Shame you didn't attend the Linux Developers Conference (in Manchester)
last weekend as Andrea Arcangeli gave a very nice talk explaining O_DIRECT
in depth.
Best regards,
Anton
[1] Actually DVD players make use of raw I/O to access the DVD disk
device as a whole, thus bypassing the file system code altogether,
which is even faster, but if you were to copy a DVD to your hard drive
then O_DIRECT would give you the described benefits.
--
"Nothing succeeds like success." - Alexandre Dumas
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/
Hi,
On Wed, Jul 04, 2001 at 12:34:35AM +0400, Samium Gromoff wrote:
>
> This is interesting, because one real advantage of O_DIRECT is
> those greased-weasel-fast 15-20 MB/s file copies, which make
> Windows users look on us as lesser beings.
Not true.
O_DIRECT does not speed up sequential file accesses. If anything, it
may well slow them down, especially for writes. What O_DIRECT does is
twofold --- it guarantees physical IO to the disk (so that you know
for sure that the data is on disk for writes, or that the data on disk
is readable for reads); and it avoids the memory and CPU overhead of
keeping any cached copy of the data.
But because O_DIRECT is completely synchronous, it's not possible for
the kernel to implement its normal readahead and writebehind IO
clustering for direct IO. If you use the normal approach of writing
4k at a time to an O_DIRECT file, things may well be *massively*
slower than usual because the kernel is sending individual 4k IOs to
the disk, and because it is waiting for each IO to complete before the
application provides the next one.
By contrast, buffered writes allow the kernel to batch those 4k
writes into large disk IOs, perhaps 100k or more; and the kernel can
maintain a queue of more than one such IO, so that once the first IO
completes the next one is immediately ready to be sent out.
For these reasons, buffered IO is often faster than O_DIRECT for pure
sequential access. The downside is its greater CPU cost and the fact
that it pollutes the cache (which, in turn, causes even _more_ CPU
overhead when the VM is forced to start reclaiming old cache data to
make room for new blocks.)
O_DIRECT is great for cases like multimedia (where you want to
maximise CPU available to the application and where you know in
advance that the data is unlikely to fit in cache) and databases
(where the application is caching things already and extra copies in
memory are just a waste of memory). It is not an automatic win for
all applications.
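
(A hedged sketch of the write-size point above; the names and the
512-byte alignment are assumptions. Each write() on the O_DIRECT
descriptor is a separate synchronous disk IO, so calling this with
chunk = 4096 issues lots of small IOs and waits for every one of them,
whereas chunk = 256*1024 gets much closer to what buffered writeback
achieves by batching.)

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write 'total' bytes of zeroes to path via O_DIRECT, 'chunk' bytes
 * per write().  Each write() goes straight to disk and does not
 * return until that IO has completed.  total is assumed to be a
 * multiple of chunk. */
int write_direct(const char *path, size_t chunk, size_t total)
{
        char *buf;
        size_t done;
        int fd;

        if (posix_memalign((void **)&buf, 512, chunk))
                return -1;
        memset(buf, 0, chunk);

        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
        if (fd < 0)
                return -1;

        for (done = 0; done < total; done += chunk)
                if (write(fd, buf, chunk) != (ssize_t)chunk)
                        break;          /* short or failed write */

        close(fd);
        free(buf);
        return done < total ? -1 : 0;
}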
Cheers,
Stephen
In article <[email protected]>,
Stephen C. Tweedie <[email protected]> wrote:
>For these reasons, buffered IO is often faster than O_DIRECT for pure
>sequential access. The downside is its greater CPU cost and the fact
>that it pollutes the cache (which, in turn, causes even _more_ CPU
>overhead when the VM is forced to start reclaiming old cache data to
>make room for new blocks.)
Any chance of something like O_SEQUENTIAL (like madvise(MADV_SEQUENTIAL))?
Mike.
Hi,
On Wed, Jul 04, 2001 at 06:27:13PM +0000, Miquel van Smoorenburg wrote:
> In article <[email protected]>,
> Stephen C. Tweedie <[email protected]> wrote:
> >For these reasons, buffered IO is often faster than O_DIRECT for pure
> >sequential access. The downside is its greater CPU cost and the fact
> >that it pollutes the cache (which, in turn, causes even _more_ CPU
> >overhead when the VM is forced to start reclaiming old cache data to
> >make room for new blocks.)
>
> Any chance of something like O_SEQUENTIAL (like madvise(MADV_SEQUENTIAL))?
What for? The kernel already optimises readahead and writebehind for
sequential files.
If you want to provide specific extra hints to the kernel, then things
like O_UNCACHE might be more appropriate to instruct the kernel to
explicitly remove the cached page after IO completes (to avoid the VM
overhead of maintaining useless cache). That would provide a definite
improvement over normal IO for large multimedia-style files or for
huge copies. But what part of the normal handling of sequential files
would O_SEQUENTIAL change? Good handling of sequential files should
be the default, not an explicitly-requested feature.
Cheers,
Stephen
In article <[email protected]>,
Stephen C. Tweedie <[email protected]> wrote:
>Hi,
>
>On Wed, Jul 04, 2001 at 06:27:13PM +0000, Miquel van Smoorenburg wrote:
>>
>> Any chance of something like O_SEQUENTIAL (like madvise(MADV_SEQUENTIAL))?
>
>What for? The kernel already optimises readahead and writebehind for
>sequential files.
Yes, but I really do mean like in madvise().
>If you want to provide specific extra hints to the kernel, then things
>like O_UNCACHE might be more appropriate to instruct the kernel to
>explicitly remove the cached page after IO completes (to avoid the VM
>overhead of maintaining useless cache). That would provide a definite
>improvement over normal IO for large multimedia-style files or for
>huge copies. But what part of the normal handling of sequential files
>would O_SEQUENTIAL change? Good handling of sequential files should
>be the default, not an explicitly-requested feature.
Exactly what I meant, since that is what MADV_SEQUENTIAL seems to do:
linux/mm/filemap.c:
* MADV_SEQUENTIAL - pages in the given range will probably be accessed
* once, so they can be aggressively read ahead, and
* can be freed soon after they are accessed.
/*
* Read-ahead and flush behind for MADV_SEQUENTIAL areas. Since we are
* sure this is sequential access, we don't need a flexible read-ahead
* window size -- we can always use a large fixed size window.
*/
static void nopage_sequential_readahead(struct vm_area_struct * vma,
O_SEQUENTIAL is perhaps the wrong name.
I'd like to see this so I can run tar to back up a machine during the
day (if tar used this flag, of course) without performance going down
the drain because of cache pollution.
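
(For what it's worth, the closest a program like tar could get today is
the mmap()/madvise() route, roughly as sketched below. This is only an
illustration of the hint, not what tar actually does, and the function
name is invented.)

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Scan a whole file through a mapping marked MADV_SEQUENTIAL: the
 * kernel reads ahead aggressively and may drop the pages again soon
 * after they have been touched. */
long scan_sequential(const char *path)
{
        struct stat st;
        unsigned char *p;
        long sum = 0;
        off_t i;
        int fd = open(path, O_RDONLY);

        if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0)
                return -1;

        p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
                return -1;

        madvise(p, st.st_size, MADV_SEQUENTIAL);

        for (i = 0; i < st.st_size; i++)
                sum += p[i];            /* touch every page once */

        munmap(p, st.st_size);
        close(fd);
        return sum;
}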
Mike.
Hi,
On Wed, Jul 04, 2001 at 08:23:10PM +0000, Miquel van Smoorenburg wrote:
> >huge copies. But what part of the normal handling of sequential files
> >would O_SEQUENTIAL change? Good handling of sequential files should
> >be the default, not an explicitly-requested feature.
>
> exactly what I meant, since that is what MADV_SEQUENTIAL seems to do:
>
> linux/mm/filemap.c:
>
> * MADV_SEQUENTIAL - pages in the given range will probably be accessed
> * once, so they can be aggressively read ahead, and
> * can be freed soon after they are accessed.
We already have "drop-behind" for sequential reads --- we lower the
priority of recently read-in pages so that if they don't get accessed
again, they can be reclaimed. This should be, and is, part of the
default kernel behaviour for such things.
The trouble is that you still need the VM to go around and clean up
those pages if you need the memory for something else. There's a big
difference between "can be freed" and "are forcibly freed". O_DIRECT
behaves like the latter: the memory is automatically reclaimed after
use so it results in no memory pressure at all, whereas the
MADV_SEQUENTIAL type of behaviour just allows the VM to reclaim those
pages on demand --- the VM still has to do the work.
Cheers,
Stephen