2001-12-21 00:06:53

by Dave Jones

Subject: Possible O_DIRECT problems ?

Andrea, lk,
I just experimented with O_DIRECT in conjunction with fsx,
and the results aren't pretty.

Over NFS it survives around 921 operations, all local filesystems
(ext2,ext3,reiser tested) just 6 operations.
I've put the source to a modified fsx at
http://www.codemonkey.org.uk/cruft/fsx-odirect.c

It's possible I've done something wrong here, so look it over.
Just adding the O_DIRECT flag to open() should be all that's necessary,
correct?

Also note that by changing the flags on line 988 to include O_DIRECT
as well, we get a different type of failure.

So, did I get the usage of O_DIRECT correct and find some bugs,
or have I had a little too much xmas spirits already ? 8-)


Dave.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs .


2001-12-21 00:24:14

by Trond Myklebust

Subject: Re: Possible O_DIRECT problems ?

>>>>> " " == Dave Jones <[email protected]> writes:

> Andrea, lk,
> I just experimented with O_DIRECT in conjunction with fsx,
> and the results aren't pretty.

> Over NFS it survives around 921 operations, all local
> filesystems (ext2,ext3,reiser tested) just 6 operations. I've
> put the source to a modified fsx at
> http://www.codemonkey.org.uk/cruft/fsx-odirect.c

Dave,

O_DIRECT for NFS isn't yet merged into the kernel. Are these Chuck
Lever's NFS patches you've been testing?

Cheers,
Trond

2001-12-21 00:38:26

by Dave Jones

Subject: Re: Possible O_DIRECT problems ?

On Fri, Dec 21, 2001 at 01:23:45AM +0100, Trond Myklebust wrote:

> O_DIRECT for NFS isn't yet merged into the kernel. Are these Chuck
> Lever's NFS patches you've been testing?

Nope, stock 2.4.17rc2 & 2.5.1.
I thought NFS might just ignore the O_DIRECT flag if it didn't
understand it yet; I wasn't expecting such a dramatic failure.

I just got reminded of the bugs Andrew Morton & some others
found in O_DIRECT, so this may be hitting the same problems
already found.

Dave.

2001-12-21 12:47:19

by Trond Myklebust

Subject: Re: Possible O_DIRECT problems ?

On Friday 21. December 2001 05:12, GOTO Masanori wrote:
> At Fri, 21 Dec 2001 00:39:42 +0000,
>
> Dave Jones <[email protected]> wrote:
> > On Fri, Dec 21, 2001 at 01:23:45AM +0100, Trond Myklebust wrote:
> > > O_DIRECT for NFS isn't yet merged into the kernel. Are these Chuck
> > > Lever's NFS patches you've been testing?
>
> Where is Chuck's patch? I searched but didn't find it.

I haven't put it up on my own web-site, but it should be available from the
CITI NFS client performance project site. See

http://www.citi.umich.edu/projects/nfs-perf/patches/

> Supporting direct_IO with NFS is meaningful for users who have
> a fast NAS server environment, IMHO.

It can also provide for better data security in some circumstances.
Journaling in databases over NFS can for instance benefit greatly, and has
been one of Chuck's motivations for doing it.

Cheers,
Trond

2001-12-21 16:05:14

by Chuck Lever

Subject: Re: Possible O_DIRECT problems ?

fyi: the complete patch against 2.4.16 (should work with little or no
modification against 2.4.17) is here:

http://www.citi.umich.edu/projects/nfs-perf/patches/

you'll need to apply inode2file.diff then nfs-odirect11.diff, and it
requires Trond's pathconf patch in order to be completely useful.

because O_DIRECT cannot do small I/O (must be a multiple of a block size),
does fsx work when using it? can someone describe the failures?
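
For illustration only, and not taken from fsx or from the patch: a test driver that picks arbitrary offsets and sizes could widen each request to block boundaries before issuing it with O_DIRECT. The helper name `align_request` and the 512-byte default block size are assumptions; the real unit depends on the kernel, filesystem, and device.

```python
def align_request(offset: int, length: int, block: int = 512) -> tuple[int, int]:
    """Widen (offset, length) to the smallest block-aligned range covering it.

    `block` is an assumed alignment unit, not a value stated in the thread.
    """
    if offset < 0 or length <= 0 or block <= 0:
        raise ValueError("offset must be >= 0; length and block must be > 0")
    start = offset - (offset % block)            # round the start down
    end = -(-(offset + length) // block) * block  # round the end up
    return start, end - start
```

A request for 100 bytes at offset 1000 becomes a 1024-byte request at offset 512, so the caller has to discard the surplus bytes on either side after the read completes.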

On Fri, 21 Dec 2001, GOTO Masanori wrote:

> At Fri, 21 Dec 2001 00:39:42 +0000,
> Dave Jones <[email protected]> wrote:
> >
> > On Fri, Dec 21, 2001 at 01:23:45AM +0100, Trond Myklebust wrote:
> >
> > > O_DIRECT for NFS isn't yet merged into the kernel. Are these Chuck
> > > Lever's NFS patches you've been testing?
>
> Where is Chuck's patch? I searched but didn't find it.
>
> > Nope, stock 2.4.17rc2 & 2.5.1.
> > I thought NFS might just ignore the O_DIRECT flag if it didn't
> > understand it yet, I wasn't expecting such a dramatic failure.
>
> Supporting direct_IO with NFS is meaningful for users who have
> a fast NAS server environment, IMHO.
>
> > I just got reminded of the bugs Andrew Morton & some others
> > found in O_DIRECT, so this may be hitting the same problems
> > already found.
>
> No, I think it's a different issue, but it may be another bug...
>
> -- gotom
>

- Chuck Lever
--
corporate: <[email protected]>
personal: <[email protected]>

2001-12-21 16:15:05

by Chuck Lever

Subject: Re: Possible O_DIRECT problems ?

On Fri, 21 Dec 2001, Trond Myklebust wrote:

> On Friday 21. December 2001 05:12, GOTO Masanori wrote:
> > At Fri, 21 Dec 2001 00:39:42 +0000,
> >
> > Dave Jones <[email protected]> wrote:
> > > On Fri, Dec 21, 2001 at 01:23:45AM +0100, Trond Myklebust wrote:
> > > > O_DIRECT for NFS isn't yet merged into the kernel. Are these Chuck
> > > > Lever's NFS patches you've been testing?
> >
> > Where is Chuck's patch? I searched but didn't find it.
>
> I haven't put it up on my own web-site, but it should be available from the
> CITI NFS client performance project site. See
>
> http://www.citi.umich.edu/projects/nfs-perf/patches/
>
> > Supporting direct_IO with NFS is meaningful for users who have
> > a fast NAS server environment, IMHO.
>
> It can also provide for better data security in some circumstances.
> Journaling in databases over NFS can for instance benefit greatly, and has
> been one of Chuck's motivations for doing it.

the patch is designed for applications that manage their own data cache,
like databases do. but it is also useful for applications that want to
move large datasets without blowing the O/S level data cache.

in the NFS case, because O_DIRECT read() and write() always go back to the
server, you can more easily build clustered and HA applications that share
the data storage backend.

- Chuck Lever
--
corporate: <[email protected]>
personal: <[email protected]>

2001-12-29 15:25:50

by Andrea Arcangeli

Subject: Re: Possible O_DIRECT problems ?

On Fri, Dec 21, 2001 at 12:39:42AM +0000, Dave Jones wrote:
> On Fri, Dec 21, 2001 at 01:23:45AM +0100, Trond Myklebust wrote:
>
> > O_DIRECT for NFS isn't yet merged into the kernel. Are these Chuck
> > Lever's NFS patches you've been testing?
>
> Nope, stock 2.4.17rc2 & 2.5.1.
> I thought NFS might just ignore the O_DIRECT flag if it didn't
> understand it yet, I wasn't expecting such a dramatic failure.

The point of O_DIRECT is to do DMA directly into userspace memory.
Avoiding the VM overhead is a secondary issue, and with data journaling
we may need to put an anchor into the VM to serialize the direct I/O
with the pagecache I/O, in a secondary (slower) direct_IO callback for
the data journaling fs.

But to avoid the mem copies you're required to use strict alignment and
size for the userspace buffers, just like rawio.

If you don't, you will get -EINVAL. This ensures people will use O_DIRECT
correctly in their apps. In short, every bug report like this one about
the strict -EINVAL behaviour is proof that we need to be strict and
return -EINVAL :)

Andrea
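
A minimal sketch of the alignment rule described above, assuming a Linux system and a 4096-byte alignment unit. The value of `BLOCK` and the helper names `is_direct_io_ok` and `direct_read` are illustrative assumptions, not taken from the kernel or from fsx; the unit actually required depends on the kernel version, filesystem, and device.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment unit; not a value stated in the thread


def is_direct_io_ok(buf_addr: int, offset: int, length: int, block: int = BLOCK) -> bool:
    """True when the buffer address, file offset and length are all block multiples."""
    return length > 0 and all(v % block == 0 for v in (buf_addr, offset, length))


def direct_read(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` through O_DIRECT into an aligned buffer."""
    if length <= 0 or offset % BLOCK or length % BLOCK:
        # Reject early what the kernel would be expected to refuse with EINVAL.
        raise ValueError("offset and length must be positive multiples of BLOCK")
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)  # os.O_DIRECT is Linux-only
    try:
        buf = mmap.mmap(-1, length)  # anonymous mappings start on a page boundary
        try:
            n = os.preadv(fd, [buf], offset)  # may return fewer bytes near EOF
            return buf[:n]
        finally:
            buf.close()
    finally:
        os.close(fd)
```

An ordinary `bytearray` gives no alignment guarantee, which is why the sketch takes its buffer from an anonymous `mmap`. Some filesystems refuse O_DIRECT altogether, in which case `os.open` or `os.preadv` raises `OSError`.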

2001-12-29 18:46:29

by CJ

Subject: Re: Possible O_DIRECT problems ?

Shouldn't O_DIRECT's requirements come from the hardware? If we can
ASPI or CAM DMA SCSI devices to odd addresses and lengths, why not
O_DIRECT? Do tape drives DMA to user buffers? Are O_DIRECT's
current limits gratuitous?


Andrea Arcangeli wrote:

>On Fri, Dec 21, 2001 at 12:39:42AM +0000, Dave Jones wrote:
>
>>On Fri, Dec 21, 2001 at 01:23:45AM +0100, Trond Myklebust wrote:
>>
>> > O_DIRECT for NFS isn't yet merged into the kernel. Are these Chuck
>> > Lever's NFS patches you've been testing?
>>
>>Nope, stock 2.4.17rc2 & 2.5.1.
>>I thought NFS might just ignore the O_DIRECT flag if it didn't
>>understand it yet, I wasn't expecting such a dramatic failure.
>>
>
>The point of O_DIRECT is to do DMA directly into the userspace memory
>(and to avoid the VM overhead but that's a secondary issue and with data
>journaling we may need to put an anchor into the VM to serialize the
>direct I/O with the pagecache I/O in a secondary - slower - direct_IO
>callback for the data journaling fs).
>
>But to avoid the mem copies you're required to use strict alignment and
>size of the userspace buffers, just like rawio.
>
>If you don't you will get -EINVAL. This ensures people will use O_DIRECT
>correctly in their apps. In short every single bugreport like this about
>this -EINVAL strict behaviour is the proof we need to be strict and to
>return -EINVAL :)
>
>Andrea


2001-12-30 05:41:56

by Andre Hedrick

Subject: Re: Possible O_DIRECT problems ?

On Sat, 29 Dec 2001, CJ wrote:

> Shouldn't O_DIRECT's requirements come from the hardware? If we can
> ASPI or CAM DMA SCSI devices to odd addresses and lengths, why not
> O_DIRECT? Do tape drives DMA to user buffers? Are O_DIRECT's
> current limits gratuitous?

CAM is a very bad thing and that is why the X3 committees split.


Andre Hedrick
Linux Disk Certification Project Linux ATA Development

2001-12-30 10:16:15

by Gérard Roudier

Subject: Re: Possible O_DIRECT problems ?



On Sat, 29 Dec 2001, Andre Hedrick wrote:

> On Sat, 29 Dec 2001, CJ wrote:
>
> > Shouldn't O_DIRECT's requirements come from the hardware? If we can
> > ASPI or CAM DMA SCSI devices to odd addresses and lengths, why not
> > O_DIRECT? Do tape drives DMA to user buffers? Are O_DIRECT's
> > current limits gratuitous?
>
> CAM is a very bad thing and that is why the X3 committees split.

There were interesting guidelines in CAM, notably the topology handling
and the error recovery scheme. But it was yet another wheel in a world
where everybody reinvented their own. It also seemed very DEC-tainted.

Btw, given guys like you in X3 committees, I am not surprised that splits
occur in this place. :-)

Gérard.

PS: Your various email addresses bounce back claiming some ridiculous
text about spammers. Is this yet another display of your apparent
existential complex?