2010-11-18 23:34:29

by Moazam Raja

Subject: O_DIRECT, O_SYNC, or fsync() on NFS mounts?

Hi all,

I'm currently exporting a ZFS filesystem on Solaris 11 Express as NFS.
I have a Linux client mounting that NFS v3 filesystem with the
proto=tcp option.

My question is, what's the safest and most reliable way to write data
to this NFS mount on a Linux client? Should my application code use
O_DIRECT, or O_SYNC? Or should I be doing a write() and a fsync()? I
want to make sure that data is not lost and is truly committed, while
keeping decent performance (of course).


-Moazam


2010-11-19 19:25:05

by Trond Myklebust

Subject: Re: O_DIRECT, O_SYNC, or fsync() on NFS mounts?

On Thu, 2010-11-18 at 15:34 -0800, Moazam Raja wrote:
> Hi all,
>
> I'm currently exporting a ZFS filesystem on Solaris 11 Express as NFS.
> I have a Linux client mounting that NFS v3 filesystem with the
> proto=tcp option.
>
> My question is, what's the safest and most reliable way to write data
> to this NFS mount on a Linux client? Should my application code use
> O_DIRECT, or O_SYNC? Or should I be doing a write() and a fsync()? I
> want to make sure that data is not lost and is truly committed, while
> keeping decent performance (of course).

Any one of the above methods will ensure that the data is synced to
disk. In addition, NFS also guarantees that your data is fully synced to
disk when taking/freeing POSIX locks, and when you close() the file.

The choice of one method over the other depends on your application
requirements. Not on your choice of underlying storage.
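For concreteness, here is a minimal C sketch of two of the options from the question: a buffered write() followed by fsync(), and a descriptor opened with O_SYNC. The helper names (write_all, save_with_fsync, save_with_osync) are hypothetical, and the error handling is one reasonable choice rather than anything the NFS client requires. Both functions also check the return value of close(), since that is another point where NFS can report write errors.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR.
 * Returns 0 on success, -1 with errno set on failure. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Option 1: buffered write() followed by an explicit fsync().
 * Data may sit in the client page cache until fsync() pushes it to the
 * server and asks the server to commit it to stable storage. */
int save_with_fsync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);          /* close() may still report a deferred error */
}

/* Option 2: open with O_SYNC, so each write() returns only after the
 * data has been committed to stable storage. */
int save_with_osync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, buf, len) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

With fsync() the application pays for durability once per batch of writes; with O_SYNC it pays on every write() call, which usually costs more when the writes are small.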

Trond


2010-11-19 21:48:30

by J. Bruce Fields

Subject: Re: O_DIRECT, O_SYNC, or fsync() on NFS mounts?

On Fri, Nov 19, 2010 at 04:26:35PM -0500, Trond Myklebust wrote:
> On Fri, 2010-11-19 at 15:04 -0500, J. Bruce Fields wrote:
> > On Fri, Nov 19, 2010 at 02:24:59PM -0500, Trond Myklebust wrote:
> > > On Thu, 2010-11-18 at 15:34 -0800, Moazam Raja wrote:
> > > > Hi all,
> > > >
> > > > I'm currently exporting a ZFS filesystem on Solaris 11 Express as NFS.
> > > > I have a Linux client mounting that NFS v3 filesystem with the
> > > > proto=tcp option.
> > > >
> > > > My question is, what's the safest and most reliable way to write data
> > > > to this NFS mount on a Linux client? Should my application code use
> > > > O_DIRECT, or O_SYNC? Or should I be doing a write() and a fsync()? I
> > > > want to make sure that data is not lost and is truly committed, while
> > > > keeping decent performance (of course).
> > >
> > > Any one of the above methods will ensure that the data is synced to
> > > disk. In addition, NFS also guarantees that your data is fully synced to
> > > disk when taking/freeing POSIX locks, and when you close() the file.
> >
> > Is the client still doing that in the presence of a write delegation, by
> > the way?
>
> If the application requests O_DIRECT/O_SYNC or calls fsync(), we are
> required by POSIX to ensure the data is safe on disk. The presence of an
> NFS delegation does not change that requirement.
>
> We could potentially relax the sync-to-disk requirements when locking
> and closing the file since those are only about ensuring close-to-open
> cache consistency requirements (which is also ensured by the delegation)
> but we do not do so today.

OK, that makes sense.

We probably shouldn't say in that case that we "guarantee" the sync on
close/free, if we consider it a detail of the current implementation
rather than a requirement.

--b.

2010-11-19 19:57:04

by Chuck Lever

Subject: Re: O_DIRECT, O_SYNC, or fsync() on NFS mounts?


On Nov 19, 2010, at 2:24 PM, Trond Myklebust wrote:

> On Thu, 2010-11-18 at 15:34 -0800, Moazam Raja wrote:
>> Hi all,
>>
>> I'm currently exporting a ZFS filesystem on Solaris 11 Express as NFS.
>> I have a Linux client mounting that NFS v3 filesystem with the
>> proto=tcp option.
>>
>> My question is, what's the safest and most reliable way to write data
>> to this NFS mount on a Linux client? Should my application code use
>> O_DIRECT, or O_SYNC? Or should I be doing a write() and a fsync()? I
>> want to make sure that data is not lost and is truly committed, while
>> keeping decent performance (of course).
>
> Any one of the above methods will ensure that the data is synced to
> disk. In addition, NFS also guarantees that your data is fully synced to
> disk when taking/freeing POSIX locks, and when you close() the file.
>
> The choice of one method over the other depends on your application
> requirements. Not on your choice of underlying storage.

We should add that the synchronous methods (O_DIRECT and O_SYNC) guarantee that the write will go immediately to the server and return an immediate error code (like ENOSPC). That might be an improvement over an async write, where an error report could be delayed until close(2).

But as was said above, it depends on your application's requirements.
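Following Chuck's point about delayed error reports, here is a hypothetical helper (finish_file is an illustrative name, not an existing API) that finishes a file written with ordinary buffered writes and reports which call surfaced the error. It assumes the caller wants to distinguish an fsync() failure from a close() failure; with O_SYNC or O_DIRECT the same errors would normally come back from write() itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* With buffered (asynchronous) NFS writes, write() may succeed even
 * though the server later rejects the data (e.g. ENOSPC or EDQUOT).
 * Such an error may surface only from fsync() or close(), so both
 * return values are checked here.
 * Returns 0 on success, otherwise the errno of the first failure;
 * *stage names the call that failed ("fsync" or "close"), or NULL. */
int finish_file(int fd, const char **stage)
{
    *stage = "fsync";
    if (fsync(fd) < 0) {
        int err = errno;
        close(fd);              /* release the descriptor even on failure */
        return err;
    }
    *stage = "close";
    if (close(fd) < 0)
        return errno;
    *stage = NULL;
    return 0;
}
```

A caller would typically log strerror() of the returned value together with the stage and treat the file as not saved.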

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2010-11-21 10:46:28

by Christoph Hellwig

Subject: Re: O_DIRECT, O_SYNC, or fsync() on NFS mounts?

On Fri, Nov 19, 2010 at 04:26:35PM -0500, Trond Myklebust wrote:
> If the application requests O_DIRECT/O_SYNC or calls fsync(), we are
> required by POSIX to ensure the data is safe on disk. The presence of an
> NFS delegation does not change that requirement.

That's not quite correct. O_DIRECT, for one, is not actually specified in
POSIX at all, and the documented Linux semantics only say that the
page cache should not be used (even if it sometimes is with various
filesystems). There is no guarantee that data actually is on disk or
reachable; for that you need to add the O_SYNC/O_DSYNC flag in addition
or use fsync/fdatasync.
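A sketch of the combination described above: O_DIRECT to bypass the page cache, plus O_DSYNC to wait for stable storage. The 4096-byte alignment and the function name direct_durable_write are assumptions for illustration; the real O_DIRECT alignment rules depend on the filesystem, and some local filesystems reject O_DIRECT with EINVAL.

```c
#define _GNU_SOURCE            /* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alignment: 4096 bytes suits most devices and filesystems,
 * but the actual O_DIRECT requirement is filesystem-specific. */
#define DIO_ALIGN 4096

/* Write `len` bytes (a nonzero multiple of DIO_ALIGN) with O_DIRECT,
 * which only bypasses the page cache, and O_DSYNC, which makes each
 * write() wait until the data is on stable storage.
 * Returns 0 on success, -1 with errno set on failure. */
int direct_durable_write(const char *path, const void *data, size_t len)
{
    if (len == 0 || len % DIO_ALIGN != 0) {
        errno = EINVAL;
        return -1;
    }
    void *buf;
    int err = posix_memalign(&buf, DIO_ALIGN, len);
    if (err != 0) {
        errno = err;
        return -1;
    }
    memcpy(buf, data, len);    /* O_DIRECT needs an aligned user buffer */

    int rc = -1, saved = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT | O_DSYNC, 0644);
    if (fd < 0) {
        saved = errno;
    } else {
        ssize_t n = write(fd, buf, len);
        if (n == (ssize_t)len)
            rc = 0;
        else
            saved = (n < 0) ? errno : EIO;   /* treat a short write as failure */
        if (close(fd) < 0 && rc == 0) {
            rc = -1;
            saved = errno;
        }
    }
    free(buf);
    if (rc < 0)
        errno = saved;
    return rc;
}
```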


2010-11-21 19:31:48

by Moazam Raja

Subject: Re: O_DIRECT, O_SYNC, or fsync() on NFS mounts?

This is my understanding as well, at least from reading the man pages.
O_DIRECT and O_SYNC have different characteristics, so I'm a bit
surprised at the responses here.

Furthermore, using O_DIRECT over a 1GbE connection yields 100-110 MB/s,
whereas O_SYNC on the same connection gives me around 20-30 MB/s at
best. There is definitely a difference.
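One way to reproduce this kind of comparison is to time the same sequence of writes under different open() flags. The sketch below is hypothetical (measure_mb_per_s is not an existing tool); it includes a final fsync() so buffered writes are charged for the flush, and reports MB/s as 10^6 bytes per second. For reference, 1 Gbit/s is 125 MB/s before protocol overhead, so 100-110 MB/s is close to line rate.

```c
#define _GNU_SOURCE            /* for O_DIRECT, if the caller passes it */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Write `count` blocks of `block` bytes with the given extra open()
 * flags (e.g. 0, O_SYNC or O_DIRECT), then fsync(), and return the
 * throughput in MB/s (10^6 bytes per second), or -1.0 on error.
 * The buffer is 4096-byte aligned so O_DIRECT can be measured too;
 * `block` should then be a multiple of the required alignment. */
double measure_mb_per_s(const char *path, int extra_flags,
                        size_t block, size_t count)
{
    void *buf;
    if (posix_memalign(&buf, 4096, block) != 0)
        return -1.0;
    memset(buf, 'x', block);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0644);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int ok = 1;
    for (size_t i = 0; i < count && ok; i++)
        ok = write(fd, buf, block) == (ssize_t)block;
    if (ok)
        ok = fsync(fd) == 0;   /* charge buffered I/O for the final flush */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ok = (close(fd) == 0) && ok;
    free(buf);
    if (!ok)
        return -1.0;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0 ? (double)block * (double)count / secs / 1e6 : -1.0;
}
```

Calling it with 0, with O_SYNC and with O_DIRECT on the same NFS mount gives numbers that can be compared directly; the results depend on the server, the network and the mount options.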

-Moazam


On Sun, Nov 21, 2010 at 2:46 AM, Christoph Hellwig <[email protected]> wrote:
> On Fri, Nov 19, 2010 at 04:26:35PM -0500, Trond Myklebust wrote:
>> If the application requests O_DIRECT/O_SYNC or calls fsync(), we are
>> required by POSIX to ensure the data is safe on disk. The presence of an
>> NFS delegation does not change that requirement.
>
> That's not quite correct. O_DIRECT, for one, is not actually specified in
> POSIX at all, and the documented Linux semantics only say that the
> page cache should not be used (even if it sometimes is with various
> filesystems). There is no guarantee that data actually is on disk or
> reachable; for that you need to add the O_SYNC/O_DSYNC flag in addition
> or use fsync/fdatasync.
>
>

2010-11-19 20:04:24

by J. Bruce Fields

Subject: Re: O_DIRECT, O_SYNC, or fsync() on NFS mounts?

On Fri, Nov 19, 2010 at 02:24:59PM -0500, Trond Myklebust wrote:
> On Thu, 2010-11-18 at 15:34 -0800, Moazam Raja wrote:
> > Hi all,
> >
> > I'm currently exporting a ZFS filesystem on Solaris 11 Express as NFS.
> > I have a Linux client mounting that NFS v3 filesystem with the
> > proto=tcp option.
> >
> > My question is, what's the safest and most reliable way to write data
> > to this NFS mount on a Linux client? Should my application code use
> > O_DIRECT, or O_SYNC? Or should I be doing a write() and a fsync()? I
> > want to make sure that data is not lost and is truly committed, while
> > keeping decent performance (of course).
>
> Any one of the above methods will ensure that the data is synced to
> disk. In addition, NFS also guarantees that your data is fully synced to
> disk when taking/freeing POSIX locks, and when you close() the file.

Is the client still doing that in the presence of a write delegation, by
the way?

--b.

>
> The choice of one method over the other depends on your application
> requirements. Not on your choice of underlying storage.
>
> Trond

2010-11-19 21:26:41

by Trond Myklebust

Subject: Re: O_DIRECT, O_SYNC, or fsync() on NFS mounts?

On Fri, 2010-11-19 at 15:04 -0500, J. Bruce Fields wrote:
> On Fri, Nov 19, 2010 at 02:24:59PM -0500, Trond Myklebust wrote:
> > On Thu, 2010-11-18 at 15:34 -0800, Moazam Raja wrote:
> > > Hi all,
> > >
> > > I'm currently exporting a ZFS filesystem on Solaris 11 Express as NFS.
> > > I have a Linux client mounting that NFS v3 filesystem with the
> > > proto=tcp option.
> > >
> > > My question is, what's the safest and most reliable way to write data
> > > to this NFS mount on a Linux client? Should my application code use
> > > O_DIRECT, or O_SYNC? Or should I be doing a write() and a fsync()? I
> > > want to make sure that data is not lost and is truly committed, while
> > > keeping decent performance (of course).
> >
> > Any one of the above methods will ensure that the data is synced to
> > disk. In addition, NFS also guarantees that your data is fully synced to
> > disk when taking/freeing POSIX locks, and when you close() the file.
>
> Is the client still doing that in the presence of a write delegation, by
> the way?

If the application requests O_DIRECT/O_SYNC or calls fsync(), we are
required by POSIX to ensure the data is safe on disk. The presence of an
NFS delegation does not change that requirement.

We could potentially relax the sync-to-disk requirements when locking
and closing the file since those are only about ensuring close-to-open
cache consistency requirements (which is also ensured by the delegation)
but we do not do so today.
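Since flush-on-unlock is a property of the current Linux client rather than a documented guarantee, a portable program that uses fcntl() locks for coordination can call fsync() explicitly before releasing the lock. A minimal sketch, with hypothetical helper names set_lock and locked_append:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Take (F_WRLCK) or drop (F_UNLCK) a whole-file POSIX record lock,
 * waiting for a conflicting lock and retrying on EINTR. */
static int set_lock(int fd, int type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = (short)type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;              /* 0 means "through end of file" */
    while (fcntl(fd, F_SETLKW, &fl) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Append one record while holding the lock. The explicit fsync()
 * before unlocking does not depend on the client flushing on unlock.
 * `fd` must be open for writing. Returns 0 on success, -1 on error. */
int locked_append(int fd, const char *rec, size_t len)
{
    if (set_lock(fd, F_WRLCK) < 0)
        return -1;
    int rc = -1;
    off_t end = lseek(fd, 0, SEEK_END);
    if (end >= 0 && pwrite(fd, rec, len, end) == (ssize_t)len
        && fsync(fd) == 0)
        rc = 0;
    int saved = errno;
    if (set_lock(fd, F_UNLCK) < 0)
        return -1;
    errno = saved;
    return rc;
}
```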

Cheers
Trond


2010-11-21 20:02:05

by Trond Myklebust

Subject: Re: O_DIRECT, O_SYNC, or fsync() on NFS mounts?

On Sun, 2010-11-21 at 05:46 -0500, Christoph Hellwig wrote:
> On Fri, Nov 19, 2010 at 04:26:35PM -0500, Trond Myklebust wrote:
> > If the application requests O_DIRECT/O_SYNC or calls fsync(), we are
> > required by POSIX to ensure the data is safe on disk. The presence of an
> > NFS delegation does not change that requirement.
>
> That's not quite correct. O_DIRECT, for one, is not actually specified in
> POSIX at all, and the documented Linux semantics only say that the
> page cache should not be used (even if it sometimes is with various
> filesystems). There is no guarantee that data actually is on disk or
> reachable; for that you need to add the O_SYNC/O_DSYNC flag in addition
> or use fsync/fdatasync.

True.

We treat the O_DIRECT case as being the same as O_DIRECT|O_SYNC because
we don't currently have a way to locate and track outstanding O_DIRECT
RPC calls, and so fsync() has no effect.

We do, however, support aio/dio, and so people who want better writev()
syscall latency can use that...
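Trond does not spell out the interface; a common one on Linux is native AIO (io_submit), usually combined with O_DIRECT. The sketch below assumes that interface and calls the raw system calls through syscall(2), because glibc provides no wrappers for them (applications usually use libaio instead). The name aio_pwrite_once is hypothetical. Without O_DIRECT the kernel typically completes such a request synchronously, so any latency benefit depends on the caller opening the file with O_DIRECT and passing a suitably aligned buffer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/aio_abi.h>     /* aio_context_t, struct iocb, struct io_event */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers around the Linux AIO system calls. */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{
    return syscall(SYS_io_setup, (long)nr, ctx);
}
static long sys_io_destroy(aio_context_t ctx)
{
    return syscall(SYS_io_destroy, ctx);
}
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **cbs)
{
    return syscall(SYS_io_submit, ctx, n, cbs);
}
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *ev)
{
    /* NULL timeout: block until at least min_nr events complete. */
    return syscall(SYS_io_getevents, ctx, min_nr, nr, ev,
                   (struct timespec *)NULL);
}

/* Submit one asynchronous pwrite of `len` bytes at offset `off`, then
 * wait for its completion. Returns the number of bytes written, or a
 * negative errno value. */
long aio_pwrite_once(int fd, const void *buf, size_t len, long long off)
{
    aio_context_t ctx = 0;     /* must be zero before io_setup */
    if (sys_io_setup(1, &ctx) < 0)
        return -errno;

    struct iocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    struct iocb *cbs[1] = { &cb };
    long rc;
    long s = sys_io_submit(ctx, 1, cbs);
    if (s != 1) {
        rc = (s < 0) ? -errno : -EIO;
    } else {
        /* The request is in flight; other work could be done here. */
        struct io_event ev;
        long got;
        do {
            got = sys_io_getevents(ctx, 1, 1, &ev);
        } while (got < 0 && errno == EINTR);
        if (got == 1)
            rc = (long)ev.res; /* bytes written, or -errno from the I/O */
        else
            rc = (got < 0) ? -errno : -EIO;
    }
    sys_io_destroy(ctx);
    return rc;
}
```

In a real program the caller would submit several requests and overlap other work before collecting completions; waiting immediately after each submit, as here, gives no latency benefit.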

Cheers
Trond


2010-11-20 23:54:13

by Trond Myklebust

Subject: Re: O_DIRECT, O_SYNC, or fsync() on NFS mounts?

On Sat, 2010-11-20 at 09:12 -0700, Kums wrote:
> On Fri, Nov 19, 2010 at 12:24 PM, Trond Myklebust <[email protected]> wrote:
>
> > On Thu, 2010-11-18 at 15:34 -0800, Moazam Raja wrote:
> > > Hi all,
> > >
> > > I'm currently exporting a ZFS filesystem on Solaris 11 Express as NFS.
> > > I have a Linux client mounting that NFS v3 filesystem with the
> > > proto=tcp option.
> > >
> > > My question is, what's the safest and most reliable way to write data
> > > to this NFS mount on a Linux client? Should my application code use
> > > O_DIRECT, or O_SYNC? Or should I be doing a write() and a fsync()? I
> > > want to make sure that data is not lost and is truly committed, while
> > > keeping decent performance (of course).
> >
> > Any one of the above methods will ensure that the data is synced to
> > disk. In addition, NFS also guarantees that your data is fully synced to
> > disk when taking/freeing POSIX locks, and when you close() the file.
>
> Instead of enforcing this on the application side with O_DIRECT/O_SYNC, what if
> we mount the NFS client with the -o sync option and also export the
> filesystem from the NFS server with the sync option? Would that guarantee
> that the data from all applications is safe?

???????

What is your definition of 'safe' here? Do you mean 'everything written
by my application is guaranteed to hit disk'? If so, then that would be
a much stronger guarantee than POSIX and local disk give you, and it
will seriously impact I/O performance (whether you use NFS, local disk
or whatever).

Why do you need this kind of guarantee in the first place? What
applications are you running?

Trond


2010-11-22 18:04:56

by Trond Myklebust

Subject: Re: O_DIRECT, O_SYNC, or fsync() on NFS mounts?

On Mon, 2010-11-22 at 10:45 -0700, Kums wrote:
> On Sat, Nov 20, 2010 at 4:54 PM, Trond Myklebust <[email protected]> wrote:
> > If so, then that would be
> > a much stronger guarantee than POSIX and local disk give you, and it
> > will seriously impact I/O performance (whether you use NFS, local disk
> > or whatever).
> >
>
> Yes, I understand. I am just throwing out a suggestion to see if an "-o sync"
> NFS mount plus the sync exportfs option can be an alternative to using O_SYNC
> or O_DIRECT in the application (to guarantee that everything written by the
> application hits the disk).

mount -osync under NFS works exactly the same as under any other
filesystem, so yes...
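Related to this: a program can check whether the client mount already uses -o sync, and add O_SYNC per descriptor only when it does not. This is a Linux-specific sketch using statvfs() and the ST_SYNCHRONOUS flag (a GNU extension); the function names are hypothetical. It only sees the client's mount options; whether the server export is sync cannot be detected this way.

```c
#define _GNU_SOURCE            /* for ST_SYNCHRONOUS in <sys/statvfs.h> */
#include <fcntl.h>
#include <sys/statvfs.h>

/* Returns 1 if the filesystem holding `path` is mounted with -o sync,
 * 0 if it is not, and -1 if statvfs() fails. */
int mounted_sync(const char *path)
{
    struct statvfs sv;
    if (statvfs(path, &sv) < 0)
        return -1;
    return (sv.f_flag & ST_SYNCHRONOUS) ? 1 : 0;
}

/* Choose open() flags for a file under `path`: on a sync mount every
 * write is already synchronous; otherwise (or if the check fails)
 * request O_SYNC for this descriptor only. */
int durable_open_flags(const char *path)
{
    int flags = O_WRONLY | O_CREAT;
    if (mounted_sync(path) != 1)
        flags |= O_SYNC;
    return flags;
}
```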

Trond