2010-12-16 23:07:11

by Christoph Hellwig

Subject: Re: [nfsv4] layoutcommits and file layout

On Thu, Dec 16, 2010 at 11:21:21AM -0500, Matt W. Benjamin wrote:
> Hi,
>
> We have a files implementation which wants to receive LAYOUTCOMMIT when a client is finished with a layout. It was my clear understanding from rfc5661 that we could expect this behavior.

Care to post it to the list?



2011-01-03 14:40:11

by Trond Myklebust

Subject: Re: [nfsv4] layoutcommits and file layout

On Mon, 2011-01-03 at 16:21 +0200, Benny Halevy wrote:
> On 2010-12-17 01:07, Christoph Hellwig wrote:
> > On Thu, Dec 16, 2010 at 11:21:21AM -0500, Matt W. Benjamin wrote:
> >> Hi,
> >>
> >> We have a files implementation which wants to receive LAYOUTCOMMIT when a client is finished with a layout. It was my clear understanding from rfc5661 that we could expect this behavior.
> >
> > Care to post it to the list?
> >
>
> I don't know what Matt's server is doing but the fundamental problem is
> manifested with extending a file with parallel DS writes.
> Assuming that the DS writes are executed in arbitrary order,
> exposing the file length before LAYOUTCOMMIT can cause
> a concurrent reader to read a hole. Although locking can
> solve this case, day-to-day applications that work well over
> local filesystem and legacy NFS may break because of this.

...and this differs from ordinary NFS writes exactly how?

Both cached and uncached (i.e. O_DIRECT) writes can and will be flushed
to disk in entirely random order when writing to the MDS. If you have a
parallel reader on another client (or even on the same client in the
case of O_DIRECT), and want it to see accurate data, then use locking.
If not, you will see holes and other strangeness.

IOW: There are no 'day-to-day applications that work well over legacy
NFS' that rely on this behaviour.
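
As an illustration of the "use locking" advice above, here is a minimal sketch of a writer and a reader that coordinate through POSIX byte-range locks. The helper names append_locked and read_locked are hypothetical and not from this thread. The sketch assumes a client that flushes dirty data before releasing a lock and revalidates its cache after acquiring one, which is how the Linux NFS client is generally understood to behave. It is not a statement about any particular server.

```python
import fcntl
import os


def append_locked(path, data):
    """Append data while holding an exclusive whole-file POSIX lock."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)      # blocks until the lock is granted
        view = memoryview(data)
        while len(view) > 0:                # os.write may write fewer bytes
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                        # push the data out before unlocking
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)


def read_locked(path):
    """Read the whole file while holding a shared whole-file POSIX lock."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.lockf(fd, fcntl.LOCK_SH)      # shared lock: excludes writers only
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:                   # empty read means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A reader that calls read_locked never overlaps with an append_locked call in progress, so it sees either none or all of each append. A plain "tail -f" takes no lock and therefore gets no such guarantee.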

_______________________________________________
nfsv4 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nfsv4

2011-01-03 14:21:07

by Benny Halevy

Subject: Re: [nfsv4] layoutcommits and file layout

On 2010-12-17 01:07, Christoph Hellwig wrote:
> On Thu, Dec 16, 2010 at 11:21:21AM -0500, Matt W. Benjamin wrote:
>> Hi,
>>
>> We have a files implementation which wants to receive LAYOUTCOMMIT when a client is finished with a layout. It was my clear understanding from rfc5661 that we could expect this behavior.
>
> Care to post it to the list?
>

I don't know what Matt's server is doing but the fundamental problem is
manifested with extending a file with parallel DS writes.
Assuming that the DS writes are executed in arbitrary order,
exposing the file length before LAYOUTCOMMIT can cause
a concurrent reader to read a hole. Although locking can
solve this case, day-to-day applications that work well over
local filesystem and legacy NFS may break because of this.

Benny
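
To make the hole scenario above concrete, here is a small local simulation, offered as a sketch and not as pNFS code. It writes three stripes of one file out of order and records what an unsynchronized reader would see after each write. The 4-byte STRIPE size, the write order and the function names are all made up for illustration. In a real files layout each stripe would go to a different data server, and the simulation assumes the new file length is visible as soon as the highest write lands.

```python
import os
import tempfile

STRIPE = 4  # hypothetical stripe unit in bytes, kept tiny for readability


def write_stripes_out_of_order(fd, stripes, order):
    """Write stripe i at offset i * STRIPE, in the given order.

    Yields after every write so the caller can look at the file in between.
    """
    for i in order:
        os.pwrite(fd, stripes[i], i * STRIPE)
        yield i


def demo():
    """Return the file contents a reader would see after each write."""
    stripes = [b"AAAA", b"BBBB", b"CCCC"]
    with tempfile.TemporaryDirectory() as d:
        fd = os.open(os.path.join(d, "foo"), os.O_RDWR | os.O_CREAT, 0o644)
        try:
            snapshots = []
            # The last stripe lands first, so the length jumps ahead of the data.
            for _ in write_stripes_out_of_order(fd, stripes, [2, 0, 1]):
                size = os.fstat(fd).st_size
                snapshots.append(os.pread(fd, size, 0))
            return snapshots
        finally:
            os.close(fd)


if __name__ == "__main__":
    for snapshot in demo():
        print(snapshot)
```

The first snapshot is eight zero bytes followed by CCCC: the reader sees the extended length but reads a hole where stripes 0 and 1 have not yet arrived. Only the last snapshot contains the complete data.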

2011-01-05 19:14:34

by Myklebust, Trond

Subject: Re: [nfsv4] layoutcommits and file layout

On Wed, 2011-01-05 at 14:04 -0500, Trond Myklebust wrote:
> On Wed, 2011-01-05 at 21:01 +0200, Benny Halevy wrote:
> > On 2011-01-03 16:40, Trond Myklebust wrote:
> > > On Mon, 2011-01-03 at 16:21 +0200, Benny Halevy wrote:
> > >> On 2010-12-17 01:07, Christoph Hellwig wrote:
> > >>> On Thu, Dec 16, 2010 at 11:21:21AM -0500, Matt W. Benjamin wrote:
> > >>>> Hi,
> > >>>>
> > >>>> We have a files implementation which wants to receive LAYOUTCOMMIT when a client is finished with a layout. It was my clear understanding from rfc5661 that we could expect this behavior.
> > >>>
> > >>> Care to post it to the list?
> > >>>
> > >>
> > >> I don't know what Matt's server is doing but the fundamental problem is
> > >> manifested with extending a file with parallel DS writes.
> > >> Assuming that the DS writes are executed in arbitrary order,
> > >> exposing the file length before LAYOUTCOMMIT can cause
> > >> a concurrent reader to read a hole. Although locking can
> > >> solve this case, day-to-day applications that work well over
> > >> local filesystem and legacy NFS may break because of this.
> > >
> > > ...and this differs from ordinary NFS writes exactly how?
> > >
> > > Both cached and uncached (i.e. O_DIRECT) writes can and will be flushed
> > > to disk in entirely random order when writing to the MDS. If you have a
> > > parallel reader on another client (or even on the same client in the
> > > case of O_DIRECT), and want it to see accurate data, then use locking.
> > > If not, you will see holes and other strangeness.
> > >
> > > IOW: There are no 'day-to-day applications that work well over legacy
> > > NFS' that rely on this behaviour.
> > >
> >
> > Assuming the client writes sequentially (over tcp) the writes will
> > practically be processed in order into the server's cache so with
> > no crashes in the mix a parallel reader will see no holes.
> > I'd really like the following scenario to work over pNFS with
> > no hassles:
> > "some app >> foo" on one client, and
> > "tail -f foo" on another
>
> No, that doesn't work today! Believe me, I get the "bug reports"...
>
> There is no point in trying to add properties to pNFS that don't exist
> with ordinary NFS.

...and for the record: use of TCP does _not_ suffice to ensure writes
are processed in order.

In the Linux kernel, we have all sorts of parallelism going on before
the writes even hit the socket on the client. Everything from background
flushing to queuing in the sunrpc layer (e.g. for a session slot)
conspires to destroy any hope of ever achieving what you propose above.

That's not even counting what goes on with the server side. Think, for
instance, of the case where the server crashes before a COMMIT has been
successfully sent. Not only will your reader see holes, it will think
the file has been truncated...
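
The crash case can be sketched with a toy in-memory model. ToyServer and its methods are hypothetical names invented for this illustration and do not correspond to any real server. The model assumes that unstable writes extend the visible file size immediately, and that a reboot discards uncommitted data and changes the write verifier.

```python
class ToyServer:
    """Hypothetical in-memory model of a single file on an NFS server."""

    def __init__(self):
        self.stable = bytearray()   # data that has reached stable storage
        self.unstable = {}          # offset -> bytes written but not committed
        self.verifier = 1           # changes whenever the server reboots

    def write_unstable(self, offset, data):
        """Accept a write into volatile cache only; return the verifier."""
        self.unstable[offset] = bytes(data)
        return self.verifier

    def commit(self):
        """Move all cached writes to stable storage; return the verifier."""
        for offset, data in sorted(self.unstable.items()):
            end = offset + len(data)
            if len(self.stable) < end:
                # Zero-fill up to the end of this write before copying it in.
                self.stable.extend(b"\0" * (end - len(self.stable)))
            self.stable[offset:end] = data
        self.unstable.clear()
        return self.verifier

    def crash_and_reboot(self):
        """Lose everything that was never committed."""
        self.unstable.clear()
        self.verifier += 1

    def size(self):
        """File size as a concurrent reader would observe it."""
        ends = [offset + len(data) for offset, data in self.unstable.items()]
        return max([len(self.stable)] + ends)


if __name__ == "__main__":
    server = ToyServer()
    server.write_unstable(0, b"hello")
    server.commit()
    seen = server.write_unstable(5, b" world")
    print(server.size())              # size while the second write is cached
    server.crash_and_reboot()
    print(server.size())              # size after the reboot
    print(server.verifier != seen)    # writer can tell it must resend
```

In this model a reader polling the size sees it grow and then shrink again after the reboot, which looks like a truncation. The writer notices the changed verifier and has to resend the lost data, but nothing tells the reader that this is happening.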

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-01-05 19:01:27

by Benny Halevy

Subject: Re: [nfsv4] layoutcommits and file layout

On 2011-01-03 16:40, Trond Myklebust wrote:
> On Mon, 2011-01-03 at 16:21 +0200, Benny Halevy wrote:
>> On 2010-12-17 01:07, Christoph Hellwig wrote:
>>> On Thu, Dec 16, 2010 at 11:21:21AM -0500, Matt W. Benjamin wrote:
>>>> Hi,
>>>>
>>>> We have a files implementation which wants to receive LAYOUTCOMMIT when a client is finished with a layout. It was my clear understanding from rfc5661 that we could expect this behavior.
>>>
>>> Care to post it to the list?
>>>
>>
>> I don't know what Matt's server is doing but the fundamental problem is
>> manifested with extending a file with parallel DS writes.
>> Assuming that the DS writes are executed in arbitrary order,
>> exposing the file length before LAYOUTCOMMIT can cause
>> a concurrent reader to read a hole. Although locking can
>> solve this case, day-to-day applications that work well over
>> local filesystem and legacy NFS may break because of this.
>
> ...and this differs from ordinary NFS writes exactly how?
>
> Both cached and uncached (i.e. O_DIRECT) writes can and will be flushed
> to disk in entirely random order when writing to the MDS. If you have a
> parallel reader on another client (or even on the same client in the
> case of O_DIRECT), and want it to see accurate data, then use locking.
> If not, you will see holes and other strangeness.
>
> IOW: There are no 'day-to-day applications that work well over legacy
> NFS' that rely on this behaviour.
>

Assuming the client writes sequentially (over TCP), the writes will in
practice be processed in order into the server's cache, so with
no crashes in the mix a parallel reader will see no holes.
I'd really like the following scenario to work over pNFS with
no hassles:
"some app >> foo" on one client, and
"tail -f foo" on another

2011-01-05 19:04:24

by Myklebust, Trond

Subject: Re: [nfsv4] layoutcommits and file layout

On Wed, 2011-01-05 at 21:01 +0200, Benny Halevy wrote:
> On 2011-01-03 16:40, Trond Myklebust wrote:
> > On Mon, 2011-01-03 at 16:21 +0200, Benny Halevy wrote:
> >> On 2010-12-17 01:07, Christoph Hellwig wrote:
> >>> On Thu, Dec 16, 2010 at 11:21:21AM -0500, Matt W. Benjamin wrote:
> >>>> Hi,
> >>>>
> >>>> We have a files implementation which wants to receive LAYOUTCOMMIT when a client is finished with a layout. It was my clear understanding from rfc5661 that we could expect this behavior.
> >>>
> >>> Care to post it to the list?
> >>>
> >>
> >> I don't know what Matt's server is doing but the fundamental problem is
> >> manifested with extending a file with parallel DS writes.
> >> Assuming that the DS writes are executed in arbitrary order,
> >> exposing the file length before LAYOUTCOMMIT can cause
> >> a concurrent reader to read a hole. Although locking can
> >> solve this case, day-to-day applications that work well over
> >> local filesystem and legacy NFS may break because of this.
> >
> > ...and this differs from ordinary NFS writes exactly how?
> >
> > Both cached and uncached (i.e. O_DIRECT) writes can and will be flushed
> > to disk in entirely random order when writing to the MDS. If you have a
> > parallel reader on another client (or even on the same client in the
> > case of O_DIRECT), and want it to see accurate data, then use locking.
> > If not, you will see holes and other strangeness.
> >
> > IOW: There are no 'day-to-day applications that work well over legacy
> > NFS' that rely on this behaviour.
> >
>
> Assuming the client writes sequentially (over tcp) the writes will
> practically be processed in order into the server's cache so with
> no crashes in the mix a parallel reader will see no holes.
> I'd really like the following scenario to work over pNFS with
> no hassles:
> "some app >> foo" on one client, and
> "tail -f foo" on another

No, that doesn't work today! Believe me, I get the "bug reports"...

There is no point in trying to add properties to pNFS that don't exist
with ordinary NFS.

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com