2011-03-30 15:58:12

by Christoph Hellwig

Subject: Re: [RFC] spnfs-block: restore i_op->fallocate

On Wed, Mar 30, 2011 at 05:54:20PM +0200, Benny Halevy wrote:
> spnfsd-blocks needs the old inode_operations API for fallocate
> as it does not have a struct file in hand.
>
> As all file systems (but xfs) currently use the struct file argument
> to get to the inode, move their implementation back into an inode operation.
> Introduce generic_file_fallocate that can be used as the file_operations
> method that just does that and calls i_op->fallocate.
>
> Refactor the xfs implementation and introduce _xfs_vn_fallocate
> that takes an additional attr_flags argument, whose value depends on the struct file
> argument to xfs_file_fallocate.
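To make the proposed split concrete, here is a minimal userspace model (not kernel code) of the dispatch the RFC describes: the file_operations-level entry point only resolves the inode and forwards to an inode-level method, which a caller holding just an inode could also invoke directly. struct inode, struct file, struct model_inode_ops and the toyfs_* names are hypothetical stand-ins; the real kernel types and signatures differ.

```c
/* Userspace model only: these types are hypothetical stand-ins for the
 * kernel's struct inode, struct file and operations tables. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct inode;

/* Proposed inode-level method: takes an inode, no struct file. */
struct model_inode_ops {
    long (*fallocate)(struct inode *inode, int mode,
                      long long offset, long long len);
};

struct inode {
    const struct model_inode_ops *i_op;
    long long allocated;            /* highest allocated byte (model only) */
};

struct file {
    struct inode *f_inode;
    unsigned int f_flags;
};

/* Model of generic_file_fallocate: the file-level entry point only looks
 * up the inode and forwards to i_op->fallocate. */
static long generic_file_fallocate(struct file *file, int mode,
                                   long long offset, long long len)
{
    struct inode *inode = file->f_inode;

    assert(inode != NULL);
    if (inode->i_op == NULL || inode->i_op->fallocate == NULL)
        return -EOPNOTSUPP;
    return inode->i_op->fallocate(inode, mode, offset, len);
}

/* A toy filesystem that implements only the inode-level method. */
static long toyfs_fallocate(struct inode *inode, int mode,
                            long long offset, long long len)
{
    (void)mode;                     /* mode flags ignored in this model */
    if (offset < 0 || len <= 0)
        return -EINVAL;
    if (offset + len > inode->allocated)
        inode->allocated = offset + len;
    return 0;
}

static const struct model_inode_ops toyfs_iops = {
    .fallocate = toyfs_fallocate,
};
```

In this shape a file-less caller such as spnfsd-block would call `inode->i_op->fallocate()` directly, while normal syscalls go through the forwarding wrapper.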

NAK. Not only is spnfsd-block not upstream, but it probably never will be,
given what a piece of junk it is.

Second, making fallocate a file operation was done on purpose, and all the
other filesystems need the same fix that xfs has: making the allocation
stable if done on an O_SYNC file descriptor.
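From userspace, the contract described here is simply "open with O_SYNC, then preallocate". The sketch below shows that call sequence using the POSIX posix_fallocate call rather than the Linux-specific fallocate(2); the helper name prealloc_sync is illustrative. It cannot observe durability itself: whether the allocation is actually stable on return depends on the filesystem's fallocate implementation, which per this message only xfs handled at the time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Preallocate [0, len) in the file at path through an O_SYNC descriptor.
 * Returns 0 or an errno value; on success *size_out receives st_size. */
static int prealloc_sync(const char *path, off_t len, off_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_SYNC, 0600);
    if (fd < 0)
        return errno;

    /* posix_fallocate returns an error number rather than setting errno,
     * and extends st_size to offset + len if the file was shorter. */
    int err = posix_fallocate(fd, 0, len);
    if (err == 0) {
        struct stat st;
        if (fstat(fd, &st) != 0)
            err = errno;
        else
            *size_out = st.st_size;
    }
    close(fd);
    return err;
}
```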



2011-03-30 17:33:44

by Christoph Hellwig

Subject: Re: [RFC] spnfs-block: restore i_op->fallocate

On Wed, Mar 30, 2011 at 07:11:47PM +0200, Benny Halevy wrote:
> Makes sense. This could also be done by adding a respective flags argument
> to fallocate and have a common wrapper function look at the file descriptor
> and call the fs fallocate, which could then get the inode rather than the file.
> In other words, why copy code rather than factor it out into a common
> function?

We can discuss that _iff_ a valid use for a file-less fallocate appears
in mainline. The pnfs-block one is not. It's just a racy hack, which
opens gaping holes. Take a look at what it does: it allocates blocks for
a client to write into directly, with absolutely zero guarantee that the
block allocation actually stays around until that point.

You'll need some kind of outstanding token on extent map changes, as
done in CXFS or in NEC's "gfs", which implemented something similar to pNFS
based on NFSv3.
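One way to picture the "outstanding token" requirement is the toy model below: the server refuses any extent map change (for example freeing or moving blocks) while a layout it handed out is still outstanding, so a client holding a valid token can rely on its blocks staying put. All names are hypothetical, and this is not a description of how CXFS or NEC's gfs actually implement it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical server-side state for one file. */
struct extent_map {
    unsigned long generation;   /* bumped on every allocation/free change */
    int outstanding;            /* layouts granted to clients, not yet returned */
};

/* Hypothetical token a client holds while writing blocks directly. */
struct layout_token {
    bool valid;
};

static struct layout_token grant_layout(struct extent_map *m)
{
    m->outstanding++;
    return (struct layout_token){ .valid = true };
}

/* Client returns the layout (voluntarily or after a server recall). */
static void return_layout(struct extent_map *m, struct layout_token *t)
{
    if (t->valid) {
        t->valid = false;
        m->outstanding--;
    }
    assert(m->outstanding >= 0);
}

/* Any change that could free or move blocks must wait until every
 * outstanding layout is recalled; this model simply refuses it. */
static bool change_extent_map(struct extent_map *m)
{
    if (m->outstanding > 0)
        return false;
    m->generation++;
    return true;
}

/* A client may write its granted blocks only while its token is valid. */
static bool client_may_write(const struct layout_token *t)
{
    return t->valid;
}
```

A real server would block or recall rather than fail, but the invariant is the same: no extent map change while a client may still be writing into the old allocation.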


2011-03-30 17:11:31

by Benny Halevy

Subject: Re: [RFC] spnfs-block: restore i_op->fallocate

On 2011-03-30 17:58, Christoph Hellwig wrote:

> On Wed, Mar 30, 2011 at 05:54:20PM +0200, Benny Halevy wrote:
>> spnfsd-blocks needs the old inode_operations API for fallocate
>> as it does not have a struct file in hand.
>>
>> As all file systems (but xfs) currently use the struct file argument
>> to get to the inode, move their implementation back into an inode operation.
>> Introduce generic_file_fallocate that can be used as the file_operations
>> method that just does that and calls i_op->fallocate.
>>
>> Refactor the xfs implementation and introduce _xfs_vn_fallocate
>> that takes an additional attr_flags argument, whose value depends on the struct file
>> argument to xfs_file_fallocate.
> NAK. Not only is spnfsd-block not upstream, but it probably never will be,
> given what a piece of junk it is.
>
> Second, making fallocate a file operation was done on purpose, and all the

I understand that from the API perspective, but note that other than for O_SYNC
there's no use for the struct file * passed in.

> other filesystems need the same fix that xfs has: making the allocation
> stable if done on an O_SYNC file descriptor.

Makes sense. This could also be done by adding a respective flags argument
to fallocate and have a common wrapper function look at the file descriptor
and call the fs fallocate, which could then get the inode rather than the file.
In other words, why copy code rather than factor it out into a common
function?

Benny

> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2011-03-31 13:53:39

by Christoph Hellwig

Subject: Re: [RFC] spnfs-block: restore i_op->fallocate

Btw, how is the spnfs-block support supposed to work at all?

fallocate creates unwritten extents, and I can't actually
spot a place that would later convert them to regular extents.
And how does it work for filesystems without ->fallocate like
ext3?

And how do we prevent clients from reading uninitialized
blocks in areas allocated on the server but not written
to yet. Is there anything like unwritten extents in the
on-the-wire protocol?


2011-03-31 06:54:31

by Benny Halevy

Subject: Re: [RFC] spnfs-block: restore i_op->fallocate

On 2011-03-30 19:33, Christoph Hellwig wrote:

> On Wed, Mar 30, 2011 at 07:11:47PM +0200, Benny Halevy wrote:
>> Makes sense. This could also be done by adding a respective flags argument
>> to fallocate and have a common wrapper function look at the file descriptor
>> and call the fs fallocate, which could then get the inode rather than the file.
>> In other words, why copy code rather than factor it out into a common
>> function?
> We can discuss that _iff_ a valid use for a file-less fallocate appears
> in mainline. The pnfs-block one is not. It's just a racy hack, which
> opens gaping holes. Take a look at what it does: it allocates blocks for
> a client to write into directly, with absolutely zero guarantee that the
> block allocation actually stays around until that point.
>
> You'll need some kind of outstanding token on extent map changes, as
> done in CXFS or in NEC's "gfs", which implemented something similar to pNFS
> based on NFSv3.

Agreed.

Benny



2011-04-01 08:30:46

by Benny Halevy

Subject: Re: [RFC] spnfs-block: restore i_op->fallocate

On 2011-03-31 09:53, Christoph Hellwig wrote:

> Btw, how is the spnfs-block support supposed to work at all?
>
> fallocate creates unwritten extents, and I can't actually
> spot a place that would later convert them to regular extents.

It's supposed to work by committing the extents on
LAYOUTCOMMIT. That is supposed to happen in spnfs-block,
but it doesn't. Currently, the generic layer calls write_inode_now
if the size changes and the fs is exported "sync", so my guess is that
it currently works only when the file is extended, not when writing
in place into holes.
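What is supposed to happen on commit can be modeled as follows: when LAYOUTCOMMIT arrives, the server converts the unwritten extents covered by the committed range into normal ones, so their contents become readable. struct ext and layoutcommit_convert are hypothetical; a real filesystem would also split partially covered extents and journal the change, both omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical server-side extent record (model, not a kernel structure). */
struct ext {
    long long off, len;
    bool unwritten;     /* preallocated; reads return zeros until converted */
};

/* Model of LAYOUTCOMMIT handling: convert unwritten extents lying entirely
 * inside [off, off + len) to normal extents. Returns how many changed. */
static size_t layoutcommit_convert(struct ext *map, size_t n,
                                   long long off, long long len)
{
    size_t converted = 0;

    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        bool inside = map[i].off >= off &&
                      map[i].off + map[i].len <= off + len;
        if (map[i].unwritten && inside) {
            map[i].unwritten = false;
            converted++;
        }
    }
    return converted;
}
```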

> And how does it work for filesystems without ->fallocate like
> ext3?

It doesn't. spnfs-block requires fs support for fallocate and fiemap.
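The fiemap dependency can be illustrated from userspace with the FS_IOC_FIEMAP ioctl, which reports a file's extents and marks preallocated-but-unwritten ones with FIEMAP_EXTENT_UNWRITTEN. The sketch below counts both; the helper names count_extents and count_extents_path are illustrative. On filesystems without fiemap support the ioctl fails, which is the same limitation that rules them out for spnfs-block.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_* constants */
#include <linux/fs.h>       /* FS_IOC_FIEMAP */

/* Count the extents of fd and how many are flagged FIEMAP_EXTENT_UNWRITTEN.
 * Returns 0 or a negative errno (typically -EOPNOTSUPP without fiemap). */
static int count_extents(int fd, unsigned *total, unsigned *unwritten)
{
    assert(total != NULL && unwritten != NULL);

    /* With fm_extent_count == 0 the kernel only reports how many extents
     * are mapped, in fm_mapped_extents. */
    struct fiemap probe;
    memset(&probe, 0, sizeof(probe));
    probe.fm_length = FIEMAP_MAX_OFFSET;
    probe.fm_flags = FIEMAP_FLAG_SYNC;          /* flush dirty data first */
    if (ioctl(fd, FS_IOC_FIEMAP, &probe) < 0)
        return -errno;

    unsigned n = probe.fm_mapped_extents;
    struct fiemap *fm = calloc(1, sizeof(*fm) +
                                  (size_t)n * sizeof(struct fiemap_extent));
    if (fm == NULL)
        return -ENOMEM;
    fm->fm_length = FIEMAP_MAX_OFFSET;
    fm->fm_flags = FIEMAP_FLAG_SYNC;
    fm->fm_extent_count = n;
    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        int err = -errno;
        free(fm);
        return err;
    }

    *total = fm->fm_mapped_extents;
    *unwritten = 0;
    for (unsigned i = 0; i < fm->fm_mapped_extents; i++)
        if (fm->fm_extents[i].fe_flags & FIEMAP_EXTENT_UNWRITTEN)
            (*unwritten)++;
    free(fm);
    return 0;
}

/* Convenience wrapper: open path read-only and count its extents. */
static int count_extents_path(const char *path, unsigned *total,
                              unsigned *unwritten)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;
    int rc = count_extents(fd, total, unwritten);
    close(fd);
    return rc;
}
```

On a filesystem with unwritten-extent support, a freshly preallocated file would typically report at least one unwritten extent; the exact count depends on the allocator.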

> And how do we prevent clients from reading uninitialized
> blocks in areas allocated on the server but not written
> to yet. Is there anything like unwritten extents in the
> on-the-wire protocol?

Yes, there is, but spnfs-block does not implement it,
as it was written essentially as a reference/testing tool.

The protocol allows the server to provisionally allocate space
on LAYOUTGET that the client can write into privately.
The client's changes only become visible to other clients
when they are committed to the file on LAYOUTCOMMIT.
This also allows implementing copy-on-write, as the client
can be given separate extents in the layout describing the
readable copy of the block and the writeable one, and the
client participates in the copy-on-write process by copying
the contents of the block before modifying it (or zeroing it out
if it's just invalid). This is done at write_begin time on
the client side.
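The client-side copy-on-write step can be sketched as a toy model: before a partial write into a block whose writeable extent is still invalid, the client fills it from the readable copy if the layout provides one, otherwise with zeros. The two states are named after the pNFS block layout's PNFS_BLOCK_READ_WRITE_DATA and PNFS_BLOCK_INVALID_DATA (RFC 5663); struct block_map, model_write_begin, model_write and the tiny block size are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 8   /* tiny block size, for the model only */

/* Two extent states, named after the pNFS block layout's
 * PNFS_BLOCK_READ_WRITE_DATA and PNFS_BLOCK_INVALID_DATA. */
enum ext_state { EXT_READ_WRITE_DATA, EXT_INVALID_DATA };

/* Hypothetical per-block view of a layout on the client. */
struct block_map {
    enum ext_state wstate;        /* state of the writeable extent */
    const unsigned char *rcopy;   /* readable copy of the block, or NULL */
    unsigned char *wblock;        /* writeable block the client writes directly */
};

/* Model of the write_begin step: initialize an invalid writeable block
 * before it is partially modified. */
static void model_write_begin(struct block_map *b)
{
    if (b->wstate != EXT_INVALID_DATA)
        return;                             /* already holds valid data */
    if (b->rcopy != NULL)
        memcpy(b->wblock, b->rcopy, BLK);   /* copy-on-write */
    else
        memset(b->wblock, 0, BLK);          /* never written: zero-fill */
    /* Locally valid now; the server only learns of it at LAYOUTCOMMIT. */
    b->wstate = EXT_READ_WRITE_DATA;
}

static void model_write(struct block_map *b, size_t off,
                        const void *src, size_t len)
{
    assert(off + len <= BLK);
    model_write_begin(b);
    memcpy(b->wblock + off, src, len);
}
```

The readable copy is never modified, so other clients keep seeing the old contents until the new extent is committed.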

Benny
