2013-10-28 15:00:42

by Anna Schumaker

[permalink] [raw]
Subject: [PATCH 0/3] NFS: Add support for fallocate and llseek

These patches add in support for the fallocate() and llseek() system calls
using the NFS v4.2 operations WRITE_PLUS and SEEK to do the work on the
server. This patchset uses constants defined in the matching server patches,
so these should probably be applied second.

Questions, comments and thoughts are appreciated!

Anna

Anna Schumaker (3):
NFS: Use WRITE_PLUS for hole punches
NFS: Allow for asynchronous WRITE_PLUS calls
NFS: Implement SEEK

fs/nfs/callback.h | 13 +++
fs/nfs/callback_proc.c | 8 ++
fs/nfs/callback_xdr.c | 54 ++++++++++-
fs/nfs/inode.c | 2 +
fs/nfs/nfs4_fs.h | 9 ++
fs/nfs/nfs4file.c | 143 +++++++++++++++++++++++++++++
fs/nfs/nfs4proc.c | 68 ++++++++++++++
fs/nfs/nfs4xdr.c | 233 ++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/nfs4.h | 4 +
include/linux/nfs_xdr.h | 49 ++++++++++
10 files changed, 580 insertions(+), 3 deletions(-)

--
1.8.4.1



2013-10-29 13:41:48

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 2/3] NFS: Allow for asynchronous WRITE_PLUS calls


On Oct 29, 2013, at 9:24 AM, Hellwig Christoph <[email protected]> wrote:

> On Tue, Oct 29, 2013 at 01:21:26PM +0000, Myklebust, Trond wrote:
>> The current draft spec even allows the client to specify a "pattern" to be written to the "hole".
>
> That's indeed the WRITE SAME lookalike then. At least the SCSI people
> were smart enough to define an unmap bit for "hole punching" and allow
> the target to reject all other versions if they don't want to support
> it.
>
> Given that NFS v4.2 isn't finalized I'd very much recommend to
>
> a) properly separate those case s
> b) do not make the async version mandatory

union write_plus_arg4 switch (data_content4 wpa_content) {
case NFS4_CONTENT_DATA:
data4 wpa_data;
case NFS4_CONTENT_APP_DATA_HOLE:
app_data_hole4 wpa_adh;
case NFS4_CONTENT_HOLE:
data_info4 wpa_hole;
default:
void;
};

I suppose we could add cases: NFS4_CONTENT_APP_DATA_HOLE_SYNC, NFS4_CONTENT_HOLE_SYNC to allow the client to specify that it doesn't want asynchronous behaviour (and add error cases to allow the server to specify that it cannot safely do so).

There is already a 'di_allocated' boolean for the NFS4_CONTENT_HOLE case (which is used to distinguish between holes and zeroed extents). The only thing missing there is an error case, AFAICS, to allow the server to return that it doesn't support holes.

Tom, any thoughts on whether or not this kind of change to the spec is doable at this time?


2013-10-29 12:48:36

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 2/3] NFS: Allow for asynchronous WRITE_PLUS calls

On Tue, Oct 29, 2013 at 12:44:50PM +0000, Myklebust, Trond wrote:
> It exists because the server vendors were worried that operations such as preallocation and/or hole punching can take a more or less unbounded amount of time due to the 64-bit size. By using an (optional) callback method, the server can free up the RPC slot that would otherwise be kept waiting in the synchronous case.

Must be some horried filesystems on those vendors servers. Both
operations should be O(nr_extents), where nr_extents should be extremely
low for the preallocation and still reasonably low for hole punches,
although users control the allocastion pattern.

There's a reason why these aren't aio operations locally either.


2013-10-29 12:44:52

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 2/3] NFS: Allow for asynchronous WRITE_PLUS calls


On Oct 29, 2013, at 3:39 AM, Christoph Hellwig <[email protected]> wrote:

> On Mon, Oct 28, 2013 at 11:00:17AM -0400, Anna Schumaker wrote:
>> Clients are required to support CB_OFFLOAD for the NFS4_CONTENT_HOLE arm
>> of the WRITE_PLUS union.
>
> This sounds pretty stupid. Just curious, who got this braindamage into
> the standard?
>

It exists because the server vendors were worried that operations such as preallocation and/or hole punching can take a more or less unbounded amount of time due to the 64-bit size. By using an (optional) callback method, the server can free up the RPC slot that would otherwise be kept waiting in the synchronous case.

Cheers
Trond

2013-10-29 13:08:57

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 2/3] NFS: Allow for asynchronous WRITE_PLUS calls

On Tue, Oct 29, 2013 at 01:05:44PM +0000, Myklebust, Trond wrote:
> Imagine someone wanting to punch a 50TB hole in NTFS, for instance. It doesn't have real holes, so you'd end up needing to zero out the existing extents in that region of the file. That will take time, even with an O(nr_extents) algorithm.

That behaviour is not a hole punch, and should not be multiplexed onto
a whole punch on the wire command! If such a use case is important
enough there should be an equivanet of the SCSI WRITE SAME command which
might make some sense to be implemented async.

We'd surely not support it in the Linux nfsd, though.

2013-10-29 12:41:29

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 1/3] NFS: Use WRITE_PLUS for hole punches

On Tue 29 Oct 2013 03:39:08 AM EDT, Christoph Hellwig wrote:
> On Mon, Oct 28, 2013 at 11:00:16AM -0400, Anna Schumaker wrote:
>> This patch implements a version of fallocate for NFS v4. In the v4.2
>> case, we use WRITE_PLUS with DATA_CONTENT_HOLE set to punch a hole in a
>> file. In NFS < v4.2, we fall back to the generic VFS fallocate
>> implementation.
>
> What generic VFS fallocate implementation do you fall back to? Not only
> can't I find a call to to it, nor does such a thing even exist.
>

Hmm... good question. I'm not sure what I was thinking when I wrote
that comment, so I'll remove it for v2 of these patches.

2013-10-29 07:39:09

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/3] NFS: Use WRITE_PLUS for hole punches

On Mon, Oct 28, 2013 at 11:00:16AM -0400, Anna Schumaker wrote:
> This patch implements a version of fallocate for NFS v4. In the v4.2
> case, we use WRITE_PLUS with DATA_CONTENT_HOLE set to punch a hole in a
> file. In NFS < v4.2, we fall back to the generic VFS fallocate
> implementation.

What generic VFS fallocate implementation do you fall back to? Not only
can't I find a call to to it, nor does such a thing even exist.


2013-10-28 15:00:46

by Anna Schumaker

[permalink] [raw]
Subject: [PATCH 3/3] NFS: Implement SEEK

The SEEK operation is used when an application makes an lseek call with
either the SEEK_HOLE or SEEK_DATA flags set. I fall back on
nfs_file_llseek() when other flags are set or in the NFS < 4.2 case.

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfs/nfs4_fs.h | 1 +
fs/nfs/nfs4file.c | 38 +++++++++++++++++++++
fs/nfs/nfs4proc.c | 29 ++++++++++++++++
fs/nfs/nfs4xdr.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/nfs4.h | 1 +
include/linux/nfs_xdr.h | 20 +++++++++++
6 files changed, 180 insertions(+)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index f52fc5f..badc1fa 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -355,6 +355,7 @@ nfs4_state_protect_write(struct nfs_client *clp, struct rpc_clnt **clntp,
#ifdef CONFIG_NFS_V4_2
int nfs42_proc_fallocate(struct nfs_server *, struct nfs42_write_plus_args *,
struct nfs42_write_res *, struct rpc_cred *);
+loff_t nfs42_proc_llseek(struct inode *, nfs4_stateid *, loff_t, int);
#endif /* CONFIG_NFS_V4_2 */

extern const struct nfs4_minor_version_ops *nfs_v4_minor_ops[];
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index caf0658..b88ba0c 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -218,10 +218,48 @@ static long nfs42_fallocate(struct file *file, int mode, loff_t offset, loff_t l
err = nfs_wait_for_offload(&res);
return err;
}
+
+static loff_t nfs42_file_llseek(struct file *filep, loff_t offset, int whence)
+{
+ nfs4_stateid stateid;
+ struct nfs_open_context *ctx;
+ struct inode *inode = file_inode(filep);
+ loff_t pos;
+
+ pos = nfs42_select_stateid(filep, &stateid, FMODE_READ | FMODE_WRITE, &ctx);
+ if (pos < 0)
+ return pos;
+
+ nfs_wb_all(inode);
+ pos = nfs42_proc_llseek(inode, &stateid, offset, whence);
+ if (pos < 0)
+ return pos;
+ return vfs_setpos(filep, pos, inode->i_sb->s_maxbytes);
+}
+
+static loff_t nfs4_file_llseek(struct file *filep, loff_t offset, int whence)
+{
+ struct nfs_server *server = NFS_SERVER(file_inode(filep));
+
+ if (server->nfs_client->cl_minorversion < 2)
+ return nfs_file_llseek(filep, offset, whence);
+
+ switch (whence) {
+ case SEEK_HOLE:
+ case SEEK_DATA:
+ return nfs42_file_llseek(filep, offset, whence);
+ default:
+ return nfs_file_llseek(filep, offset, whence);
+ }
+}
#endif /* CONFIG_NFS_V4_2 */

const struct file_operations nfs4_file_operations = {
+#ifdef CONFIG_NFS_V4_2
+ .llseek = nfs4_file_llseek,
+#else
.llseek = nfs_file_llseek,
+#endif
.read = do_sync_read,
.write = do_sync_write,
.aio_read = nfs_file_read,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 003bacb..79f1295 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7865,6 +7865,35 @@ int nfs42_proc_fallocate(struct nfs_server *server,

return err;
}
+
+loff_t nfs42_proc_llseek(struct inode *inode, nfs4_stateid *stateid,
+ loff_t offset, int whence)
+{
+ struct nfs42_seek_args args = {
+ .sa_fh = NFS_FH(inode),
+ .sa_stateid = stateid,
+ .sa_offset = offset,
+ };
+ struct nfs42_seek_res res;
+ struct rpc_message msg = {
+ .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_SEEK],
+ .rpc_argp = &args,
+ .rpc_resp = &res,
+ };
+ struct nfs_server *server = NFS_SERVER(inode);
+ int status;
+
+ if (whence == SEEK_HOLE)
+ args.sa_what = NFS4_CONTENT_HOLE;
+ else
+ args.sa_what = NFS4_CONTENT_DATA;
+
+ status = nfs4_call_sync(server->client, server, &msg,
+ &args.seq_args, &res.seq_res, 0);
+ if (status)
+ return status;
+ return res.sr_offset;
+}
#endif /* CONFIG_NFS_V4_2 */

static bool nfs4_match_stateid(const nfs4_stateid *s1,
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 28e2fad..f7ced39 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -435,6 +435,10 @@ static int nfs4_stat_to_errno(int);
2 /* bytes written */ + \
1 /* committed */ + \
XDR_QUADLEN(NFS4_VERIFIER_SIZE))
+#define encode_seek_maxsz (op_encode_hdr_maxsz + \
+ XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+ 2 + 1)
+#define decode_seek_maxsz (op_decode_hdr_maxsz + 1 + 1 + 5)
#endif /* CONFIG_NFS_V4_2 */

#define NFS4_enc_compound_sz (1024) /* XXX: large enough? */
@@ -903,6 +907,12 @@ EXPORT_SYMBOL_GPL(nfs41_maxgetdevinfo_overhead);
#define NFS4_dec_write_plus_sz (compound_decode_hdr_maxsz + \
decode_putfh_maxsz + \
decode_write_plus_maxsz)
+#define NFS4_enc_seek_sz (compound_encode_hdr_maxsz + \
+ encode_putfh_maxsz + \
+ encode_seek_maxsz)
+#define NFS4_dec_seek_sz (compound_decode_hdr_maxsz + \
+ decode_putfh_maxsz + \
+ decode_seek_maxsz)
#endif /* CONFIG_NFS_V4_2 */

static const umode_t nfs_type2fmt[] = {
@@ -2104,6 +2114,16 @@ static void encode_write_plus(struct xdr_stream *xdr,
encode_uint32(xdr, 1);
encode_write_plus_hole(xdr, args);
}
+
+static void encode_seek(struct xdr_stream *xdr,
+ struct nfs42_seek_args *args,
+ struct compound_hdr *hdr)
+{
+ encode_op_hdr(xdr, OP_SEEK, decode_seek_maxsz, hdr);
+ encode_nfs4_stateid(xdr, args->sa_stateid);
+ encode_uint64(xdr, args->sa_offset);
+ encode_uint32(xdr, args->sa_what);
+}
#endif /* CONFIG_NFS_V4_2 */

/*
@@ -3071,6 +3091,25 @@ static void nfs4_xdr_enc_write_plus(struct rpc_rqst *req,
encode_nops(&hdr);
return;
}
+
+/*
+ * Encode SEEK request
+ */
+static void nfs4_xdr_enc_seek(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ struct nfs42_seek_args *args)
+{
+ struct compound_hdr hdr = {
+ .minorversion = nfs4_xdr_minorversion(&args->seq_args),
+ };
+
+ encode_compound_hdr(xdr, req, &hdr);
+ encode_sequence(xdr, &args->seq_args, &hdr);
+ encode_putfh(xdr, args->sa_fh, &hdr);
+ encode_seek(xdr, args, &hdr);
+ encode_nops(&hdr);
+ return;
+}
#endif /* CONFIG_NFS_V4_2 */

static void print_overflow_msg(const char *func, const struct xdr_stream *xdr)
@@ -6081,6 +6120,33 @@ static int decode_writeplus(struct xdr_stream *xdr, struct nfs42_write_res *res)

return decode_write_response(xdr, res);
}
+
+static int decode_seek(struct xdr_stream *xdr, struct nfs42_seek_res *res)
+{
+ int status;
+ __be32 *p;
+
+ status = decode_op_hdr(xdr, OP_SEEK);
+ if (status)
+ return status;
+
+ p = xdr_inline_decode(xdr, 28);
+ if (unlikely(!p))
+ goto out_overflow;
+
+ res->sr_eof = be32_to_cpup(p++);
+ res->sr_whence = be32_to_cpup(p++);
+ p = xdr_decode_hyper(p, &res->sr_offset);
+ p = xdr_decode_hyper(p, &res->sr_length);
+ res->sr_allocated = be32_to_cpup(p);
+ return 0;
+
+out_overflow:
+ print_overflow_msg(__func__, xdr);
+ return -EIO;
+
+ return 0;
+}
#endif /* CONFIG_NFS_V4_2 */

/*
@@ -7320,6 +7386,30 @@ static int nfs4_xdr_dec_write_plus(struct rpc_rqst *rqstp,
out:
return status;
}
+
+/*
+ * Decode SEEK request
+ */
+static int nfs4_xdr_dec_seek(struct rpc_rqst *rqstp,
+ struct xdr_stream *xdr,
+ struct nfs42_seek_res *res)
+{
+ struct compound_hdr hdr;
+ int status;
+
+ status = decode_compound_hdr(xdr, &hdr);
+ if (status)
+ goto out;
+ status = decode_sequence(xdr, &res->seq_res, rqstp);
+ if (status)
+ goto out;
+ status = decode_putfh(xdr);
+ if (status)
+ goto out;
+ status = decode_seek(xdr, res);
+out:
+ return status;
+}
#endif /* CONFIG_NFS_V4_2 */

/**
@@ -7533,6 +7623,7 @@ struct rpc_procinfo nfs4_procedures[] = {
#endif /* CONFIG_NFS_V4_1 */
#if defined(CONFIG_NFS_V4_2)
PROC(WRITE_PLUS, enc_write_plus, dec_write_plus),
+ PROC(SEEK, enc_seek, dec_seek),
#endif /* CONFIG_NFS_V4_2 */
};

diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 237016a..0d9033a 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -492,6 +492,7 @@ enum {

/* nfs42 */
NFSPROC4_CLNT_WRITE_PLUS,
+ NFSPROC4_CLNT_SEEK,
};

/* nfs41 types */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index d8cbe5a..a46f803 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1236,6 +1236,26 @@ struct nfs42_write_plus_args {
u32 wp_allocated;
};

+struct nfs42_seek_args {
+ struct nfs4_sequence_args seq_args;
+
+ struct nfs_fh *sa_fh;
+ nfs4_stateid *sa_stateid;
+ u64 sa_offset;
+ u32 sa_what;
+};
+
+struct nfs42_seek_res {
+ struct nfs4_sequence_res seq_res;
+ unsigned int status;
+
+ u32 sr_eof;
+ u32 sr_whence;
+ u64 sr_offset;
+ u64 sr_length;
+ u32 sr_allocated;
+};
+
struct nfs42_write_res
{
struct nfs4_sequence_res seq_res;
--
1.8.4.1


2013-10-29 13:05:45

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 2/3] NFS: Allow for asynchronous WRITE_PLUS calls


On Oct 29, 2013, at 8:48 AM, Hellwig Christoph <[email protected]> wrote:

> On Tue, Oct 29, 2013 at 12:44:50PM +0000, Myklebust, Trond wrote:
>> It exists because the server vendors were worried that operations such as preallocation and/or hole punching can take a more or less unbounded amount of time due to the 64-bit size. By using an (optional) callback method, the server can free up the RPC slot that would otherwise be kept waiting in the synchronous case.
>
> Must be some horried filesystems on those vendors servers. Both
> operations should be O(nr_extents), where nr_extents should be extremely
> low for the preallocation and still reasonably low for hole punches,
> although users control the allocastion pattern.
>
> There's a reason why these aren't aio operations locally either.
>

Imagine someone wanting to punch a 50TB hole in NTFS, for instance. It doesn't have real holes, so you'd end up needing to zero out the existing extents in that region of the file. That will take time, even with an O(nr_extents) algorithm.



2013-10-29 13:24:21

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 2/3] NFS: Allow for asynchronous WRITE_PLUS calls

On Tue, Oct 29, 2013 at 01:21:26PM +0000, Myklebust, Trond wrote:
> The current draft spec even allows the client to specify a "pattern" to be written to the "hole".

That's indeed the WRITE SAME lookalike then. At least the SCSI people
were smart enough to define an unmap bit for "hole punching" and allow
the target to reject all other versions if they don't want to support
it.

Given that NFS v4.2 isn't finalized I'd very much recommend to

a) properly separate those case s
b) do not make the async version mandatory

2013-10-29 07:39:43

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 2/3] NFS: Allow for asynchronous WRITE_PLUS calls

On Mon, Oct 28, 2013 at 11:00:17AM -0400, Anna Schumaker wrote:
> Clients are required to support CB_OFFLOAD for the NFS4_CONTENT_HOLE arm
> of the WRITE_PLUS union.

This sounds pretty stupid. Just curious, who got this braindamage into
the standard?


2013-10-28 15:00:45

by Anna Schumaker

[permalink] [raw]
Subject: [PATCH 2/3] NFS: Allow for asynchronous WRITE_PLUS calls

Clients are required to support CB_OFFLOAD for the NFS4_CONTENT_HOLE arm
of the WRITE_PLUS union.

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfs/callback.h | 13 ++++++++++++
fs/nfs/callback_proc.c | 8 ++++++++
fs/nfs/callback_xdr.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++---
fs/nfs/nfs4_fs.h | 3 +++
fs/nfs/nfs4file.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++-
fs/nfs/nfs4xdr.c | 2 +-
6 files changed, 129 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index 84326e9..1653958 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -187,6 +187,19 @@ extern __be32 nfs4_callback_devicenotify(
void *dummy, struct cb_process_state *cps);

#endif /* CONFIG_NFS_V4_1 */
+
+#ifdef CONFIG_NFS_V4_2
+struct cb_offloadargs {
+ struct nfs_fh dst_fh;
+ nfs4_stateid stateid;
+ struct nfs42_write_res write_res;
+};
+
+extern __be32 nfs4_callback_offload(struct cb_offloadargs *, void *,
+ struct cb_process_state *);
+int nfs_wake_offload(struct cb_offloadargs *);
+#endif /* CONFIG_NFS_V4_2 */
+
extern int check_gss_callback_principal(struct nfs_client *, struct svc_rqst *);
extern __be32 nfs4_callback_getattr(struct cb_getattrargs *args,
struct cb_getattrres *res,
diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index ae2e87b..8ff6400 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -536,3 +536,11 @@ out:
return status;
}
#endif /* CONFIG_NFS_V4_1 */
+
+#ifdef CONFIG_NFS_V4_2
+__be32 nfs4_callback_offload(struct cb_offloadargs *args, void *dummy,
+ struct cb_process_state *cps)
+{
+ return htonl(nfs_wake_offload(args));
+}
+#endif /* CONFIG_NFS_V4_2 */
diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index f4ccfe6..843e074 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -35,6 +35,14 @@
#define CB_OP_RECALLSLOT_RES_MAXSZ (CB_OP_HDR_RES_MAXSZ)
#endif /* CONFIG_NFS_V4_1 */

+#if defined(CONFIG_NFS_V4_2)
+#define CB_OP_OFFLOAD_RES_MAXSZ (CB_OP_HDR_RES_MAXSZ + \
+ 1 + XDR_QUADLEN(NFS4_FHSIZE) + \
+ XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+ 1 + XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+ 2 + 1 + XDR_QUADLEN(NFS4_VERIFIER_SIZE))
+#endif /* CONFIG_NFS_V4_2 */
+
#define NFSDBG_FACILITY NFSDBG_CALLBACK

/* Internal error code */
@@ -527,6 +535,37 @@ static __be32 decode_recallslot_args(struct svc_rqst *rqstp,

#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+static inline __be32 decode_write_res(struct xdr_stream *xdr,
+ struct nfs42_write_res *write_res)
+{
+ __be32 status = decode_write_response(xdr, write_res);
+ if (status == -EIO)
+ return htonl(NFS4ERR_RESOURCE);
+ return htonl(status);
+}
+
+static __be32 decode_offload_args(struct svc_rqst *rqstp,
+ struct xdr_stream *xdr,
+ struct cb_offloadargs *args)
+{
+ __be32 status;
+
+ status = decode_fh(xdr, &args->dst_fh);
+ if (unlikely(status != 0))
+ goto out;
+
+ status = decode_stateid(xdr, &args->stateid);
+ if (unlikely(status != 0))
+ goto out;
+
+ status = decode_write_res(xdr, &args->write_res);
+out:
+ dprintk("%s: exit with status = %d\n", __func__, ntohl(status));
+ return status;
+}
+#endif /* CONFIG_NFS_V4_2 */
+
static __be32 encode_string(struct xdr_stream *xdr, unsigned int len, const char *str)
{
__be32 *p;
@@ -794,9 +833,11 @@ preprocess_nfs42_op(int nop, unsigned int op_nr, struct callback_op **op)
if (status != htonl(NFS4ERR_OP_ILLEGAL))
return status;

- if (op_nr == OP_CB_OFFLOAD)
- return htonl(NFS4ERR_NOTSUPP);
- return htonl(NFS4ERR_OP_ILLEGAL);
+ if (op_nr != OP_CB_OFFLOAD)
+ return htonl(NFS4ERR_OP_ILLEGAL);
+
+ *op = &callback_ops[op_nr];
+ return htonl(NFS4_OK);
}
#else /* CONFIG_NFS_V4_2 */
static __be32
@@ -991,6 +1032,13 @@ static struct callback_op callback_ops[] = {
.res_maxsize = CB_OP_RECALLSLOT_RES_MAXSZ,
},
#endif /* CONFIG_NFS_V4_1 */
+#if defined(CONFIG_NFS_V4_2)
+ [OP_CB_OFFLOAD] = {
+ .process_op = (callback_process_op_t)nfs4_callback_offload,
+ .decode_args = (callback_decode_arg_t)decode_offload_args,
+ .res_maxsize = CB_OP_OFFLOAD_RES_MAXSZ,
+ },
+#endif
};

/*
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 1557f15..f52fc5f 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -464,6 +464,9 @@ static inline void nfs4_unregister_sysctl(void)

/* nfs4xdr.c */
extern struct rpc_procinfo nfs4_procedures[];
+#if defined(CONFIG_NFS_V4_2)
+int decode_write_response(struct xdr_stream *, struct nfs42_write_res *);
+#endif /* CONFIG_NFS_V4_2 */

struct nfs4_mount_data;

diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index ab2fbe0..caf0658 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -4,6 +4,7 @@
* Copyright (C) 1992 Rick Sladkey
*/
#include <linux/nfs_fs.h>
+#include "callback.h"
#include "internal.h"
#include "fscache.h"
#include "pnfs.h"
@@ -119,6 +120,50 @@ nfs4_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
}

#ifdef CONFIG_NFS_V4_2
+static LIST_HEAD(nfs_offload_waitlist);
+static DEFINE_SPINLOCK(nfs_offload_lock);
+
+static unsigned int nfs_wait_for_offload(struct nfs42_write_res *res)
+{
+ spin_lock(&nfs_offload_lock);
+ list_add(&res->wait_list, &nfs_offload_waitlist);
+ spin_unlock(&nfs_offload_lock);
+
+ wait_for_completion(&res->completion);
+
+ spin_lock(&nfs_offload_lock);
+ list_del(&res->wait_list);
+ spin_unlock(&nfs_offload_lock);
+
+ return res->wr_status;
+}
+
+static struct nfs42_write_res *nfs_find_waiting_offload(nfs4_stateid *stateid)
+{
+ struct nfs42_write_res *cur;
+
+ list_for_each_entry(cur, &nfs_offload_waitlist, wait_list) {
+ if (memcmp(stateid, &cur->wr_stateid, sizeof(nfs4_stateid)) == 0)
+ return cur;
+ }
+ return NULL;
+}
+
+int nfs_wake_offload(struct cb_offloadargs *offload)
+{
+ struct nfs42_write_res *write;
+
+ spin_lock(&nfs_offload_lock);
+ write = nfs_find_waiting_offload(&offload->stateid);
+ spin_unlock(&nfs_offload_lock);
+ if (write == NULL)
+ return NFS4ERR_BAD_STATEID;
+
+ write->wr_bytes_copied = offload->write_res.wr_bytes_copied;
+ complete(&write->completion);
+ return NFS4_OK;
+}
+
static int nfs42_select_stateid(struct file *file, nfs4_stateid *stateid,
fmode_t mode, struct nfs_open_context **ctx)
{
@@ -164,7 +209,14 @@ static long nfs42_fallocate(struct file *file, int mode, loff_t offset, loff_t l
if (err < 0)
return err;

- return nfs42_proc_fallocate(server, &args, &res, ctx->cred);
+ init_completion(&res.completion);
+ err = nfs42_proc_fallocate(server, &args, &res, ctx->cred);
+ if (err)
+ return err;
+
+ if (res.wr_async == true)
+ err = nfs_wait_for_offload(&res);
+ return err;
}
#endif /* CONFIG_NFS_V4_2 */

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 4ffecbe..28e2fad 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -6041,7 +6041,7 @@ static int decode_free_stateid(struct xdr_stream *xdr,
#endif /* CONFIG_NFS_V4_1 */

#ifdef CONFIG_NFS_V4_2
-static int decode_write_response(struct xdr_stream *xdr,
+int decode_write_response(struct xdr_stream *xdr,
struct nfs42_write_res *write_res)
{
__be32 *p;
--
1.8.4.1


2013-10-29 13:44:10

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/3] NFS: Allow for asynchronous WRITE_PLUS calls

On Tue, Oct 29, 2013 at 06:24:21AM -0700, Hellwig Christoph wrote:
> On Tue, Oct 29, 2013 at 01:21:26PM +0000, Myklebust, Trond wrote:
> > The current draft spec even allows the client to specify a "pattern" to be written to the "hole".
>
> That's indeed the WRITE SAME lookalike then. At least the SCSI people
> were smart enough to define an unmap bit for "hole punching" and allow
> the target to reject all other versions if they don't want to support
> it.
>
> Given that NFS v4.2 isn't finalized I'd very much recommend to
>
> a) properly separate those case s
> b) do not make the async version mandatory

That sounds reasonable to me.

By the way, source is available from https://github.com/loghyr/NFSv4.2,
and Tom takes patches.....

(Or maybe Anna can be talked into that.)

--b.

2013-10-29 13:21:28

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 2/3] NFS: Allow for asynchronous WRITE_PLUS calls


On Oct 29, 2013, at 9:08 AM, Hellwig Christoph <[email protected]> wrote:

> On Tue, Oct 29, 2013 at 01:05:44PM +0000, Myklebust, Trond wrote:
>> Imagine someone wanting to punch a 50TB hole in NTFS, for instance. It doesn't have real holes, so you'd end up needing to zero out the existing extents in that region of the file. That will take time, even with an O(nr_extents) algorithm.
>
> That behaviour is not a hole punch, and should not be multiplexed onto
> a whole punch on the wire command! If such a use case is important
> enough there should be an equivanet of the SCSI WRITE SAME command which
> might make some sense to be implemented async.
>
> We'd surely not support it in the Linux nfsd, though.

The current draft spec even allows the client to specify a "pattern" to be written to the "hole". It is mainly there for applications like Oracle, that initialise empty blocks with a non-trivial pattern so that they can do the on-disk equivalent of memory poisoning. By doing it in this way, it allows the more powerful storage arrays to just implement that patterned region as a set of deduplicated blocks instead of actually allocating all the blocks.

You can indeed argue that filesystems that don't have holes and/or deduplication should not implement this operation at all, however the functionality is there in case they do...


2013-10-28 15:00:44

by Anna Schumaker

[permalink] [raw]
Subject: [PATCH 1/3] NFS: Use WRITE_PLUS for hole punches

This patch implements a version of fallocate for NFS v4. In the v4.2
case, we use WRITE_PLUS with DATA_CONTENT_HOLE set to punch a hole in a
file. In NFS < v4.2, we fall back to the generic VFS fallocate
implementation.

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfs/inode.c | 2 +
fs/nfs/nfs4_fs.h | 5 ++
fs/nfs/nfs4file.c | 53 ++++++++++++++++++
fs/nfs/nfs4proc.c | 39 +++++++++++++
fs/nfs/nfs4xdr.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/nfs4.h | 3 +
include/linux/nfs_xdr.h | 29 ++++++++++
7 files changed, 273 insertions(+)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index eda8879..de0efbd 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -680,6 +680,7 @@ struct nfs_lock_context *nfs_get_lock_context(struct nfs_open_context *ctx)
kfree(new);
return res;
}
+EXPORT_SYMBOL_GPL(nfs_get_lock_context);

void nfs_put_lock_context(struct nfs_lock_context *l_ctx)
{
@@ -692,6 +693,7 @@ void nfs_put_lock_context(struct nfs_lock_context *l_ctx)
spin_unlock(&inode->i_lock);
kfree(l_ctx);
}
+EXPORT_SYMBOL_GPL(nfs_put_lock_context);

/**
* nfs_close_context - Common close_context() routine NFSv2/v3
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 28842ab..1557f15 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -352,6 +352,11 @@ nfs4_state_protect_write(struct nfs_client *clp, struct rpc_clnt **clntp,
}
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+int nfs42_proc_fallocate(struct nfs_server *, struct nfs42_write_plus_args *,
+ struct nfs42_write_res *, struct rpc_cred *);
+#endif /* CONFIG_NFS_V4_2 */
+
extern const struct nfs4_minor_version_ops *nfs_v4_minor_ops[];

extern const u32 nfs4_fattr_bitmap[3];
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 77efaf1..ab2fbe0 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -118,6 +118,56 @@ nfs4_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
return ret;
}

+#ifdef CONFIG_NFS_V4_2
+static int nfs42_select_stateid(struct file *file, nfs4_stateid *stateid,
+ fmode_t mode, struct nfs_open_context **ctx)
+{
+ struct nfs_lock_context *lock;
+ int ret;
+
+ *ctx = nfs_file_open_context(file);
+ if (!*ctx)
+ return -EBADF;
+
+ lock = nfs_get_lock_context(*ctx);
+ if (IS_ERR(lock))
+ return PTR_ERR(lock);
+
+ ret = nfs4_set_rw_stateid(stateid, *ctx, lock, mode);
+
+ if (lock)
+ nfs_put_lock_context(lock);
+ return ret;
+}
+
+static long nfs42_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
+{
+ int err;
+ struct inode *inode = file_inode(file);
+ struct nfs42_write_plus_args args = {
+ .wp_fh = NFS_FH(inode),
+ .wp_stable = NFS_FILE_SYNC,
+ .wp_content = NFS4_CONTENT_HOLE,
+ .wp_offset = offset,
+ .wp_length = len,
+ .wp_allocated = (mode == 0),
+ };
+
+ struct nfs_open_context *ctx;
+ struct nfs42_write_res res;
+ struct nfs_server *server = NFS_SERVER(inode);
+
+ if (server->nfs_client->cl_minorversion < 2)
+ return -EOPNOTSUPP;
+
+ err = nfs42_select_stateid(file, &args.wp_stateid, FMODE_WRITE, &ctx);
+ if (err < 0)
+ return err;
+
+ return nfs42_proc_fallocate(server, &args, &res, ctx->cred);
+}
+#endif /* CONFIG_NFS_V4_2 */
+
const struct file_operations nfs4_file_operations = {
.llseek = nfs_file_llseek,
.read = do_sync_read,
@@ -133,6 +183,9 @@ const struct file_operations nfs4_file_operations = {
.flock = nfs_flock,
.splice_read = nfs_file_splice_read,
.splice_write = nfs_file_splice_write,
+#ifdef CONFIG_NFS_V4_2
+ .fallocate = nfs42_fallocate,
+#endif /* CONFIG_NFS_V4_2 */
.check_flags = nfs_check_flags,
.setlease = nfs_setlease,
};
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index d2b4845..003bacb 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7828,6 +7828,45 @@ static bool nfs41_match_stateid(const nfs4_stateid *s1,

#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+int _nfs42_proc_fallocate(struct nfs_server *server,
+ struct nfs42_write_plus_args *args,
+ struct nfs42_write_res *res,
+ struct rpc_cred *cred)
+{
+ struct rpc_message msg = {
+ .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_WRITE_PLUS],
+ .rpc_argp = args,
+ .rpc_resp = res,
+ .rpc_cred = cred,
+ };
+ int status;
+
+ status = nfs4_call_sync(server->client, server, &msg,
+ &args->seq_args, &res->seq_res, 0);
+ if (status == -NFS4ERR_NOTSUPP)
+ return -EOPNOTSUPP;
+ return status;
+}
+
+int nfs42_proc_fallocate(struct nfs_server *server,
+ struct nfs42_write_plus_args *args,
+ struct nfs42_write_res *res,
+ struct rpc_cred *cred)
+{
+ struct nfs4_exception exception = { };
+ int err;
+
+ do {
+ err = nfs4_handle_exception(server,
+ _nfs42_proc_fallocate(server, args, res, cred),
+ &exception);
+ } while (exception.retry);
+
+ return err;
+}
+#endif /* CONFIG_NFS_V4_2 */
+
static bool nfs4_match_stateid(const nfs4_stateid *s1,
const nfs4_stateid *s2)
{
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 79210d2..4ffecbe 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -420,6 +420,23 @@ static int nfs4_stat_to_errno(int);
#define decode_sequence_maxsz 0
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+#define encode_write_plus_maxsz (op_encode_hdr_maxsz + \
+ XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+ 1 /* stable */ + \
+ 1 /* number of writes */ + \
+ 1 /* data_content4 */ + \
+ 2 /* offset */ + \
+ 2 /* length */ + \
+ 1 /* allocated */)
+#define decode_write_plus_maxsz (op_decode_hdr_maxsz + \
+ 1 /* number of stateids */ +\
+ XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+ 2 /* bytes written */ + \
+ 1 /* committed */ + \
+ XDR_QUADLEN(NFS4_VERIFIER_SIZE))
+#endif /* CONFIG_NFS_V4_2 */
+
#define NFS4_enc_compound_sz (1024) /* XXX: large enough? */
#define NFS4_dec_compound_sz (1024) /* XXX: large enough? */
#define NFS4_enc_read_sz (compound_encode_hdr_maxsz + \
@@ -879,6 +896,15 @@ const u32 nfs41_maxgetdevinfo_overhead = ((RPC_MAX_REPHEADER_WITH_AUTH +
EXPORT_SYMBOL_GPL(nfs41_maxgetdevinfo_overhead);
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+#define NFS4_enc_write_plus_sz (compound_encode_hdr_maxsz + \
+ encode_putfh_maxsz + \
+ encode_write_plus_maxsz)
+#define NFS4_dec_write_plus_sz (compound_decode_hdr_maxsz + \
+ decode_putfh_maxsz + \
+ decode_write_plus_maxsz)
+#endif /* CONFIG_NFS_V4_2 */
+
static const umode_t nfs_type2fmt[] = {
[NF4BAD] = 0,
[NF4REG] = S_IFREG,
@@ -2058,6 +2084,28 @@ static void encode_free_stateid(struct xdr_stream *xdr,
}
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+static void encode_write_plus_hole(struct xdr_stream *xdr,
+ struct nfs42_write_plus_args *args)
+{
+ encode_uint32(xdr, NFS4_CONTENT_HOLE);
+ encode_uint64(xdr, args->wp_offset);
+ encode_uint64(xdr, args->wp_length);
+ encode_uint32(xdr, args->wp_allocated);
+}
+
+static void encode_write_plus(struct xdr_stream *xdr,
+ struct nfs42_write_plus_args *args,
+ struct compound_hdr *hdr)
+{
+ encode_op_hdr(xdr, OP_WRITE_PLUS, decode_write_plus_maxsz, hdr);
+ encode_nfs4_stateid(xdr, &args->wp_stateid);
+ encode_uint32(xdr, args->wp_stable);
+ encode_uint32(xdr, 1);
+ encode_write_plus_hole(xdr, args);
+}
+#endif /* CONFIG_NFS_V4_2 */
+
/*
* END OF "GENERIC" ENCODE ROUTINES.
*/
@@ -3004,6 +3052,27 @@ static void nfs4_xdr_enc_free_stateid(struct rpc_rqst *req,
}
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+/*
+ * Encode WRITE_PLUS request
+ */
+static void nfs4_xdr_enc_write_plus(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ struct nfs42_write_plus_args *args)
+{
+ struct compound_hdr hdr = {
+ .minorversion = nfs4_xdr_minorversion(&args->seq_args),
+ };
+
+ encode_compound_hdr(xdr, req, &hdr);
+ encode_sequence(xdr, &args->seq_args, &hdr);
+ encode_putfh(xdr, args->wp_fh, &hdr);
+ encode_write_plus(xdr, args, &hdr);
+ encode_nops(&hdr);
+ return;
+}
+#endif /* CONFIG_NFS_V4_2 */
+
static void print_overflow_msg(const char *func, const struct xdr_stream *xdr)
{
dprintk("nfs: %s: prematurely hit end of receive buffer. "
@@ -5971,6 +6040,49 @@ static int decode_free_stateid(struct xdr_stream *xdr,
}
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+static int decode_write_response(struct xdr_stream *xdr,
+ struct nfs42_write_res *write_res)
+{
+ __be32 *p;
+ int num_ids;
+
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ goto out_overflow;
+ num_ids = be32_to_cpup(p);
+
+ if (num_ids == 0)
+ write_res->wr_async = false;
+ else {
+ if (decode_stateid(xdr, &write_res->wr_stateid) != 0)
+ goto out_overflow;
+ write_res->wr_async = true;
+ }
+
+ p = xdr_inline_decode(xdr, 12);
+ if (unlikely(!p))
+ goto out_overflow;
+ p = xdr_decode_hyper(p, &write_res->wr_bytes_copied);
+ write_res->wr_committed = be32_to_cpup(p);
+
+ return decode_write_verifier(xdr, &write_res->wr_verf);
+
+out_overflow:
+ print_overflow_msg(__func__, xdr);
+ return -EIO;
+}
+
+static int decode_writeplus(struct xdr_stream *xdr, struct nfs42_write_res *res)
+{
+ res->wr_status = decode_op_hdr(xdr, OP_WRITE_PLUS);
+ if (res->wr_status)
+ return res->wr_status;
+
+ return decode_write_response(xdr, res);
+}
+#endif /* CONFIG_NFS_V4_2 */
+
/*
* END OF "GENERIC" DECODE ROUTINES.
*/
@@ -7183,6 +7295,33 @@ out:
}
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+/*
+ * Decode WRITE_PLUS response
+ */
+static int nfs4_xdr_dec_write_plus(struct rpc_rqst *rqstp,
+ struct xdr_stream *xdr,
+ struct nfs42_write_res *res)
+{
+ struct compound_hdr hdr;
+ int status;
+
+ status = decode_compound_hdr(xdr, &hdr);
+ if (status)
+ goto out;
+ status = decode_sequence(xdr, &res->seq_res, rqstp);
+ if (status)
+ goto out;
+ status = decode_putfh(xdr);
+ if (status)
+ goto out;
+ status = decode_writeplus(xdr, res);
+
+out:
+ return status;
+}
+#endif /* CONFIG_NFS_V4_2 */
+
/**
* nfs4_decode_dirent - Decode a single NFSv4 directory entry stored in
* the local page cache.
@@ -7392,6 +7531,9 @@ struct rpc_procinfo nfs4_procedures[] = {
enc_bind_conn_to_session, dec_bind_conn_to_session),
PROC(DESTROY_CLIENTID, enc_destroy_clientid, dec_destroy_clientid),
#endif /* CONFIG_NFS_V4_1 */
+#if defined(CONFIG_NFS_V4_2)
+ PROC(WRITE_PLUS, enc_write_plus, dec_write_plus),
+#endif /* CONFIG_NFS_V4_2 */
};

const struct rpc_version nfs_version4 = {
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 81d6b09..237016a 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -489,6 +489,9 @@ enum {
NFSPROC4_CLNT_GETDEVICELIST,
NFSPROC4_CLNT_BIND_CONN_TO_SESSION,
NFSPROC4_CLNT_DESTROY_CLIENTID,
+
+ /* nfs42 */
+ NFSPROC4_CLNT_WRITE_PLUS,
};

/* nfs41 types */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 49f52c8..d8cbe5a 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1223,6 +1223,35 @@ struct pnfs_ds_commit_info {

#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+struct nfs42_write_plus_args {
+ struct nfs4_sequence_args seq_args;
+
+ struct nfs_fh *wp_fh;
+ nfs4_stateid wp_stateid;
+ enum nfs3_stable_how wp_stable;
+ enum data_content4 wp_content;
+ u64 wp_offset;
+ u64 wp_length;
+ u32 wp_allocated;
+};
+
+struct nfs42_write_res
+{
+ struct nfs4_sequence_res seq_res;
+ unsigned int wr_status;
+
+ bool wr_async;
+ nfs4_stateid wr_stateid;
+ u64 wr_bytes_copied;
+ int wr_committed;
+ struct nfs_write_verifier wr_verf;
+
+ struct list_head wait_list;
+ struct completion completion;
+};
+#endif
+
struct nfs_page;

#define NFS_PAGEVEC_SIZE (8U)
--
1.8.4.1


2013-11-05 09:37:04

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/3] NFS: Use WRITE_PLUS for hole punches

On Mon, Oct 28, 2013 at 11:00:16AM -0400, Anna Schumaker wrote:
> + struct nfs42_write_plus_args args = {
> + .wp_fh = NFS_FH(inode),
> + .wp_stable = NFS_FILE_SYNC,
> + .wp_content = NFS4_CONTENT_HOLE,
> + .wp_offset = offset,
> + .wp_length = len,
> + .wp_allocated = (mode == 0),
> + };

After spending some time with draft 21 of the NFSv4.2 spec I don't think
we can use WRITE_PLUS for the prealloc mode of fallocate. The problem
with the NFS4_CONTENT_HOLE arm of WRITE_PLUS is that it is defined to
zero the whole range, while fallocate is defined as being a no-op for
parts of the range that already contain data.

In addition we'll also need more sanity checks on the flags argument,
there already is a FALLOC_FL_KEEP_SIZE not supportable by the NFS
semantics (not that it nessecarily should), and more may be added
in the future. Take a a look at the other fallocate implementations tha
t have a quick check on the top for flags they don't support:

if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
return -EOPNOTSUPP;