2013-10-28 14:57:32

by Anna Schumaker

[permalink] [raw]
Subject: [PATCH 0/4] NFSD: Add support for WRITE_PLUS and SEEK

This patch series adds server support for the NFS v4.2 operations WRITE_PLUS
and SEEK. The first few patches fix up error codes to match the current draft
of the spec and then add in the NFS v4.2 decode operations array so later
patches can focus on only adding in the new operations.

Patch 3 (Add WRITE_PLUS support for hole punches) exports the do_fallocate()
function from fs/open.c. Should this be its own patch submitted somewhere
else? Is there a better way to get into the VFS fallocate routines?

The final patch is not meant to be applied, and is only included as an example
of what I did to add offload handling to the client.

Questions, comments and thoughts are appreciated!

Anna

Anna Schumaker (4):
NFSD: Update error codes
NFSD: Create nfs v4.2 decode ops
NFSD: Add WRITE_PLUS support for hole punches
NFSD: Implement SEEK

fs/nfsd/nfs4proc.c | 89 +++++++++++++++++++++++++
fs/nfsd/nfs4xdr.c | 179 ++++++++++++++++++++++++++++++++++++++++++++++++++-
fs/nfsd/nfsd.h | 2 +-
fs/nfsd/vfs.c | 14 ++++
fs/nfsd/vfs.h | 1 +
fs/nfsd/xdr4.h | 39 +++++++++++
fs/open.c | 1 +
include/linux/nfs4.h | 21 +++++-
8 files changed, 342 insertions(+), 4 deletions(-)

--
1.8.4.1



2013-10-29 13:23:20

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches


On Oct 29, 2013, at 9:06 AM, J. Bruce Fields <[email protected]> wrote:

> On Tue, Oct 29, 2013 at 08:50:56AM -0400, Anna Schumaker wrote:
>> On Mon 28 Oct 2013 05:40:30 PM EDT, J. Bruce Fields wrote:
>>> On Mon, Oct 28, 2013 at 10:57:25AM -0400, Anna Schumaker wrote:
>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>> index 419572f..3210c6f 100644
>>>> --- a/fs/nfsd/nfs4proc.c
>>>> +++ b/fs/nfsd/nfs4proc.c
>>>> @@ -1028,6 +1028,42 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>> return status;
>>>> }
>>>>
>>>> +static __be32
>>>> +nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
>>>> + struct net *net)
>>>> +{
>>>> + __be32 status;
>>>> +
>>>> + status = nfsd4_vfs_fallocate(file, writeplus->wp_allocated,
>>>> + writeplus->wp_offset, writeplus->wp_length);
>>>> + if (status == nfs_ok) {
>>>> + writeplus->wp_res.wr_stid = NULL;
>>>> + writeplus->wp_res.wr_bytes_written = writeplus->wp_length;
>>>> + writeplus->wp_res.wr_stable_how = NFS_FILE_SYNC;
>>>
>>> Do we need to sync?
>>
>> I did the sync in nfsd4_vfs_fallocate (below), but I can move it if
>> that would make more sense.
>
> What I meant was--why are we doing a sync at all, instead of returning
> NFS_UNSTABLE and making the client commit?

What if the client specifies FILE_SYNC? The spec doesn't allow the server to return UNSTABLE in that situation.

Trond

2013-10-29 13:06:50

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Tue, Oct 29, 2013 at 08:50:56AM -0400, Anna Schumaker wrote:
> On Mon 28 Oct 2013 05:40:30 PM EDT, J. Bruce Fields wrote:
> > On Mon, Oct 28, 2013 at 10:57:25AM -0400, Anna Schumaker wrote:
> >> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >> index 419572f..3210c6f 100644
> >> --- a/fs/nfsd/nfs4proc.c
> >> +++ b/fs/nfsd/nfs4proc.c
> >> @@ -1028,6 +1028,42 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >> return status;
> >> }
> >>
> >> +static __be32
> >> +nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
> >> + struct net *net)
> >> +{
> >> + __be32 status;
> >> +
> >> + status = nfsd4_vfs_fallocate(file, writeplus->wp_allocated,
> >> + writeplus->wp_offset, writeplus->wp_length);
> >> + if (status == nfs_ok) {
> >> + writeplus->wp_res.wr_stid = NULL;
> >> + writeplus->wp_res.wr_bytes_written = writeplus->wp_length;
> >> + writeplus->wp_res.wr_stable_how = NFS_FILE_SYNC;
> >
> > Do we need to sync?
>
> I did the sync in nfsd4_vfs_fallocate (below), but I can move it if
> that would make more sense.

What I meant was--why are we doing a sync at all, instead of returning
NFS_UNSTABLE and making the client commit?

Honest question, I haven't thought about which is best.

--b.

2013-10-28 14:57:34

by Anna Schumaker

[permalink] [raw]
Subject: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

This patch only implements DATA_CONTENT_HOLE support, but I tried to
write it so other arms of the discriminated union can be added easily
later.

This patch is missing support for decoding multiple requests on the same
file. The way it's written now only the last range provided by the
client will be written to.

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfsd/nfs4proc.c | 45 ++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfs4xdr.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++--
fs/nfsd/vfs.c | 14 ++++++++++++
fs/nfsd/vfs.h | 1 +
fs/nfsd/xdr4.h | 24 ++++++++++++++++++++
fs/open.c | 1 +
include/linux/nfs4.h | 8 ++++++-
7 files changed, 152 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 419572f..3210c6f 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1028,6 +1028,42 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return status;
}

+static __be32
+nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
+ struct net *net)
+{
+ __be32 status;
+
+ status = nfsd4_vfs_fallocate(file, writeplus->wp_allocated,
+ writeplus->wp_offset, writeplus->wp_length);
+ if (status == nfs_ok) {
+ writeplus->wp_res.wr_stid = NULL;
+ writeplus->wp_res.wr_bytes_written = writeplus->wp_length;
+ writeplus->wp_res.wr_stable_how = NFS_FILE_SYNC;
+ gen_boot_verifier(&writeplus->wp_res.wr_verifier, net);
+ }
+
+ return status;
+}
+
+static __be32
+nfsd4_write_plus(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ struct nfsd4_write_plus *writeplus)
+{
+ struct file *file;
+ __be32 status;
+ struct net *net = SVC_NET(rqstp);
+
+ status = nfs4_preprocess_stateid_op(net, cstate, &writeplus->wp_stateid,
+ WR_STATE, &file);
+ if (status)
+ return status;
+
+ if (writeplus->wp_data_content == NFS4_CONTENT_HOLE)
+ return nfsd4_write_plus_hole(file, writeplus, net);
+ return nfserr_union_notsupp;
+}
+
/* This routine never returns NFS_OK! If there are no other errors, it
* will return NFSERR_SAME or NFSERR_NOT_SAME depending on whether the
* attributes matched. VERIFY is implemented by mapping NFSERR_SAME
@@ -1840,6 +1876,15 @@ static struct nfsd4_operation nfsd4_ops[] = {
.op_get_currentstateid = (stateid_getter)nfsd4_get_freestateid,
.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
},
+
+ /* NFSv4.2 operations */
+ [OP_WRITE_PLUS] = {
+ .op_func = (nfsd4op_func)nfsd4_write_plus,
+ .op_flags = OP_MODIFIES_SOMETHING | OP_CACHEME,
+ .op_name = "OP_WRITE_PLUS",
+ .op_rsize_bop = (nfsd4op_rsize)nfsd4_write_rsize,
+ .op_get_currentstateid = (stateid_getter)nfsd4_get_writestateid,
+ },
};

#ifdef NFSD_DEBUG
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 60f5a1f..1e4e9af 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1485,6 +1485,32 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
}

static __be32
+nfsd4_decode_write_plus(struct nfsd4_compoundargs *argp,
+ struct nfsd4_write_plus *writeplus)
+{
+ DECODE_HEAD;
+ unsigned int i;
+
+ status = nfsd4_decode_stateid(argp, &writeplus->wp_stateid);
+ if (status)
+ return status;
+
+ READ_BUF(8);
+ READ32(writeplus->wp_stable_how);
+ READ32(writeplus->wp_data_length);
+
+ for (i = 0; i < writeplus->wp_data_length; i++) {
+ READ_BUF(24);
+ READ32(writeplus->wp_data_content);
+ READ64(writeplus->wp_offset);
+ READ64(writeplus->wp_length);
+ READ32(writeplus->wp_allocated);
+ }
+
+ DECODE_TAIL;
+}
+
+static __be32
nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p)
{
return nfs_ok;
@@ -1665,7 +1691,7 @@ static nfsd4_dec nfsd42_dec_ops[] = {
[OP_COPY_NOTIFY] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_OFFLOAD_REVOKE] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_notsupp,
- [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_write_plus,
[OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_SEEK] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp,
@@ -3591,6 +3617,38 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
return nfserr;
}

+static void
+nfsd42_encode_write_res(struct nfsd4_compoundres *resp, struct nfsd42_write_res *write)
+{
+ __be32 *p;
+
+ RESERVE_SPACE(4);
+
+ if (write->wr_stid == NULL) {
+ WRITE32(0);
+ ADJUST_ARGS();
+ } else {
+ WRITE32(1);
+ ADJUST_ARGS();
+ nfsd4_encode_stateid(resp, &write->wr_stid->sc_stateid);
+ }
+
+ RESERVE_SPACE(12 + NFS4_VERIFIER_SIZE);
+ WRITE64(write->wr_bytes_written);
+ WRITE32(write->wr_stable_how);
+ WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
+ ADJUST_ARGS();
+}
+
+static __be32
+nfsd4_encode_write_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_write_plus *writeplus)
+{
+ if (!nfserr)
+ nfsd42_encode_write_res(resp, &writeplus->wp_res);
+ return nfserr;
+}
+
static __be32
nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p)
{
@@ -3670,7 +3728,7 @@ static nfsd4_enc nfsd4_enc_ops[] = {
[OP_COPY_NOTIFY] = (nfsd4_enc)nfsd4_encode_noop,
[OP_OFFLOAD_REVOKE] = (nfsd4_enc)nfsd4_encode_noop,
[OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_noop,
- [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_write_plus,
[OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
[OP_SEEK] = (nfsd4_enc)nfsd4_encode_noop,
[OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c827acb..7fec087 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -16,6 +16,7 @@
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/splice.h>
+#include <linux/falloc.h>
#include <linux/fcntl.h>
#include <linux/namei.h>
#include <linux/delay.h>
@@ -649,6 +650,19 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
}
#endif

+__be32 nfsd4_vfs_fallocate(struct file *file, bool allocated, loff_t offset, loff_t len)
+{
+ int error, mode = 0;
+
+ if (allocated == false)
+ mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
+
+ error = do_fallocate(file, mode, offset, len);
+ if (error == 0)
+ error = vfs_fsync_range(file, offset, offset + len, 0);
+ return nfserrno(error);
+}
+
#endif /* defined(CONFIG_NFSD_V4) */

#ifdef CONFIG_NFSD_V3
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index a4be2e3..187eb95 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -57,6 +57,7 @@ __be32 nfsd4_set_nfs4_acl(struct svc_rqst *, struct svc_fh *,
int nfsd4_get_nfs4_acl(struct svc_rqst *, struct dentry *, struct nfs4_acl **);
__be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *,
struct xdr_netobj *);
+__be32 nfsd4_vfs_fallocate(struct file *, bool, loff_t, loff_t);
#endif /* CONFIG_NFSD_V4 */
__be32 nfsd_create(struct svc_rqst *, struct svc_fh *,
char *name, int len, struct iattr *attrs,
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index b3ed644..aaef9ab 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -430,6 +430,27 @@ struct nfsd4_reclaim_complete {
u32 rca_one_fs;
};

+struct nfsd42_write_res {
+ struct nfs4_stid *wr_stid;
+ u64 wr_bytes_written;
+ u32 wr_stable_how;
+ nfs4_verifier wr_verifier;
+};
+
+struct nfsd4_write_plus {
+ /* request */
+ stateid_t wp_stateid;
+ u32 wp_stable_how;
+ u32 wp_data_length;
+ u32 wp_data_content;
+ u64 wp_offset;
+ u64 wp_length;
+ u32 wp_allocated;
+
+ /* response */
+ struct nfsd42_write_res wp_res;
+};
+
struct nfsd4_op {
int opnum;
__be32 status;
@@ -475,6 +496,9 @@ struct nfsd4_op {
struct nfsd4_reclaim_complete reclaim_complete;
struct nfsd4_test_stateid test_stateid;
struct nfsd4_free_stateid free_stateid;
+
+ /* NFSv4.2 */
+ struct nfsd4_write_plus write_plus;
} u;
struct nfs4_replay * replay;
};
diff --git a/fs/open.c b/fs/open.c
index d420331..09db2d5 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -278,6 +278,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
sb_end_write(inode->i_sb);
return ret;
}
+EXPORT_SYMBOL_GPL(do_fallocate);

SYSCALL_DEFINE4(fallocate, int, fd, int, mode, loff_t, offset, loff_t, len)
{
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 2bc5217..55ed991 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -128,7 +128,7 @@ enum nfs_opnum4 {
Needs to be updated if more operations are defined in future.*/

#define FIRST_NFS4_OP OP_ACCESS
-#define LAST_NFS4_OP OP_RECLAIM_COMPLETE
+#define LAST_NFS4_OP OP_WRITE_PLUS

enum nfsstat4 {
NFS4_OK = 0,
@@ -552,4 +552,10 @@ struct nfs4_deviceid {
char data[NFS4_DEVICEID4_SIZE];
};

+enum data_content4 {
+ NFS4_CONTENT_DATA = 0,
+ NFS4_CONTENT_APP_DATA_HOLE = 1,
+ NFS4_CONTENT_HOLE = 2,
+};
+
#endif
--
1.8.4.1


2013-10-29 13:33:25

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/4] NFSD: Create nfs v4.2 decode ops

On Tue, Oct 29, 2013 at 08:43:46AM -0400, Anna Schumaker wrote:
> On Mon 28 Oct 2013 05:11:19 PM EDT, J. Bruce Fields wrote:
> > On Mon, Oct 28, 2013 at 08:59:59PM +0000, Myklebust, Trond wrote:
> >>> -----Original Message-----
> >>> From: [email protected] [mailto:linux-nfs-
> >>> [email protected]] On Behalf Of J. Bruce Fields
> >>> Sent: Monday, October 28, 2013 4:54 PM
> >>> To: Schumaker, Bryan
> >>> Cc: [email protected]
> >>> Subject: Re: [PATCH 2/4] NFSD: Create nfs v4.2 decode ops
> >>>
> >>> On Mon, Oct 28, 2013 at 10:57:24AM -0400, Anna Schumaker wrote:
> >>>> I'm doing this in a separate patch to keep from putting in a lot of
> >>>> extra code when I go to add operations to the server for real.
> >>>
> >>> Makes sense.
> >>>
> >>> But: now we're duplicating the list of 4.0 op decoders 3 times and the
> >>> 4.1 ops twice. We'll never need different decoders for different
> >>> minorversions (worst case we can test for the minorversion in the decoder if
> >>> necessary).
> >>>
> >>> I wonder if there's a better way to organize this.... Maybe something more
> >>> like a single array with
> >>>
> >>> [OP_SETCLIENTID] = {
> >>> .op_decode = (nfsd4_dec)nfsd4_decode_access,
> >>> .op_unsupported_since_version = 1,
> >>> }
> >>> ...
> >>> [OP_EXCHANGE_ID] = {
> >>> .op_decode = (nfsd4_dec)nfsd4_decode_exchange_id,
> >>> .op_first_supported_in_version = 1,
> >>> }
> >>>
> >>> ?
> >>
> >> Is that really necessary? Why not just have a single array and have nfsd4_decode_clientid itself check the minor version?
> >
> > That'd work too. Every operation that's been introduced more recently
> > than 4.0 or that's since been deprecated would need a version check at
> > the top of its decoder. That'd be a dozen or so.
> >
> > But the information has to go somewhere and perhaps that's more
> > straightforward than sticking this in data.... OK, I'd be fine with
> > that.
>
> Makes sense. Do you want me to put this in with this patch series or
> do it as something separate either before or after?

Best would be to do it as something separate before the rest. So
combine the minorversion 0 and minorversion 1 cases into 1 array first.
And I can apply that patch now independent of the rest.

--b.

2013-10-31 16:07:20

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On 10/31/2013 12:04 PM, Christoph Hellwig wrote:
> Btw, I just noticed that current nfs mainline fails xfstests 286 which
> test SEEK_DATA/SEEK_HOLE. That sounds like a useful test case for your
> patches.
>

Yeah, I've been using it to test this. I'm having trouble getting the last few parts of both test 11 and 12 to pass, which is better than what's happening now!

Anna

2013-10-31 16:04:43

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

Btw, I just noticed that current nfs mainline fails xfstests 286 which
test SEEK_DATA/SEEK_HOLE. That sounds like a useful test case for your
patches.


2013-10-29 13:53:25

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC 5/4] NFSD: Add basic CB_OFFLOAD support

On Tue, Oct 29, 2013 at 06:38:00AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 29, 2013 at 09:36:11AM -0400, J. Bruce Fields wrote:
> > Yeah, understood, I'm glad we're not implementing that, I just wonder
> > why every one of these operations (COPY, WRITE_PLUS, etc.) has to have
> > this asynchronous option.
> >
> > The client's still stuck implementing it even if the server does, it's
> > extra protocol verbage even if nobody uses it, and I'm not completely
> > clear what it's for.
>
> Seems like Trond answered that question: feature creep that people
> without the slightest sense of abstraction tried to overload over a few
> operations.

Your complaint as I understand it is that quick and long-running
operations were combined into one one operation when they would have
better been separated. I agree.

But I also don't understand why the long-running operations need an
async option. Maybe they do, I just don't understand why.

Alternatives might include just letting the operation hog a request slot
for the whole time, or making it work in chunks. (E.g. allowing COPY to
return short writes.)

--b.

2013-10-28 14:57:33

by Anna Schumaker

[permalink] [raw]
Subject: [PATCH 1/4] NFSD: Update error codes

Recent NFS v4.2 drafts have removed NFS4ERR_METADATA_NOTSUPP and
reassigned the error code to NFS4ERR_UNION_NOTSUPP.

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfsd/nfsd.h | 2 +-
include/linux/nfs4.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 30f34ab..4fe49e9 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -244,7 +244,7 @@ void nfsd_lockd_shutdown(void);
#define nfserr_deleg_revoked cpu_to_be32(NFS4ERR_DELEG_REVOKED)
#define nfserr_partner_notsupp cpu_to_be32(NFS4ERR_PARTNER_NOTSUPP)
#define nfserr_partner_no_auth cpu_to_be32(NFS4ERR_PARTNER_NO_AUTH)
-#define nfserr_metadata_notsupp cpu_to_be32(NFS4ERR_METADATA_NOTSUPP)
+#define nfserr_union_notsupp cpu_to_be32(NFS4ERR_UNION_NOTSUPP)
#define nfserr_offload_denied cpu_to_be32(NFS4ERR_OFFLOAD_DENIED)
#define nfserr_wrong_lfs cpu_to_be32(NFS4ERR_WRONG_LFS)
#define nfserr_badlabel cpu_to_be32(NFS4ERR_BADLABEL)
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index e36dee5..a92e065 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -232,7 +232,7 @@ enum nfsstat4 {
/* nfs42 */
NFS4ERR_PARTNER_NOTSUPP = 10088,
NFS4ERR_PARTNER_NO_AUTH = 10089,
- NFS4ERR_METADATA_NOTSUPP = 10090,
+ NFS4ERR_UNION_NOTSUPP = 10090,
NFS4ERR_OFFLOAD_DENIED = 10091,
NFS4ERR_WRONG_LFS = 10092,
NFS4ERR_BADLABEL = 10093,
--
1.8.4.1


2013-10-28 21:11:19

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/4] NFSD: Create nfs v4.2 decode ops

On Mon, Oct 28, 2013 at 08:59:59PM +0000, Myklebust, Trond wrote:
> > -----Original Message-----
> > From: [email protected] [mailto:linux-nfs-
> > [email protected]] On Behalf Of J. Bruce Fields
> > Sent: Monday, October 28, 2013 4:54 PM
> > To: Schumaker, Bryan
> > Cc: [email protected]
> > Subject: Re: [PATCH 2/4] NFSD: Create nfs v4.2 decode ops
> >
> > On Mon, Oct 28, 2013 at 10:57:24AM -0400, Anna Schumaker wrote:
> > > I'm doing this in a separate patch to keep from putting in a lot of
> > > extra code when I go to add operations to the server for real.
> >
> > Makes sense.
> >
> > But: now we're duplicating the list of 4.0 op decoders 3 times and the
> > 4.1 ops twice. We'll never need different decoders for different
> > minorversions (worst case we can test for the minorversion in the decoder if
> > necessary).
> >
> > I wonder if there's a better way to organize this.... Maybe something more
> > like a single array with
> >
> > [OP_SETCLIENTID] = {
> > .op_decode = (nfsd4_dec)nfsd4_decode_access,
> > .op_unsupported_since_version = 1,
> > }
> > ...
> > [OP_EXCHANGE_ID] = {
> > .op_decode = (nfsd4_dec)nfsd4_decode_exchange_id,
> > .op_first_supported_in_version = 1,
> > }
> >
> > ?
>
> Is that really necessary? Why not just have a single array and have nfsd4_decode_clientid itself check the minor version?

That'd work too. Every operation that's been introduced more recently
than 4.0 or that's since been deprecated would need a version check at
the top of its decoder. That'd be a dozen or so.

But the information has to go somewhere and perhaps that's more
straightforward than sticking this in data.... OK, I'd be fine with
that.

--b.

2013-10-29 07:37:20

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC 5/4] NFSD: Add basic CB_OFFLOAD support

On Mon, Oct 28, 2013 at 05:52:21PM -0400, J. Bruce Fields wrote:
> On Mon, Oct 28, 2013 at 10:57:27AM -0400, Anna Schumaker wrote:
> > This patch adds basic offload support to the WRITE_PLUS operation.
> > Since I don't implement OFFLOAD_ABORT, OFFLOAD_REVOKE or OFFLOAD_STATUS
> > this patch is NOT spec compliant and should not be applied without
> > further work.
>
> Ugh. I don't understand why we need asynchronous modes for all these
> operations.

Hole punches as implemented by any filesystem at the moement are pure
metadata manipulations and should not require "async" versions that
offload to a workqueue.


2013-10-29 13:36:13

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC 5/4] NFSD: Add basic CB_OFFLOAD support

On Tue, Oct 29, 2013 at 12:37:19AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 28, 2013 at 05:52:21PM -0400, J. Bruce Fields wrote:
> > On Mon, Oct 28, 2013 at 10:57:27AM -0400, Anna Schumaker wrote:
> > > This patch adds basic offload support to the WRITE_PLUS operation.
> > > Since I don't implement OFFLOAD_ABORT, OFFLOAD_REVOKE or OFFLOAD_STATUS
> > > this patch is NOT spec compliant and should not be applied without
> > > further work.
> >
> > Ugh. I don't understand why we need asynchronous modes for all these
> > operations.
>
> Hole punches as implemented by any filesystem at the moement are pure
> metadata manipulations and should not require "async" versions that
> offload to a workqueue.

Yeah, understood, I'm glad we're not implementing that, I just wonder
why every one of these operations (COPY, WRITE_PLUS, etc.) has to have
this asynchronous option.

The client's still stuck implementing it even if the server does, it's
extra protocol verbage even if nobody uses it, and I'm not completely
clear what it's for.

--b.

2013-10-29 12:50:58

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Mon 28 Oct 2013 05:40:30 PM EDT, J. Bruce Fields wrote:
> On Mon, Oct 28, 2013 at 10:57:25AM -0400, Anna Schumaker wrote:
>> This patch only implements DATA_CONTENT_HOLE support, but I tried to
>> write it so other arms of the discriminated union can be added easily
>> later.
>
> And there's an NFS4ERR_UNION_NOTSUPP to handle that case, OK. (Though
> the spec language is a little ambiguous:
>
> If the client asks for a hole and the server does not support
> that arm of the discriminated union, but does support one or
> more additional arms, it can signal to the client that it
> supports the operation, but not the arm with
> NFS4ERR_UNION_NOTSUPP.
>
> So it's unclear whether we can return this in the case the client is
> *not* asking for a hole. Are we sure the NFS4_CONTENT_DATA case is
> optional? The language in 5.3 ("Instead a client should utlize
> READ_PLUS and WRITE_PLUS") also suggests the CONTENT_DATA case might be
> meant to be mandatory.)
>
>> This patch is missing support for decoding multiple requests on the same
>> file. The way it's written now only the last range provided by the
>> client will be written to.
>
> But that doesn't seem to be a limitation anticipated by the spec, so I
> guess we need to fix that.
>
> (Why is there an array, anyway? Wouldn't it work just as well to send
> multiple RPC's, or a single RPC with multiple WRITE_PLUS ops?)
>
>>
>> Signed-off-by: Anna Schumaker <[email protected]>
>> ---
>> fs/nfsd/nfs4proc.c | 45 ++++++++++++++++++++++++++++++++++++++
>> fs/nfsd/nfs4xdr.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>> fs/nfsd/vfs.c | 14 ++++++++++++
>> fs/nfsd/vfs.h | 1 +
>> fs/nfsd/xdr4.h | 24 ++++++++++++++++++++
>> fs/open.c | 1 +
>> include/linux/nfs4.h | 8 ++++++-
>> 7 files changed, 152 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index 419572f..3210c6f 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -1028,6 +1028,42 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> return status;
>> }
>>
>> +static __be32
>> +nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
>> + struct net *net)
>> +{
>> + __be32 status;
>> +
>> + status = nfsd4_vfs_fallocate(file, writeplus->wp_allocated,
>> + writeplus->wp_offset, writeplus->wp_length);
>> + if (status == nfs_ok) {
>> + writeplus->wp_res.wr_stid = NULL;
>> + writeplus->wp_res.wr_bytes_written = writeplus->wp_length;
>> + writeplus->wp_res.wr_stable_how = NFS_FILE_SYNC;
>
> Do we need to sync?

I did the sync in nfsd4_vfs_fallocate (below), but I can move it if
that would make more sense.
>
>> + gen_boot_verifier(&writeplus->wp_res.wr_verifier, net);
>> + }
>> +
>> + return status;
>
> Nit, but I like the usual idiom better in most cases:
>
> if (status)
> return status;
> normal case here...
> return 0;

Sure, I'll change that.

>
> --b.
>
>> +}
>> +
>> +static __be32
>> +nfsd4_write_plus(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> + struct nfsd4_write_plus *writeplus)
>> +{
>> + struct file *file;
>> + __be32 status;
>> + struct net *net = SVC_NET(rqstp);
>> +
>> + status = nfs4_preprocess_stateid_op(net, cstate, &writeplus->wp_stateid,
>> + WR_STATE, &file);
>> + if (status)
>> + return status;
>> +
>> + if (writeplus->wp_data_content == NFS4_CONTENT_HOLE)
>> + return nfsd4_write_plus_hole(file, writeplus, net);
>> + return nfserr_union_notsupp;
>> +}
>> +
>> /* This routine never returns NFS_OK! If there are no other errors, it
>> * will return NFSERR_SAME or NFSERR_NOT_SAME depending on whether the
>> * attributes matched. VERIFY is implemented by mapping NFSERR_SAME
>> @@ -1840,6 +1876,15 @@ static struct nfsd4_operation nfsd4_ops[] = {
>> .op_get_currentstateid = (stateid_getter)nfsd4_get_freestateid,
>> .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
>> },
>> +
>> + /* NFSv4.2 operations */
>> + [OP_WRITE_PLUS] = {
>> + .op_func = (nfsd4op_func)nfsd4_write_plus,
>> + .op_flags = OP_MODIFIES_SOMETHING | OP_CACHEME,
>> + .op_name = "OP_WRITE_PLUS",
>> + .op_rsize_bop = (nfsd4op_rsize)nfsd4_write_rsize,
>> + .op_get_currentstateid = (stateid_getter)nfsd4_get_writestateid,
>> + },
>> };
>>
>> #ifdef NFSD_DEBUG
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index 60f5a1f..1e4e9af 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -1485,6 +1485,32 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
>> }
>>
>> static __be32
>> +nfsd4_decode_write_plus(struct nfsd4_compoundargs *argp,
>> + struct nfsd4_write_plus *writeplus)
>> +{
>> + DECODE_HEAD;
>> + unsigned int i;
>> +
>> + status = nfsd4_decode_stateid(argp, &writeplus->wp_stateid);
>> + if (status)
>> + return status;
>> +
>> + READ_BUF(8);
>> + READ32(writeplus->wp_stable_how);
>> + READ32(writeplus->wp_data_length);
>> +
>> + for (i = 0; i < writeplus->wp_data_length; i++) {
>> + READ_BUF(24);
>> + READ32(writeplus->wp_data_content);
>> + READ64(writeplus->wp_offset);
>> + READ64(writeplus->wp_length);
>> + READ32(writeplus->wp_allocated);
>> + }
>> +
>> + DECODE_TAIL;
>> +}
>> +
>> +static __be32
>> nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p)
>> {
>> return nfs_ok;
>> @@ -1665,7 +1691,7 @@ static nfsd4_dec nfsd42_dec_ops[] = {
>> [OP_COPY_NOTIFY] = (nfsd4_dec)nfsd4_decode_notsupp,
>> [OP_OFFLOAD_REVOKE] = (nfsd4_dec)nfsd4_decode_notsupp,
>> [OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_notsupp,
>> - [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_write_plus,
>> [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
>> [OP_SEEK] = (nfsd4_dec)nfsd4_decode_notsupp,
>> [OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp,
>> @@ -3591,6 +3617,38 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
>> return nfserr;
>> }
>>
>> +static void
>> +nfsd42_encode_write_res(struct nfsd4_compoundres *resp, struct nfsd42_write_res *write)
>> +{
>> + __be32 *p;
>> +
>> + RESERVE_SPACE(4);
>> +
>> + if (write->wr_stid == NULL) {
>> + WRITE32(0);
>> + ADJUST_ARGS();
>> + } else {
>> + WRITE32(1);
>> + ADJUST_ARGS();
>> + nfsd4_encode_stateid(resp, &write->wr_stid->sc_stateid);
>> + }
>> +
>> + RESERVE_SPACE(12 + NFS4_VERIFIER_SIZE);
>> + WRITE64(write->wr_bytes_written);
>> + WRITE32(write->wr_stable_how);
>> + WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
>> + ADJUST_ARGS();
>> +}
>> +
>> +static __be32
>> +nfsd4_encode_write_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
>> + struct nfsd4_write_plus *writeplus)
>> +{
>> + if (!nfserr)
>> + nfsd42_encode_write_res(resp, &writeplus->wp_res);
>> + return nfserr;
>> +}
>> +
>> static __be32
>> nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p)
>> {
>> @@ -3670,7 +3728,7 @@ static nfsd4_enc nfsd4_enc_ops[] = {
>> [OP_COPY_NOTIFY] = (nfsd4_enc)nfsd4_encode_noop,
>> [OP_OFFLOAD_REVOKE] = (nfsd4_enc)nfsd4_encode_noop,
>> [OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_noop,
>> - [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
>> + [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_write_plus,
>> [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
>> [OP_SEEK] = (nfsd4_enc)nfsd4_encode_noop,
>> [OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop,
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index c827acb..7fec087 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -16,6 +16,7 @@
>> #include <linux/fs.h>
>> #include <linux/file.h>
>> #include <linux/splice.h>
>> +#include <linux/falloc.h>
>> #include <linux/fcntl.h>
>> #include <linux/namei.h>
>> #include <linux/delay.h>
>> @@ -649,6 +650,19 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> }
>> #endif
>>
>> +__be32 nfsd4_vfs_fallocate(struct file *file, bool allocated, loff_t offset, loff_t len)
>> +{
>> + int error, mode = 0;
>> +
>> + if (allocated == false)
>> + mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
>> +
>> + error = do_fallocate(file, mode, offset, len);
>> + if (error == 0)
>> + error = vfs_fsync_range(file, offset, offset + len, 0);
>> + return nfserrno(error);
>> +}
>> +
>> #endif /* defined(CONFIG_NFSD_V4) */
>>
>> #ifdef CONFIG_NFSD_V3
>> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
>> index a4be2e3..187eb95 100644
>> --- a/fs/nfsd/vfs.h
>> +++ b/fs/nfsd/vfs.h
>> @@ -57,6 +57,7 @@ __be32 nfsd4_set_nfs4_acl(struct svc_rqst *, struct svc_fh *,
>> int nfsd4_get_nfs4_acl(struct svc_rqst *, struct dentry *, struct nfs4_acl **);
>> __be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *,
>> struct xdr_netobj *);
>> +__be32 nfsd4_vfs_fallocate(struct file *, bool, loff_t, loff_t);
>> #endif /* CONFIG_NFSD_V4 */
>> __be32 nfsd_create(struct svc_rqst *, struct svc_fh *,
>> char *name, int len, struct iattr *attrs,
>> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
>> index b3ed644..aaef9ab 100644
>> --- a/fs/nfsd/xdr4.h
>> +++ b/fs/nfsd/xdr4.h
>> @@ -430,6 +430,27 @@ struct nfsd4_reclaim_complete {
>> u32 rca_one_fs;
>> };
>>
>> +struct nfsd42_write_res {
>> + struct nfs4_stid *wr_stid;
>> + u64 wr_bytes_written;
>> + u32 wr_stable_how;
>> + nfs4_verifier wr_verifier;
>> +};
>> +
>> +struct nfsd4_write_plus {
>> + /* request */
>> + stateid_t wp_stateid;
>> + u32 wp_stable_how;
>> + u32 wp_data_length;
>> + u32 wp_data_content;
>> + u64 wp_offset;
>> + u64 wp_length;
>> + u32 wp_allocated;
>> +
>> + /* response */
>> + struct nfsd42_write_res wp_res;
>> +};
>> +
>> struct nfsd4_op {
>> int opnum;
>> __be32 status;
>> @@ -475,6 +496,9 @@ struct nfsd4_op {
>> struct nfsd4_reclaim_complete reclaim_complete;
>> struct nfsd4_test_stateid test_stateid;
>> struct nfsd4_free_stateid free_stateid;
>> +
>> + /* NFSv4.2 */
>> + struct nfsd4_write_plus write_plus;
>> } u;
>> struct nfs4_replay * replay;
>> };
>> diff --git a/fs/open.c b/fs/open.c
>> index d420331..09db2d5 100644
>> --- a/fs/open.c
>> +++ b/fs/open.c
>> @@ -278,6 +278,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>> sb_end_write(inode->i_sb);
>> return ret;
>> }
>> +EXPORT_SYMBOL_GPL(do_fallocate);
>>
>> SYSCALL_DEFINE4(fallocate, int, fd, int, mode, loff_t, offset, loff_t, len)
>> {
>> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
>> index 2bc5217..55ed991 100644
>> --- a/include/linux/nfs4.h
>> +++ b/include/linux/nfs4.h
>> @@ -128,7 +128,7 @@ enum nfs_opnum4 {
>> Needs to be updated if more operations are defined in future.*/
>>
>> #define FIRST_NFS4_OP OP_ACCESS
>> -#define LAST_NFS4_OP OP_RECLAIM_COMPLETE
>> +#define LAST_NFS4_OP OP_WRITE_PLUS
>>
>> enum nfsstat4 {
>> NFS4_OK = 0,
>> @@ -552,4 +552,10 @@ struct nfs4_deviceid {
>> char data[NFS4_DEVICEID4_SIZE];
>> };
>>
>> +enum data_content4 {
>> + NFS4_CONTENT_DATA = 0,
>> + NFS4_CONTENT_APP_DATA_HOLE = 1,
>> + NFS4_CONTENT_HOLE = 2,
>> +};
>> +
>> #endif
>> --
>> 1.8.4.1
>>



2013-10-28 21:40:31

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Mon, Oct 28, 2013 at 10:57:25AM -0400, Anna Schumaker wrote:
> This patch only implements DATA_CONTENT_HOLE support, but I tried to
> write it so other arms of the discriminated union can be added easily
> later.

And there's an NFS4ERR_UNION_NOTSUPP to handle that case, OK. (Though
the spec language is a little ambiguous:

If the client asks for a hole and the server does not support
that arm of the discriminated union, but does support one or
more additional arms, it can signal to the client that it
supports the operation, but not the arm with
NFS4ERR_UNION_NOTSUPP.

So it's unclear whether we can return this in the case the client is
*not* asking for a hole. Are we sure the NFS4_CONTENT_DATA case is
optional? The language in 5.3 ("Instead a client should utlize
READ_PLUS and WRITE_PLUS") also suggests the CONTENT_DATA case might be
meant to be mandatory.)

> This patch is missing support for decoding multiple requests on the same
> file. The way it's written now only the last range provided by the
> client will be written to.

But that doesn't seem to be a limitation anticipated by the spec, so I
guess we need to fix that.

(Why is there an array, anyway? Wouldn't it work just as well to send
multiple RPC's, or a single RPC with multiple WRITE_PLUS ops?)

>
> Signed-off-by: Anna Schumaker <[email protected]>
> ---
> fs/nfsd/nfs4proc.c | 45 ++++++++++++++++++++++++++++++++++++++
> fs/nfsd/nfs4xdr.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> fs/nfsd/vfs.c | 14 ++++++++++++
> fs/nfsd/vfs.h | 1 +
> fs/nfsd/xdr4.h | 24 ++++++++++++++++++++
> fs/open.c | 1 +
> include/linux/nfs4.h | 8 ++++++-
> 7 files changed, 152 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 419572f..3210c6f 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1028,6 +1028,42 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> return status;
> }
>
> +static __be32
> +nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
> + struct net *net)
> +{
> + __be32 status;
> +
> + status = nfsd4_vfs_fallocate(file, writeplus->wp_allocated,
> + writeplus->wp_offset, writeplus->wp_length);
> + if (status == nfs_ok) {
> + writeplus->wp_res.wr_stid = NULL;
> + writeplus->wp_res.wr_bytes_written = writeplus->wp_length;
> + writeplus->wp_res.wr_stable_how = NFS_FILE_SYNC;

Do we need to sync?

> + gen_boot_verifier(&writeplus->wp_res.wr_verifier, net);
> + }
> +
> + return status;

Nit, but I like the usual idiom better in most cases:

if (status)
return status;
normal case here...
return 0;

--b.

> +}
> +
> +static __be32
> +nfsd4_write_plus(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> + struct nfsd4_write_plus *writeplus)
> +{
> + struct file *file;
> + __be32 status;
> + struct net *net = SVC_NET(rqstp);
> +
> + status = nfs4_preprocess_stateid_op(net, cstate, &writeplus->wp_stateid,
> + WR_STATE, &file);
> + if (status)
> + return status;
> +
> + if (writeplus->wp_data_content == NFS4_CONTENT_HOLE)
> + return nfsd4_write_plus_hole(file, writeplus, net);
> + return nfserr_union_notsupp;
> +}
> +
> /* This routine never returns NFS_OK! If there are no other errors, it
> * will return NFSERR_SAME or NFSERR_NOT_SAME depending on whether the
> * attributes matched. VERIFY is implemented by mapping NFSERR_SAME
> @@ -1840,6 +1876,15 @@ static struct nfsd4_operation nfsd4_ops[] = {
> .op_get_currentstateid = (stateid_getter)nfsd4_get_freestateid,
> .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
> },
> +
> + /* NFSv4.2 operations */
> + [OP_WRITE_PLUS] = {
> + .op_func = (nfsd4op_func)nfsd4_write_plus,
> + .op_flags = OP_MODIFIES_SOMETHING | OP_CACHEME,
> + .op_name = "OP_WRITE_PLUS",
> + .op_rsize_bop = (nfsd4op_rsize)nfsd4_write_rsize,
> + .op_get_currentstateid = (stateid_getter)nfsd4_get_writestateid,
> + },
> };
>
> #ifdef NFSD_DEBUG
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 60f5a1f..1e4e9af 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1485,6 +1485,32 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
> }
>
> static __be32
> +nfsd4_decode_write_plus(struct nfsd4_compoundargs *argp,
> + struct nfsd4_write_plus *writeplus)
> +{
> + DECODE_HEAD;
> + unsigned int i;
> +
> + status = nfsd4_decode_stateid(argp, &writeplus->wp_stateid);
> + if (status)
> + return status;
> +
> + READ_BUF(8);
> + READ32(writeplus->wp_stable_how);
> + READ32(writeplus->wp_data_length);
> +
> + for (i = 0; i < writeplus->wp_data_length; i++) {
> + READ_BUF(24);
> + READ32(writeplus->wp_data_content);
> + READ64(writeplus->wp_offset);
> + READ64(writeplus->wp_length);
> + READ32(writeplus->wp_allocated);
> + }
> +
> + DECODE_TAIL;
> +}
> +
> +static __be32
> nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p)
> {
> return nfs_ok;
> @@ -1665,7 +1691,7 @@ static nfsd4_dec nfsd42_dec_ops[] = {
> [OP_COPY_NOTIFY] = (nfsd4_dec)nfsd4_decode_notsupp,
> [OP_OFFLOAD_REVOKE] = (nfsd4_dec)nfsd4_decode_notsupp,
> [OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_notsupp,
> - [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_write_plus,
> [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
> [OP_SEEK] = (nfsd4_dec)nfsd4_decode_notsupp,
> [OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp,
> @@ -3591,6 +3617,38 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
> return nfserr;
> }
>
> +static void
> +nfsd42_encode_write_res(struct nfsd4_compoundres *resp, struct nfsd42_write_res *write)
> +{
> + __be32 *p;
> +
> + RESERVE_SPACE(4);
> +
> + if (write->wr_stid == NULL) {
> + WRITE32(0);
> + ADJUST_ARGS();
> + } else {
> + WRITE32(1);
> + ADJUST_ARGS();
> + nfsd4_encode_stateid(resp, &write->wr_stid->sc_stateid);
> + }
> +
> + RESERVE_SPACE(12 + NFS4_VERIFIER_SIZE);
> + WRITE64(write->wr_bytes_written);
> + WRITE32(write->wr_stable_how);
> + WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
> + ADJUST_ARGS();
> +}
> +
> +static __be32
> +nfsd4_encode_write_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> + struct nfsd4_write_plus *writeplus)
> +{
> + if (!nfserr)
> + nfsd42_encode_write_res(resp, &writeplus->wp_res);
> + return nfserr;
> +}
> +
> static __be32
> nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p)
> {
> @@ -3670,7 +3728,7 @@ static nfsd4_enc nfsd4_enc_ops[] = {
> [OP_COPY_NOTIFY] = (nfsd4_enc)nfsd4_encode_noop,
> [OP_OFFLOAD_REVOKE] = (nfsd4_enc)nfsd4_encode_noop,
> [OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_noop,
> - [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
> + [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_write_plus,
> [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
> [OP_SEEK] = (nfsd4_enc)nfsd4_encode_noop,
> [OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop,
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index c827acb..7fec087 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -16,6 +16,7 @@
> #include <linux/fs.h>
> #include <linux/file.h>
> #include <linux/splice.h>
> +#include <linux/falloc.h>
> #include <linux/fcntl.h>
> #include <linux/namei.h>
> #include <linux/delay.h>
> @@ -649,6 +650,19 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
> }
> #endif
>
> +__be32 nfsd4_vfs_fallocate(struct file *file, bool allocated, loff_t offset, loff_t len)
> +{
> + int error, mode = 0;
> +
> + if (allocated == false)
> + mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
> +
> + error = do_fallocate(file, mode, offset, len);
> + if (error == 0)
> + error = vfs_fsync_range(file, offset, offset + len, 0);
> + return nfserrno(error);
> +}
> +
> #endif /* defined(CONFIG_NFSD_V4) */
>
> #ifdef CONFIG_NFSD_V3
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index a4be2e3..187eb95 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -57,6 +57,7 @@ __be32 nfsd4_set_nfs4_acl(struct svc_rqst *, struct svc_fh *,
> int nfsd4_get_nfs4_acl(struct svc_rqst *, struct dentry *, struct nfs4_acl **);
> __be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *,
> struct xdr_netobj *);
> +__be32 nfsd4_vfs_fallocate(struct file *, bool, loff_t, loff_t);
> #endif /* CONFIG_NFSD_V4 */
> __be32 nfsd_create(struct svc_rqst *, struct svc_fh *,
> char *name, int len, struct iattr *attrs,
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index b3ed644..aaef9ab 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -430,6 +430,27 @@ struct nfsd4_reclaim_complete {
> u32 rca_one_fs;
> };
>
> +struct nfsd42_write_res {
> + struct nfs4_stid *wr_stid;
> + u64 wr_bytes_written;
> + u32 wr_stable_how;
> + nfs4_verifier wr_verifier;
> +};
> +
> +struct nfsd4_write_plus {
> + /* request */
> + stateid_t wp_stateid;
> + u32 wp_stable_how;
> + u32 wp_data_length;
> + u32 wp_data_content;
> + u64 wp_offset;
> + u64 wp_length;
> + u32 wp_allocated;
> +
> + /* response */
> + struct nfsd42_write_res wp_res;
> +};
> +
> struct nfsd4_op {
> int opnum;
> __be32 status;
> @@ -475,6 +496,9 @@ struct nfsd4_op {
> struct nfsd4_reclaim_complete reclaim_complete;
> struct nfsd4_test_stateid test_stateid;
> struct nfsd4_free_stateid free_stateid;
> +
> + /* NFSv4.2 */
> + struct nfsd4_write_plus write_plus;
> } u;
> struct nfs4_replay * replay;
> };
> diff --git a/fs/open.c b/fs/open.c
> index d420331..09db2d5 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -278,6 +278,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> sb_end_write(inode->i_sb);
> return ret;
> }
> +EXPORT_SYMBOL_GPL(do_fallocate);
>
> SYSCALL_DEFINE4(fallocate, int, fd, int, mode, loff_t, offset, loff_t, len)
> {
> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> index 2bc5217..55ed991 100644
> --- a/include/linux/nfs4.h
> +++ b/include/linux/nfs4.h
> @@ -128,7 +128,7 @@ enum nfs_opnum4 {
> Needs to be updated if more operations are defined in future.*/
>
> #define FIRST_NFS4_OP OP_ACCESS
> -#define LAST_NFS4_OP OP_RECLAIM_COMPLETE
> +#define LAST_NFS4_OP OP_WRITE_PLUS
>
> enum nfsstat4 {
> NFS4_OK = 0,
> @@ -552,4 +552,10 @@ struct nfs4_deviceid {
> char data[NFS4_DEVICEID4_SIZE];
> };
>
> +enum data_content4 {
> + NFS4_CONTENT_DATA = 0,
> + NFS4_CONTENT_APP_DATA_HOLE = 1,
> + NFS4_CONTENT_HOLE = 2,
> +};
> +
> #endif
> --
> 1.8.4.1
>

2013-10-29 12:43:49

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 2/4] NFSD: Create nfs v4.2 decode ops

On Mon 28 Oct 2013 05:11:19 PM EDT, J. Bruce Fields wrote:
> On Mon, Oct 28, 2013 at 08:59:59PM +0000, Myklebust, Trond wrote:
>>> -----Original Message-----
>>> From: [email protected] [mailto:linux-nfs-
>>> [email protected]] On Behalf Of J. Bruce Fields
>>> Sent: Monday, October 28, 2013 4:54 PM
>>> To: Schumaker, Bryan
>>> Cc: [email protected]
>>> Subject: Re: [PATCH 2/4] NFSD: Create nfs v4.2 decode ops
>>>
>>> On Mon, Oct 28, 2013 at 10:57:24AM -0400, Anna Schumaker wrote:
>>>> I'm doing this in a separate patch to keep from putting in a lot of
>>>> extra code when I go to add operations to the server for real.
>>>
>>> Makes sense.
>>>
>>> But: now we're duplicating the list of 4.0 op decoders 3 times and the
>>> 4.1 ops twice. We'll never need different decoders for different
>>> minorversions (worst case we can test for the minorversion in the decoder if
>>> necessary).
>>>
>>> I wonder if there's a better way to organize this.... Maybe something more
>>> like a single array with
>>>
>>> [OP_SETCLIENTID] = {
>>> .op_decode = (nfsd4_dec)nfsd4_decode_access,
>>> .op_unsupported_since_version = 1,
>>> }
>>> ...
>>> [OP_EXCHANGE_ID] = {
>>> .op_decode = (nfsd4_dec)nfsd4_decode_exchange_id,
>>> .op_first_supported_in_version = 1,
>>> }
>>>
>>> ?
>>
>> Is that really necessary? Why not just have a single array and have nfsd4_decode_clientid itself check the minor version?
>
> That'd work too. Every operation that's been introduced more recently
> than 4.0 or that's since been deprecated would need a version check at
> the top of its decoder. That'd be a dozen or so.
>
> But the information has to go somewhere and perhaps that's more
> straightforward than sticking this in data.... OK, I'd be fine with
> that.

Makes sense. Do you want me to put this in with this patch series or
do it as something separate either before or after?

>
> --b.



2013-10-29 12:49:09

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Mon 28 Oct 2013 05:40:30 PM EDT, J. Bruce Fields wrote:
> On Mon, Oct 28, 2013 at 10:57:25AM -0400, Anna Schumaker wrote:
>> This patch only implements DATA_CONTENT_HOLE support, but I tried to
>> write it so other arms of the discriminated union can be added easily
>> later.
>
> And there's an NFS4ERR_UNION_NOTSUPP to handle that case, OK. (Though
> the spec language is a little ambiguous:
>
> If the client asks for a hole and the server does not support
> that arm of the discriminated union, but does support one or
> more additional arms, it can signal to the client that it
> supports the operation, but not the arm with
> NFS4ERR_UNION_NOTSUPP.
>
> So it's unclear whether we can return this in the case the client is
> *not* asking for a hole. Are we sure the NFS4_CONTENT_DATA case is
> optional? The language in 5.3 ("Instead a client should utlize
> READ_PLUS and WRITE_PLUS") also suggests the CONTENT_DATA case might be
> meant to be mandatory.)

Fair enough. I wasn't planning on implementing the NFS4_CONTENT_DATA
arm on the client right now, but I might be able to hack something
together to test this on the server!

>
>> This patch is missing support for decoding multiple requests on the same
>> file. The way it's written now only the last range provided by the
>> client will be written to.
>
> But that doesn't seem to be a limitation anticipated by the spec, so I
> guess we need to fix that.
>
> (Why is there an array, anyway? Wouldn't it work just as well to send
> multiple RPC's, or a single RPC with multiple WRITE_PLUS ops?)

Yeah, I'll fix that up. The Linux client will send requests one at a
time, so that's what I tested against.

>
>>
>> Signed-off-by: Anna Schumaker <[email protected]>
>> ---
>> fs/nfsd/nfs4proc.c | 45 ++++++++++++++++++++++++++++++++++++++
>> fs/nfsd/nfs4xdr.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>> fs/nfsd/vfs.c | 14 ++++++++++++
>> fs/nfsd/vfs.h | 1 +
>> fs/nfsd/xdr4.h | 24 ++++++++++++++++++++
>> fs/open.c | 1 +
>> include/linux/nfs4.h | 8 ++++++-
>> 7 files changed, 152 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index 419572f..3210c6f 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -1028,6 +1028,42 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> return status;
>> }
>>
>> +static __be32
>> +nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
>> + struct net *net)
>> +{
>> + __be32 status;
>> +
>> + status = nfsd4_vfs_fallocate(file, writeplus->wp_allocated,
>> + writeplus->wp_offset, writeplus->wp_length);
>> + if (status == nfs_ok) {
>> + writeplus->wp_res.wr_stid = NULL;
>> + writeplus->wp_res.wr_bytes_written = writeplus->wp_length;
>> + writeplus->wp_res.wr_stable_how = NFS_FILE_SYNC;
>
> Do we need to sync?
>
>> + gen_boot_verifier(&writeplus->wp_res.wr_verifier, net);
>> + }
>> +
>> + return status;
>
> Nit, but I like the usual idiom better in most cases:
>
> if (status)
> return status;
> normal case here...
> return 0;
>
> --b.
>
>> +}
>> +
>> +static __be32
>> +nfsd4_write_plus(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> + struct nfsd4_write_plus *writeplus)
>> +{
>> + struct file *file;
>> + __be32 status;
>> + struct net *net = SVC_NET(rqstp);
>> +
>> + status = nfs4_preprocess_stateid_op(net, cstate, &writeplus->wp_stateid,
>> + WR_STATE, &file);
>> + if (status)
>> + return status;
>> +
>> + if (writeplus->wp_data_content == NFS4_CONTENT_HOLE)
>> + return nfsd4_write_plus_hole(file, writeplus, net);
>> + return nfserr_union_notsupp;
>> +}
>> +
>> /* This routine never returns NFS_OK! If there are no other errors, it
>> * will return NFSERR_SAME or NFSERR_NOT_SAME depending on whether the
>> * attributes matched. VERIFY is implemented by mapping NFSERR_SAME
>> @@ -1840,6 +1876,15 @@ static struct nfsd4_operation nfsd4_ops[] = {
>> .op_get_currentstateid = (stateid_getter)nfsd4_get_freestateid,
>> .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
>> },
>> +
>> + /* NFSv4.2 operations */
>> + [OP_WRITE_PLUS] = {
>> + .op_func = (nfsd4op_func)nfsd4_write_plus,
>> + .op_flags = OP_MODIFIES_SOMETHING | OP_CACHEME,
>> + .op_name = "OP_WRITE_PLUS",
>> + .op_rsize_bop = (nfsd4op_rsize)nfsd4_write_rsize,
>> + .op_get_currentstateid = (stateid_getter)nfsd4_get_writestateid,
>> + },
>> };
>>
>> #ifdef NFSD_DEBUG
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index 60f5a1f..1e4e9af 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -1485,6 +1485,32 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
>> }
>>
>> static __be32
>> +nfsd4_decode_write_plus(struct nfsd4_compoundargs *argp,
>> + struct nfsd4_write_plus *writeplus)
>> +{
>> + DECODE_HEAD;
>> + unsigned int i;
>> +
>> + status = nfsd4_decode_stateid(argp, &writeplus->wp_stateid);
>> + if (status)
>> + return status;
>> +
>> + READ_BUF(8);
>> + READ32(writeplus->wp_stable_how);
>> + READ32(writeplus->wp_data_length);
>> +
>> + for (i = 0; i < writeplus->wp_data_length; i++) {
>> + READ_BUF(24);
>> + READ32(writeplus->wp_data_content);
>> + READ64(writeplus->wp_offset);
>> + READ64(writeplus->wp_length);
>> + READ32(writeplus->wp_allocated);
>> + }
>> +
>> + DECODE_TAIL;
>> +}
>> +
>> +static __be32
>> nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p)
>> {
>> return nfs_ok;
>> @@ -1665,7 +1691,7 @@ static nfsd4_dec nfsd42_dec_ops[] = {
>> [OP_COPY_NOTIFY] = (nfsd4_dec)nfsd4_decode_notsupp,
>> [OP_OFFLOAD_REVOKE] = (nfsd4_dec)nfsd4_decode_notsupp,
>> [OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_notsupp,
>> - [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_write_plus,
>> [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
>> [OP_SEEK] = (nfsd4_dec)nfsd4_decode_notsupp,
>> [OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp,
>> @@ -3591,6 +3617,38 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
>> return nfserr;
>> }
>>
>> +static void
>> +nfsd42_encode_write_res(struct nfsd4_compoundres *resp, struct nfsd42_write_res *write)
>> +{
>> + __be32 *p;
>> +
>> + RESERVE_SPACE(4);
>> +
>> + if (write->wr_stid == NULL) {
>> + WRITE32(0);
>> + ADJUST_ARGS();
>> + } else {
>> + WRITE32(1);
>> + ADJUST_ARGS();
>> + nfsd4_encode_stateid(resp, &write->wr_stid->sc_stateid);
>> + }
>> +
>> + RESERVE_SPACE(12 + NFS4_VERIFIER_SIZE);
>> + WRITE64(write->wr_bytes_written);
>> + WRITE32(write->wr_stable_how);
>> + WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
>> + ADJUST_ARGS();
>> +}
>> +
>> +static __be32
>> +nfsd4_encode_write_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
>> + struct nfsd4_write_plus *writeplus)
>> +{
>> + if (!nfserr)
>> + nfsd42_encode_write_res(resp, &writeplus->wp_res);
>> + return nfserr;
>> +}
>> +
>> static __be32
>> nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p)
>> {
>> @@ -3670,7 +3728,7 @@ static nfsd4_enc nfsd4_enc_ops[] = {
>> [OP_COPY_NOTIFY] = (nfsd4_enc)nfsd4_encode_noop,
>> [OP_OFFLOAD_REVOKE] = (nfsd4_enc)nfsd4_encode_noop,
>> [OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_noop,
>> - [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
>> + [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_write_plus,
>> [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
>> [OP_SEEK] = (nfsd4_enc)nfsd4_encode_noop,
>> [OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop,
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index c827acb..7fec087 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -16,6 +16,7 @@
>> #include <linux/fs.h>
>> #include <linux/file.h>
>> #include <linux/splice.h>
>> +#include <linux/falloc.h>
>> #include <linux/fcntl.h>
>> #include <linux/namei.h>
>> #include <linux/delay.h>
>> @@ -649,6 +650,19 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> }
>> #endif
>>
>> +__be32 nfsd4_vfs_fallocate(struct file *file, bool allocated, loff_t offset, loff_t len)
>> +{
>> + int error, mode = 0;
>> +
>> + if (allocated == false)
>> + mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
>> +
>> + error = do_fallocate(file, mode, offset, len);
>> + if (error == 0)
>> + error = vfs_fsync_range(file, offset, offset + len, 0);
>> + return nfserrno(error);
>> +}
>> +
>> #endif /* defined(CONFIG_NFSD_V4) */
>>
>> #ifdef CONFIG_NFSD_V3
>> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
>> index a4be2e3..187eb95 100644
>> --- a/fs/nfsd/vfs.h
>> +++ b/fs/nfsd/vfs.h
>> @@ -57,6 +57,7 @@ __be32 nfsd4_set_nfs4_acl(struct svc_rqst *, struct svc_fh *,
>> int nfsd4_get_nfs4_acl(struct svc_rqst *, struct dentry *, struct nfs4_acl **);
>> __be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *,
>> struct xdr_netobj *);
>> +__be32 nfsd4_vfs_fallocate(struct file *, bool, loff_t, loff_t);
>> #endif /* CONFIG_NFSD_V4 */
>> __be32 nfsd_create(struct svc_rqst *, struct svc_fh *,
>> char *name, int len, struct iattr *attrs,
>> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
>> index b3ed644..aaef9ab 100644
>> --- a/fs/nfsd/xdr4.h
>> +++ b/fs/nfsd/xdr4.h
>> @@ -430,6 +430,27 @@ struct nfsd4_reclaim_complete {
>> u32 rca_one_fs;
>> };
>>
>> +struct nfsd42_write_res {
>> + struct nfs4_stid *wr_stid;
>> + u64 wr_bytes_written;
>> + u32 wr_stable_how;
>> + nfs4_verifier wr_verifier;
>> +};
>> +
>> +struct nfsd4_write_plus {
>> + /* request */
>> + stateid_t wp_stateid;
>> + u32 wp_stable_how;
>> + u32 wp_data_length;
>> + u32 wp_data_content;
>> + u64 wp_offset;
>> + u64 wp_length;
>> + u32 wp_allocated;
>> +
>> + /* response */
>> + struct nfsd42_write_res wp_res;
>> +};
>> +
>> struct nfsd4_op {
>> int opnum;
>> __be32 status;
>> @@ -475,6 +496,9 @@ struct nfsd4_op {
>> struct nfsd4_reclaim_complete reclaim_complete;
>> struct nfsd4_test_stateid test_stateid;
>> struct nfsd4_free_stateid free_stateid;
>> +
>> + /* NFSv4.2 */
>> + struct nfsd4_write_plus write_plus;
>> } u;
>> struct nfs4_replay * replay;
>> };
>> diff --git a/fs/open.c b/fs/open.c
>> index d420331..09db2d5 100644
>> --- a/fs/open.c
>> +++ b/fs/open.c
>> @@ -278,6 +278,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>> sb_end_write(inode->i_sb);
>> return ret;
>> }
>> +EXPORT_SYMBOL_GPL(do_fallocate);
>>
>> SYSCALL_DEFINE4(fallocate, int, fd, int, mode, loff_t, offset, loff_t, len)
>> {
>> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
>> index 2bc5217..55ed991 100644
>> --- a/include/linux/nfs4.h
>> +++ b/include/linux/nfs4.h
>> @@ -128,7 +128,7 @@ enum nfs_opnum4 {
>> Needs to be updated if more operations are defined in future.*/
>>
>> #define FIRST_NFS4_OP OP_ACCESS
>> -#define LAST_NFS4_OP OP_RECLAIM_COMPLETE
>> +#define LAST_NFS4_OP OP_WRITE_PLUS
>>
>> enum nfsstat4 {
>> NFS4_OK = 0,
>> @@ -552,4 +552,10 @@ struct nfs4_deviceid {
>> char data[NFS4_DEVICEID4_SIZE];
>> };
>>
>> +enum data_content4 {
>> + NFS4_CONTENT_DATA = 0,
>> + NFS4_CONTENT_APP_DATA_HOLE = 1,
>> + NFS4_CONTENT_HOLE = 2,
>> +};
>> +
>> #endif
>> --
>> 1.8.4.1
>>



2013-10-28 20:54:16

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/4] NFSD: Create nfs v4.2 decode ops

On Mon, Oct 28, 2013 at 10:57:24AM -0400, Anna Schumaker wrote:
> I'm doing this in a separate patch to keep from putting in a lot of
> extra code when I go to add operations to the server for real.

Makes sense.

But: now we're duplicating the list of 4.0 op decoders 3 times and the
4.1 ops twice. We'll never need different decoders for different
minorversions (worst case we can test for the minorversion in the
decoder if necessary).

I wonder if there's a better way to organize this.... Maybe something
more like a single array with

[OP_SETCLIENTID] = {
.op_decode = (nfsd4_dec)nfsd4_decode_access,
.op_unsupported_since_version = 1,
}
...
[OP_EXCHANGE_ID] = {
.op_decode = (nfsd4_dec)nfsd4_decode_exchange_id,
.op_first_supported_in_version = 1,
}

?

--b.

>
> Signed-off-by: Anna Schumaker <[email protected]>
> ---
> fs/nfsd/nfs4xdr.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/nfs4.h | 11 +++++++
> 2 files changed, 95 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index d9454fe..60f5a1f 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1599,6 +1599,78 @@ static nfsd4_dec nfsd41_dec_ops[] = {
> [OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete,
> };
>
> +static nfsd4_dec nfsd42_dec_ops[] = {
> + [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access,
> + [OP_CLOSE] = (nfsd4_dec)nfsd4_decode_close,
> + [OP_COMMIT] = (nfsd4_dec)nfsd4_decode_commit,
> + [OP_CREATE] = (nfsd4_dec)nfsd4_decode_create,
> + [OP_DELEGPURGE] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_DELEGRETURN] = (nfsd4_dec)nfsd4_decode_delegreturn,
> + [OP_GETATTR] = (nfsd4_dec)nfsd4_decode_getattr,
> + [OP_GETFH] = (nfsd4_dec)nfsd4_decode_noop,
> + [OP_LINK] = (nfsd4_dec)nfsd4_decode_link,
> + [OP_LOCK] = (nfsd4_dec)nfsd4_decode_lock,
> + [OP_LOCKT] = (nfsd4_dec)nfsd4_decode_lockt,
> + [OP_LOCKU] = (nfsd4_dec)nfsd4_decode_locku,
> + [OP_LOOKUP] = (nfsd4_dec)nfsd4_decode_lookup,
> + [OP_LOOKUPP] = (nfsd4_dec)nfsd4_decode_noop,
> + [OP_NVERIFY] = (nfsd4_dec)nfsd4_decode_verify,
> + [OP_OPEN] = (nfsd4_dec)nfsd4_decode_open,
> + [OP_OPENATTR] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_OPEN_CONFIRM] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_OPEN_DOWNGRADE] = (nfsd4_dec)nfsd4_decode_open_downgrade,
> + [OP_PUTFH] = (nfsd4_dec)nfsd4_decode_putfh,
> + [OP_PUTPUBFH] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_PUTROOTFH] = (nfsd4_dec)nfsd4_decode_noop,
> + [OP_READ] = (nfsd4_dec)nfsd4_decode_read,
> + [OP_READDIR] = (nfsd4_dec)nfsd4_decode_readdir,
> + [OP_READLINK] = (nfsd4_dec)nfsd4_decode_noop,
> + [OP_REMOVE] = (nfsd4_dec)nfsd4_decode_remove,
> + [OP_RENAME] = (nfsd4_dec)nfsd4_decode_rename,
> + [OP_RENEW] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_RESTOREFH] = (nfsd4_dec)nfsd4_decode_noop,
> + [OP_SAVEFH] = (nfsd4_dec)nfsd4_decode_noop,
> + [OP_SECINFO] = (nfsd4_dec)nfsd4_decode_secinfo,
> + [OP_SETATTR] = (nfsd4_dec)nfsd4_decode_setattr,
> + [OP_SETCLIENTID] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_SETCLIENTID_CONFIRM]= (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_VERIFY] = (nfsd4_dec)nfsd4_decode_verify,
> + [OP_WRITE] = (nfsd4_dec)nfsd4_decode_write,
> + [OP_RELEASE_LOCKOWNER] = (nfsd4_dec)nfsd4_decode_notsupp,
> +
> + /* new operations for NFSv4.1 */
> + [OP_BACKCHANNEL_CTL] = (nfsd4_dec)nfsd4_decode_backchannel_ctl,
> + [OP_BIND_CONN_TO_SESSION]= (nfsd4_dec)nfsd4_decode_bind_conn_to_session,
> + [OP_EXCHANGE_ID] = (nfsd4_dec)nfsd4_decode_exchange_id,
> + [OP_CREATE_SESSION] = (nfsd4_dec)nfsd4_decode_create_session,
> + [OP_DESTROY_SESSION] = (nfsd4_dec)nfsd4_decode_destroy_session,
> + [OP_FREE_STATEID] = (nfsd4_dec)nfsd4_decode_free_stateid,
> + [OP_GET_DIR_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_SECINFO_NO_NAME] = (nfsd4_dec)nfsd4_decode_secinfo_no_name,
> + [OP_SEQUENCE] = (nfsd4_dec)nfsd4_decode_sequence,
> + [OP_SET_SSV] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_TEST_STATEID] = (nfsd4_dec)nfsd4_decode_test_stateid,
> + [OP_WANT_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_DESTROY_CLIENTID] = (nfsd4_dec)nfsd4_decode_destroy_clientid,
> + [OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete,
> +
> + /* new operations for NFSv4.2 */
> + [OP_COPY] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_OFFLOAD_ABORT] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_COPY_NOTIFY] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_OFFLOAD_REVOKE] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_SEEK] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp,
> +};
> +
> struct nfsd4_minorversion_ops {
> nfsd4_dec *decoders;
> int nops;
> @@ -1607,7 +1679,7 @@ struct nfsd4_minorversion_ops {
> static struct nfsd4_minorversion_ops nfsd4_minorversion[] = {
> [0] = { nfsd4_dec_ops, ARRAY_SIZE(nfsd4_dec_ops) },
> [1] = { nfsd41_dec_ops, ARRAY_SIZE(nfsd41_dec_ops) },
> - [2] = { nfsd41_dec_ops, ARRAY_SIZE(nfsd41_dec_ops) },
> + [2] = { nfsd42_dec_ops, ARRAY_SIZE(nfsd42_dec_ops) },
> };
>
> static __be32
> @@ -3591,6 +3663,17 @@ static nfsd4_enc nfsd4_enc_ops[] = {
> [OP_WANT_DELEGATION] = (nfsd4_enc)nfsd4_encode_noop,
> [OP_DESTROY_CLIENTID] = (nfsd4_enc)nfsd4_encode_noop,
> [OP_RECLAIM_COMPLETE] = (nfsd4_enc)nfsd4_encode_noop,
> +
> + /* NFSv4.2 operations */
> + [OP_COPY] = (nfsd4_enc)nfsd4_encode_noop,
> + [OP_OFFLOAD_ABORT] = (nfsd4_enc)nfsd4_encode_noop,
> + [OP_COPY_NOTIFY] = (nfsd4_enc)nfsd4_encode_noop,
> + [OP_OFFLOAD_REVOKE] = (nfsd4_enc)nfsd4_encode_noop,
> + [OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_noop,
> + [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
> + [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
> + [OP_SEEK] = (nfsd4_enc)nfsd4_encode_noop,
> + [OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop,
> };
>
> /*
> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> index a92e065..2bc5217 100644
> --- a/include/linux/nfs4.h
> +++ b/include/linux/nfs4.h
> @@ -110,6 +110,17 @@ enum nfs_opnum4 {
> OP_DESTROY_CLIENTID = 57,
> OP_RECLAIM_COMPLETE = 58,
>
> + /* nfs42 */
> + OP_COPY = 59,
> + OP_OFFLOAD_ABORT = 60,
> + OP_COPY_NOTIFY = 61,
> + OP_OFFLOAD_REVOKE = 62,
> + OP_OFFLOAD_STATUS = 63,
> + OP_WRITE_PLUS = 64,
> + OP_READ_PLUS = 65,
> + OP_SEEK = 66,
> + OP_IO_ADVISE = 67,
> +
> OP_ILLEGAL = 10044,
> };
>
> --
> 1.8.4.1
>

2013-10-28 20:00:21

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 0/4] NFSD: Add support for WRITE_PLUS and SEEK

On Mon, Oct 28, 2013 at 10:57:22AM -0400, Anna Schumaker wrote:
> Patch 3 (Add WRITE_PLUS support for hole punches) exports the do_fallocate()
> function from fs/open.c. Should this be its own patch submitted somewhere
> else? Is there a better way to get into the VFS fallocate routines?

Should defintively be a separate patch, and Cc linux-fsdevel on the
whole series. Also I'd suggest to rename it to vfs_fallocate given that
it takes a file structure.


2013-10-29 13:32:21

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Tue, Oct 29, 2013 at 01:23:19PM +0000, Myklebust, Trond wrote:
>
> On Oct 29, 2013, at 9:06 AM, J. Bruce Fields <[email protected]> wrote:
>
> > On Tue, Oct 29, 2013 at 08:50:56AM -0400, Anna Schumaker wrote:
> >> On Mon 28 Oct 2013 05:40:30 PM EDT, J. Bruce Fields wrote:
> >>> On Mon, Oct 28, 2013 at 10:57:25AM -0400, Anna Schumaker wrote:
> >>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >>>> index 419572f..3210c6f 100644
> >>>> --- a/fs/nfsd/nfs4proc.c
> >>>> +++ b/fs/nfsd/nfs4proc.c
> >>>> @@ -1028,6 +1028,42 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >>>> return status;
> >>>> }
> >>>>
> >>>> +static __be32
> >>>> +nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
> >>>> + struct net *net)
> >>>> +{
> >>>> + __be32 status;
> >>>> +
> >>>> + status = nfsd4_vfs_fallocate(file, writeplus->wp_allocated,
> >>>> + writeplus->wp_offset, writeplus->wp_length);
> >>>> + if (status == nfs_ok) {
> >>>> + writeplus->wp_res.wr_stid = NULL;
> >>>> + writeplus->wp_res.wr_bytes_written = writeplus->wp_length;
> >>>> + writeplus->wp_res.wr_stable_how = NFS_FILE_SYNC;
> >>>
> >>> Do we need to sync?
> >>
> >> I did the sync in nfsd4_vfs_fallocate (below), but I can move it if
> >> that would make more sense.
> >
> > What I meant was--why are we doing a sync at all, instead of returning
> > NFS_UNSTABLE and making the client commit?
>
> What if the client specifies FILE_SYNC? The spec doesn't allow the server to return UNSTABLE in that situation.

Of course, sorry for being unclear. I was just wondering if there's
some particular reason we're not using NFS_UNSTABLE when the client
allows it.

But "because it's easier" is a fine answer. We can always add that
later if it turns out to be useful.

--b.

2013-10-31 16:17:41

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On 10/31/2013 12:04 PM, Christoph Hellwig wrote:
> Btw, I just noticed that current nfs mainline fails xfstests 286 which
> test SEEK_DATA/SEEK_HOLE. That sounds like a useful test case for your
> patches.
>

I read too quickly. I've been using 285 to test against, but I'll see what 286 gets me! Thanks for the tip :)

2013-10-29 07:36:01

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Mon, Oct 28, 2013 at 10:57:26AM -0400, Anna Schumaker wrote:
> This patch adds in the SEEK operation used by clients doing an llseek on
> a file to find either hole or data sections in a file. I'm faking the
> "allocated" status right now, since I haven't quite figured out how to
> tell if a range is allocated on disk yet.
>
> This patch is missing correctly determining the "allocated" status of
> the HOLE / DATA range. I expect I'll need to learn all about how fiemap
> works before properly setting these values.

What is the definition of allocated in this context? Specificly how
does it different from meaning of allocated as used by SEEK_DATA?

Going out to fiemap is something we should absolutely avoid.


2013-10-29 13:07:22

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Tue, Oct 29, 2013 at 09:00:48AM -0400, Anna Schumaker wrote:
> From what I can tell, I think the allocated flag will just tell the
> clients if all the blocks exist on disk or not. Is there a way to have
> a hole with allocated blocks? Or maybe it's supposed to represent
> partially allocated blocks? I checked the draft, and it doesn't
> actually say what they expect allocated to represent...

You can have preallocated space, which behaves like a hole in that
reads return zeroes. As far as SEEK_DATA/SEEK_HOLE is concerned we
tend to treat them as holes when we can as that's what the users expect.

Note that at least the pnfs block rfc has a special state representing
this kind of preallocated space.


2013-10-29 13:38:01

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC 5/4] NFSD: Add basic CB_OFFLOAD support

On Tue, Oct 29, 2013 at 09:36:11AM -0400, J. Bruce Fields wrote:
> Yeah, understood, I'm glad we're not implementing that, I just wonder
> why every one of these operations (COPY, WRITE_PLUS, etc.) has to have
> this asynchronous option.
>
> The client's still stuck implementing it even if the server does, it's
> extra protocol verbage even if nobody uses it, and I'm not completely
> clear what it's for.

Seems like Trond answered that question: feature creep that people
without the slightest sense of abstraction tried to overload over a few
operations.


2013-10-29 12:53:08

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Mon 28 Oct 2013 05:51:26 PM EDT, J. Bruce Fields wrote:
> On Mon, Oct 28, 2013 at 10:57:26AM -0400, Anna Schumaker wrote:
>> This patch adds in the SEEK operation used by clients doing an llseek on
>> a file to find either hole or data sections in a file. I'm faking the
>> "allocated" status right now, since I haven't quite figured out how to
>> tell if a range is allocated on disk yet.
>>
>> This patch is missing correctly determining the "allocated" status of
>> the HOLE / DATA range. I expect I'll need to learn all about how fiemap
>> works before properly setting these values.
>>
>> Signed-off-by: Anna Schumaker <[email protected]>
>> ---
>> fs/nfsd/nfs4proc.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>> fs/nfsd/nfs4xdr.c | 40 ++++++++++++++++++++++++++++++++++++++--
>> fs/nfsd/xdr4.h | 15 +++++++++++++++
>> include/linux/nfs4.h | 2 +-
>> 4 files changed, 98 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index 3210c6f..bc45ed2 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -1064,6 +1064,45 @@ nfsd4_write_plus(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> return nfserr_union_notsupp;
>> }
>>
>> +static __be32
>> +nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> + struct nfsd4_seek *seek)
>> +{
>> + struct file *file;
>> + loff_t pos, end_pos;
>> + __be32 status;
>> +
>> + status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
>> + &seek->seek_stateid,
>> + RD_STATE | WR_STATE, &file);
>
> Note nfs4_preprocess_stateid_op requires the caller to hold the state
> lock. Andyou want to either hold it till you're done using "file" or
> take a reference on file before dropping it.

That's useful to know. I'll fix it up here, and anywhere else I call
nfs4_preprocess_stateid_op()!
>
> --b.
>
>> + if (status != nfs_ok)
>> + return status;
>> +
>> + if (seek->seek_whence == NFS4_CONTENT_DATA) {
>> + pos = vfs_llseek(file, seek->seek_offset, SEEK_DATA);
>> + end_pos = vfs_llseek(file, pos, SEEK_HOLE);
>> + seek->seek_allocated = true;
>> + } else {
>> + pos = vfs_llseek(file, seek->seek_offset, SEEK_HOLE);
>> + end_pos = vfs_llseek(file, pos, SEEK_DATA);
>> + seek->seek_allocated = false;
>> + }
>> +
>> + if (pos < 0)
>> + return nfserrno(pos);
>> + else if (end_pos == -ENXIO) {
>> + seek->seek_length = 0;
>> + seek->seek_eof = true;
>> + } else if (end_pos < 0)
>> + return nfserrno(end_pos);
>> + else
>> + seek->seek_length = end_pos - pos;
>> +
>> + seek->seek_offset = pos;
>> +
>> + return nfs_ok;
>> +}
>> +
>> /* This routine never returns NFS_OK! If there are no other errors, it
>> * will return NFSERR_SAME or NFSERR_NOT_SAME depending on whether the
>> * attributes matched. VERIFY is implemented by mapping NFSERR_SAME
>> @@ -1885,6 +1924,11 @@ static struct nfsd4_operation nfsd4_ops[] = {
>> .op_rsize_bop = (nfsd4op_rsize)nfsd4_write_rsize,
>> .op_get_currentstateid = (stateid_getter)nfsd4_get_writestateid,
>> },
>> +
>> + [OP_SEEK] = {
>> + .op_func = (nfsd4op_func)nfsd4_seek,
>> + .op_name = "OP_SEEK",
>> + },
>> };
>>
>> #ifdef NFSD_DEBUG
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index 1e4e9af..8434740 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -1511,6 +1511,22 @@ nfsd4_decode_write_plus(struct nfsd4_compoundargs *argp,
>> }
>>
>> static __be32
>> +nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek)
>> +{
>> + DECODE_HEAD;
>> +
>> + status = nfsd4_decode_stateid(argp, &seek->seek_stateid);
>> + if (status)
>> + return status;
>> +
>> + READ_BUF(12);
>> + READ64(seek->seek_offset);
>> + READ32(seek->seek_whence);
>> +
>> + DECODE_TAIL;
>> +}
>> +
>> +static __be32
>> nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p)
>> {
>> return nfs_ok;
>> @@ -1693,7 +1709,7 @@ static nfsd4_dec nfsd42_dec_ops[] = {
>> [OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_notsupp,
>> [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_write_plus,
>> [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
>> - [OP_SEEK] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_SEEK] = (nfsd4_dec)nfsd4_decode_seek,
>> [OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp,
>> };
>>
>> @@ -3650,6 +3666,26 @@ nfsd4_encode_write_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
>> }
>>
>> static __be32
>> +nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr,
>> + struct nfsd4_seek *seek)
>> +{
>> + __be32 *p;
>> +
>> + if (nfserr)
>> + return nfserr;
>> +
>> + RESERVE_SPACE(28);
>> + WRITE32(seek->seek_eof);
>> + WRITE32(seek->seek_whence);
>> + WRITE64(seek->seek_offset);
>> + WRITE64(seek->seek_length);
>> + WRITE32(seek->seek_allocated);
>> + ADJUST_ARGS();
>> +
>> + return nfserr;
>> +}
>> +
>> +static __be32
>> nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p)
>> {
>> return nfserr;
>> @@ -3730,7 +3766,7 @@ static nfsd4_enc nfsd4_enc_ops[] = {
>> [OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_noop,
>> [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_write_plus,
>> [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
>> - [OP_SEEK] = (nfsd4_enc)nfsd4_encode_noop,
>> + [OP_SEEK] = (nfsd4_enc)nfsd4_encode_seek,
>> [OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop,
>> };
>>
>> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
>> index aaef9ab..ae9debc 100644
>> --- a/fs/nfsd/xdr4.h
>> +++ b/fs/nfsd/xdr4.h
>> @@ -451,6 +451,20 @@ struct nfsd4_write_plus {
>> struct nfsd42_write_res wp_res;
>> };
>>
>> +struct nfsd4_seek {
>> + /* request */
>> + stateid_t seek_stateid;
>> +
>> + /* Shared between request and response */
>> + u64 seek_offset;
>> + u32 seek_whence;
>> +
>> + /* response */
>> + u32 seek_eof;
>> + u64 seek_length;
>> + u32 seek_allocated;
>> +};
>> +
>> struct nfsd4_op {
>> int opnum;
>> __be32 status;
>> @@ -499,6 +513,7 @@ struct nfsd4_op {
>>
>> /* NFSv4.2 */
>> struct nfsd4_write_plus write_plus;
>> + struct nfsd4_seek seek;
>> } u;
>> struct nfs4_replay * replay;
>> };
>> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
>> index 55ed991..81d6b09 100644
>> --- a/include/linux/nfs4.h
>> +++ b/include/linux/nfs4.h
>> @@ -128,7 +128,7 @@ enum nfs_opnum4 {
>> Needs to be updated if more operations are defined in future.*/
>>
>> #define FIRST_NFS4_OP OP_ACCESS
>> -#define LAST_NFS4_OP OP_WRITE_PLUS
>> +#define LAST_NFS4_OP OP_SEEK
>>
>> enum nfsstat4 {
>> NFS4_OK = 0,
>> --
>> 1.8.4.1
>>



2013-10-28 14:57:36

by Anna Schumaker

[permalink] [raw]
Subject: [RFC 5/4] NFSD: Add basic CB_OFFLOAD support

This patch adds basic offload support to the WRITE_PLUS operation.
Since I don't implement OFFLOAD_ABORT, OFFLOAD_REVOKE or OFFLOAD_STATUS
this patch is NOT spec compliant and should not be applied without
further work.

I'm including this patch to show how I handled offloading to test client
code.
---
fs/nfsd/nfs4callback.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfs4proc.c | 56 ++++++++++++++++++---
fs/nfsd/nfs4state.c | 11 +++++
fs/nfsd/state.h | 4 ++
fs/nfsd/xdr4.h | 14 ++++++
fs/nfsd/xdr4cb.h | 9 ++++
6 files changed, 219 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 7f05cd1..1f6c150 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -35,6 +35,7 @@
#include <linux/sunrpc/svc_xprt.h>
#include <linux/slab.h>
#include "nfsd.h"
+#include "xdr4.h"
#include "state.h"
#include "netns.h"
#include "xdr4cb.h"
@@ -52,6 +53,9 @@ enum {
NFSPROC4_CLNT_CB_NULL = 0,
NFSPROC4_CLNT_CB_RECALL,
NFSPROC4_CLNT_CB_SEQUENCE,
+
+ /* NFS v4.2 callback */
+ NFSPROC4_CLNT_CB_OFFLOAD,
};

struct nfs4_cb_compound_hdr {
@@ -110,6 +114,7 @@ enum nfs_cb_opnum4 {
OP_CB_WANTS_CANCELLED = 12,
OP_CB_NOTIFY_LOCK = 13,
OP_CB_NOTIFY_DEVICEID = 14,
+ OP_CB_OFFLOAD = 15,
OP_CB_ILLEGAL = 10044
};

@@ -469,6 +474,31 @@ out_default:
return nfs_cb_stat_to_errno(nfserr);
}

+static void encode_cb_offload4args(struct xdr_stream *xdr,
+ const struct nfsd4_cb_offload *offload,
+ struct nfs4_cb_compound_hdr *hdr)
+{
+ __be32 *p;
+
+ if (hdr->minorversion < 2)
+ return;
+
+ encode_nfs_cb_opnum4(xdr, OP_CB_OFFLOAD);
+ encode_nfs_fh4(xdr, &offload->co_fh);
+ encode_stateid4(xdr, &offload->co_res.wr_stid->sc_stateid);
+
+ p = xdr_reserve_space(xdr, 4);
+ *p = cpu_to_be32(1);
+ encode_stateid4(xdr, &offload->co_res.wr_stid->sc_stateid);
+
+ p = xdr_reserve_space(xdr, 12 + NFS4_VERIFIER_SIZE);
+ p = xdr_encode_hyper(p, offload->co_res.wr_bytes_written);
+ *p++ = cpu_to_be32(offload->co_res.wr_stable_how);
+ xdr_encode_opaque_fixed(p, offload->co_res.wr_verifier.data, NFS4_VERIFIER_SIZE);
+
+ hdr->nops++;
+}
+
/*
* NFSv4.0 and NFSv4.1 XDR encode functions
*
@@ -505,6 +535,23 @@ static void nfs4_xdr_enc_cb_recall(struct rpc_rqst *req, struct xdr_stream *xdr,
encode_cb_nops(&hdr);
}

+/*
+ * CB_OFFLOAD
+ */
+static void nfs4_xdr_enc_cb_offload(struct rpc_rqst *req, struct xdr_stream *xdr,
+ const struct nfsd4_callback *cb)
+{
+ const struct nfsd4_cb_offload *args = cb->cb_op;
+ struct nfs4_cb_compound_hdr hdr = {
+ .ident = cb->cb_clp->cl_cb_ident,
+ .minorversion = cb->cb_minorversion,
+ };
+
+ encode_cb_compound4args(xdr, &hdr);
+ encode_cb_sequence4args(xdr, cb, &hdr);
+ encode_cb_offload4args(xdr, args, &hdr);
+ encode_cb_nops(&hdr);
+}

/*
* NFSv4.0 and NFSv4.1 XDR decode functions
@@ -552,6 +599,36 @@ out:
}

/*
+ * CB_OFFLOAD
+ */
+static int nfs4_xdr_dec_cb_offload(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
+ struct nfsd4_callback *cb)
+{
+ struct nfs4_cb_compound_hdr hdr;
+ enum nfsstat4 nfserr;
+ int status;
+
+ status = decode_cb_compound4res(xdr, &hdr);
+ if (unlikely(status))
+ goto out;
+
+ if (cb != NULL) {
+ status = decode_cb_sequence4res(xdr, cb);
+ if (unlikely(status))
+ goto out;
+ }
+
+ status = decode_cb_op_status(xdr, OP_CB_OFFLOAD, &nfserr);
+ if (unlikely(status))
+ goto out;
+ if (unlikely(nfserr != NFS4_OK))
+ status = nfs_cb_stat_to_errno(nfserr);
+
+out:
+ return status;
+}
+
+/*
* RPC procedure tables
*/
#define PROC(proc, call, argtype, restype) \
@@ -568,6 +645,7 @@ out:
static struct rpc_procinfo nfs4_cb_procedures[] = {
PROC(CB_NULL, NULL, cb_null, cb_null),
PROC(CB_RECALL, COMPOUND, cb_recall, cb_recall),
+ PROC(CB_OFFLOAD, COMPOUND, cb_offload, cb_offload),
};

static struct rpc_version nfs_cb_version4 = {
@@ -1036,3 +1114,57 @@ void nfsd4_cb_recall(struct nfs4_delegation *dp)

run_nfsd4_cb(&dp->dl_recall);
}
+
+static void nfsd4_cb_offload_done(struct rpc_task *task, void *calldata)
+{
+ struct nfsd4_callback *cb = calldata;
+ struct nfs4_client *clp = cb->cb_clp;
+ struct rpc_clnt *current_rpc_client = clp->cl_cb_client;
+
+ nfsd4_cb_done(task, calldata);
+
+ if (current_rpc_client != task->tk_client)
+ return;
+
+ if (cb->cb_done)
+ return;
+
+ if (task->tk_status != 0)
+ nfsd4_mark_cb_down(clp, task->tk_status);
+ cb->cb_done = true;
+}
+
+static void nfsd4_cb_offload_release(void *calldata)
+{
+ struct nfsd4_callback *cb = calldata;
+ struct nfsd4_cb_offload *offload = container_of(cb, struct nfsd4_cb_offload, co_callback);
+
+ if (cb->cb_done) {
+ nfs4_free_offload_stateid(offload->co_res.wr_stid);
+ kfree(offload);
+ }
+}
+
+static const struct rpc_call_ops nfsd4_cb_offload_ops = {
+ .rpc_call_prepare = nfsd4_cb_prepare,
+ .rpc_call_done = nfsd4_cb_offload_done,
+ .rpc_release = nfsd4_cb_offload_release,
+};
+
+void nfsd4_cb_offload(struct nfsd4_cb_offload *offload)
+{
+ struct nfsd4_callback *cb = &offload->co_callback;
+
+ cb->cb_op = offload;
+ cb->cb_clp = offload->co_res.wr_stid->sc_client;
+ cb->cb_msg.rpc_proc = &nfs4_cb_procedures[NFSPROC4_CLNT_CB_OFFLOAD];
+ cb->cb_msg.rpc_argp = cb;
+ cb->cb_msg.rpc_resp = cb;
+
+ cb->cb_ops = &nfsd4_cb_offload_ops;
+
+ INIT_LIST_HEAD(&cb->cb_per_client);
+ cb->cb_done = true;
+
+ run_nfsd4_cb(cb);
+}
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index bc45ed2..d56f7fe 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1028,22 +1028,64 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return status;
}

-static __be32
-nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
- struct net *net)
+static void
+nfsd4_offload_work(struct nfsd4_cb_offload *offload,
+ struct nfsd4_compound_state *cstate,
+ struct nfsd42_write_res *write_res,
+ void (*offload_func)(struct work_struct *))
+{
+ offload->co_res.wr_stid = nfs4_alloc_offload_stateid(cstate->session->se_client);
+ memcpy(&offload->co_fh, &cstate->current_fh, sizeof(struct knfsd_fh));
+
+ write_res->wr_stid = offload->co_res.wr_stid;
+ write_res->wr_bytes_written = 0;
+ write_res->wr_stable_how = NFS_UNSTABLE;
+
+ nfsd4_init_callback(&offload->co_callback);
+ INIT_WORK(&offload->co_work, offload_func);
+ schedule_work(&offload->co_work);
+}
+
+static void
+nfsd4_write_plus_hole_async(struct work_struct *work)
{
+ struct nfsd4_cb_offload *offload;
+ struct nfsd4_write_plus *writeplus;
__be32 status;

- status = nfsd4_vfs_fallocate(file, writeplus->wp_allocated,
+ offload = container_of(work, struct nfsd4_cb_offload, co_work);
+ writeplus = &offload->co_u.write_plus;
+
+ status = nfsd4_vfs_fallocate(offload->co_file, writeplus->wp_allocated,
writeplus->wp_offset, writeplus->wp_length);
if (status == nfs_ok) {
writeplus->wp_res.wr_stid = NULL;
writeplus->wp_res.wr_bytes_written = writeplus->wp_length;
writeplus->wp_res.wr_stable_how = NFS_FILE_SYNC;
- gen_boot_verifier(&writeplus->wp_res.wr_verifier, net);
+ gen_boot_verifier(&writeplus->wp_res.wr_verifier, offload->co_net);
}

- return status;
+ fput(offload->co_file);
+ nfsd4_cb_offload(offload);
+}
+
+static __be32
+nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
+ struct nfsd4_compound_state *cstate, struct net *net)
+{
+ struct nfsd4_cb_offload *offload;
+
+ offload = kmalloc(sizeof(struct nfsd4_cb_offload), GFP_KERNEL);
+ if (!offload)
+ return nfserrno(PTR_ERR(offload));
+
+ memcpy(&offload->co_u.write_plus, writeplus, sizeof(struct nfsd4_write_plus));
+ offload->co_file = get_file(file);
+ offload->co_net = net;
+
+ nfsd4_offload_work(offload, cstate, &writeplus->wp_res, nfsd4_write_plus_hole_async);
+
+ return 0;
}

static __be32
@@ -1060,7 +1102,7 @@ nfsd4_write_plus(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return status;

if (writeplus->wp_data_content == NFS4_CONTENT_HOLE)
- return nfsd4_write_plus_hole(file, writeplus, net);
+ return nfsd4_write_plus_hole(file, writeplus, cstate, net);
return nfserr_union_notsupp;
}

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 0874998..6342167 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -364,6 +364,11 @@ static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
return openlockstateid(nfs4_alloc_stid(clp, stateid_slab));
}

+struct nfs4_stid *nfs4_alloc_offload_stateid(struct nfs4_client *clp)
+{
+ return nfs4_alloc_stid(clp, stateid_slab);
+}
+
static struct nfs4_delegation *
alloc_init_deleg(struct nfs4_client *clp, struct nfs4_ol_stateid *stp, struct svc_fh *current_fh)
{
@@ -613,6 +618,12 @@ static void free_generic_stateid(struct nfs4_ol_stateid *stp)
kmem_cache_free(stateid_slab, stp);
}

+void nfs4_free_offload_stateid(struct nfs4_stid *stid)
+{
+ remove_stid(stid);
+ kmem_cache_free(stateid_slab, stid);
+}
+
static void release_lock_stateid(struct nfs4_ol_stateid *stp)
{
struct file *file;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 424d8f5..a40d29a 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -452,6 +452,7 @@ static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s)
#define WR_STATE 0x00000020

struct nfsd4_compound_state;
+struct nfsd4_cb_offload;
struct nfsd_net;

extern __be32 nfs4_preprocess_stateid_op(struct net *net,
@@ -472,6 +473,7 @@ extern void nfsd4_probe_callback(struct nfs4_client *clp);
extern void nfsd4_probe_callback_sync(struct nfs4_client *clp);
extern void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *);
extern void nfsd4_cb_recall(struct nfs4_delegation *dp);
+extern void nfsd4_cb_offload(struct nfsd4_cb_offload *);
extern int nfsd4_create_callback_queue(void);
extern void nfsd4_destroy_callback_queue(void);
extern void nfsd4_shutdown_callback(struct nfs4_client *);
@@ -480,6 +482,8 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
struct nfsd_net *nn);
extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
extern void put_client_renew(struct nfs4_client *clp);
+extern struct nfs4_stid *nfs4_alloc_offload_stateid(struct nfs4_client *);
+extern void nfs4_free_offload_stateid(struct nfs4_stid *);

/* nfs4recover operations */
extern int nfsd4_client_tracking_init(struct net *net);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index ae9debc..24e2c07 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -451,6 +451,20 @@ struct nfsd4_write_plus {
struct nfsd42_write_res wp_res;
};

+struct nfsd4_cb_offload {
+ union {
+ struct nfsd4_write_plus write_plus;
+ } co_u;
+
+ struct knfsd_fh co_fh;
+ struct file *co_file;
+ struct net *co_net;
+
+ struct nfsd42_write_res co_res;
+ struct work_struct co_work;
+ struct nfsd4_callback co_callback;
+};
+
struct nfsd4_seek {
/* request */
stateid_t seek_stateid;
diff --git a/fs/nfsd/xdr4cb.h b/fs/nfsd/xdr4cb.h
index c5c55df..75b0ef7 100644
--- a/fs/nfsd/xdr4cb.h
+++ b/fs/nfsd/xdr4cb.h
@@ -21,3 +21,12 @@
#define NFS4_dec_cb_recall_sz (cb_compound_dec_hdr_sz + \
cb_sequence_dec_sz + \
op_dec_sz)
+
+#define NFS4_enc_cb_offload_sz (cb_compound_enc_hdr_sz + \
+ cb_sequence_enc_sz + \
+ 1 + enc_stateid_sz + 2 + 1 + \
+ XDR_QUADLEN(NFS4_VERIFIER_SIZE))
+
+#define NFS4_dec_cb_offload_sz (cb_compound_dec_hdr_sz + \
+ cb_sequence_dec_sz + \
+ op_dec_sz)
--
1.8.4.1


2013-10-28 21:52:22

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC 5/4] NFSD: Add basic CB_OFFLOAD support

On Mon, Oct 28, 2013 at 10:57:27AM -0400, Anna Schumaker wrote:
> This patch adds basic offload support to the WRITE_PLUS operation.
> Since I don't implement OFFLOAD_ABORT, OFFLOAD_REVOKE or OFFLOAD_STATUS
> this patch is NOT spec compliant and should not be applied without
> further work.

Ugh. I don't understand why we need asynchronous modes for all these
operations.

--b.

>
> I'm including this patch to show how I handled offloading to test client
> code.
> ---
> fs/nfsd/nfs4callback.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++++
> fs/nfsd/nfs4proc.c | 56 ++++++++++++++++++---
> fs/nfsd/nfs4state.c | 11 +++++
> fs/nfsd/state.h | 4 ++
> fs/nfsd/xdr4.h | 14 ++++++
> fs/nfsd/xdr4cb.h | 9 ++++
> 6 files changed, 219 insertions(+), 7 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 7f05cd1..1f6c150 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -35,6 +35,7 @@
> #include <linux/sunrpc/svc_xprt.h>
> #include <linux/slab.h>
> #include "nfsd.h"
> +#include "xdr4.h"
> #include "state.h"
> #include "netns.h"
> #include "xdr4cb.h"
> @@ -52,6 +53,9 @@ enum {
> NFSPROC4_CLNT_CB_NULL = 0,
> NFSPROC4_CLNT_CB_RECALL,
> NFSPROC4_CLNT_CB_SEQUENCE,
> +
> + /* NFS v4.2 callback */
> + NFSPROC4_CLNT_CB_OFFLOAD,
> };
>
> struct nfs4_cb_compound_hdr {
> @@ -110,6 +114,7 @@ enum nfs_cb_opnum4 {
> OP_CB_WANTS_CANCELLED = 12,
> OP_CB_NOTIFY_LOCK = 13,
> OP_CB_NOTIFY_DEVICEID = 14,
> + OP_CB_OFFLOAD = 15,
> OP_CB_ILLEGAL = 10044
> };
>
> @@ -469,6 +474,31 @@ out_default:
> return nfs_cb_stat_to_errno(nfserr);
> }
>
> +static void encode_cb_offload4args(struct xdr_stream *xdr,
> + const struct nfsd4_cb_offload *offload,
> + struct nfs4_cb_compound_hdr *hdr)
> +{
> + __be32 *p;
> +
> + if (hdr->minorversion < 2)
> + return;
> +
> + encode_nfs_cb_opnum4(xdr, OP_CB_OFFLOAD);
> + encode_nfs_fh4(xdr, &offload->co_fh);
> + encode_stateid4(xdr, &offload->co_res.wr_stid->sc_stateid);
> +
> + p = xdr_reserve_space(xdr, 4);
> + *p = cpu_to_be32(1);
> + encode_stateid4(xdr, &offload->co_res.wr_stid->sc_stateid);
> +
> + p = xdr_reserve_space(xdr, 12 + NFS4_VERIFIER_SIZE);
> + p = xdr_encode_hyper(p, offload->co_res.wr_bytes_written);
> + *p++ = cpu_to_be32(offload->co_res.wr_stable_how);
> + xdr_encode_opaque_fixed(p, offload->co_res.wr_verifier.data, NFS4_VERIFIER_SIZE);
> +
> + hdr->nops++;
> +}
> +
> /*
> * NFSv4.0 and NFSv4.1 XDR encode functions
> *
> @@ -505,6 +535,23 @@ static void nfs4_xdr_enc_cb_recall(struct rpc_rqst *req, struct xdr_stream *xdr,
> encode_cb_nops(&hdr);
> }
>
> +/*
> + * CB_OFFLOAD
> + */
> +static void nfs4_xdr_enc_cb_offload(struct rpc_rqst *req, struct xdr_stream *xdr,
> + const struct nfsd4_callback *cb)
> +{
> + const struct nfsd4_cb_offload *args = cb->cb_op;
> + struct nfs4_cb_compound_hdr hdr = {
> + .ident = cb->cb_clp->cl_cb_ident,
> + .minorversion = cb->cb_minorversion,
> + };
> +
> + encode_cb_compound4args(xdr, &hdr);
> + encode_cb_sequence4args(xdr, cb, &hdr);
> + encode_cb_offload4args(xdr, args, &hdr);
> + encode_cb_nops(&hdr);
> +}
>
> /*
> * NFSv4.0 and NFSv4.1 XDR decode functions
> @@ -552,6 +599,36 @@ out:
> }
>
> /*
> + * CB_OFFLOAD
> + */
> +static int nfs4_xdr_dec_cb_offload(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
> + struct nfsd4_callback *cb)
> +{
> + struct nfs4_cb_compound_hdr hdr;
> + enum nfsstat4 nfserr;
> + int status;
> +
> + status = decode_cb_compound4res(xdr, &hdr);
> + if (unlikely(status))
> + goto out;
> +
> + if (cb != NULL) {
> + status = decode_cb_sequence4res(xdr, cb);
> + if (unlikely(status))
> + goto out;
> + }
> +
> + status = decode_cb_op_status(xdr, OP_CB_OFFLOAD, &nfserr);
> + if (unlikely(status))
> + goto out;
> + if (unlikely(nfserr != NFS4_OK))
> + status = nfs_cb_stat_to_errno(nfserr);
> +
> +out:
> + return status;
> +}
> +
> +/*
> * RPC procedure tables
> */
> #define PROC(proc, call, argtype, restype) \
> @@ -568,6 +645,7 @@ out:
> static struct rpc_procinfo nfs4_cb_procedures[] = {
> PROC(CB_NULL, NULL, cb_null, cb_null),
> PROC(CB_RECALL, COMPOUND, cb_recall, cb_recall),
> + PROC(CB_OFFLOAD, COMPOUND, cb_offload, cb_offload),
> };
>
> static struct rpc_version nfs_cb_version4 = {
> @@ -1036,3 +1114,57 @@ void nfsd4_cb_recall(struct nfs4_delegation *dp)
>
> run_nfsd4_cb(&dp->dl_recall);
> }
> +
> +static void nfsd4_cb_offload_done(struct rpc_task *task, void *calldata)
> +{
> + struct nfsd4_callback *cb = calldata;
> + struct nfs4_client *clp = cb->cb_clp;
> + struct rpc_clnt *current_rpc_client = clp->cl_cb_client;
> +
> + nfsd4_cb_done(task, calldata);
> +
> + if (current_rpc_client != task->tk_client)
> + return;
> +
> + if (cb->cb_done)
> + return;
> +
> + if (task->tk_status != 0)
> + nfsd4_mark_cb_down(clp, task->tk_status);
> + cb->cb_done = true;
> +}
> +
> +static void nfsd4_cb_offload_release(void *calldata)
> +{
> + struct nfsd4_callback *cb = calldata;
> + struct nfsd4_cb_offload *offload = container_of(cb, struct nfsd4_cb_offload, co_callback);
> +
> + if (cb->cb_done) {
> + nfs4_free_offload_stateid(offload->co_res.wr_stid);
> + kfree(offload);
> + }
> +}
> +
> +static const struct rpc_call_ops nfsd4_cb_offload_ops = {
> + .rpc_call_prepare = nfsd4_cb_prepare,
> + .rpc_call_done = nfsd4_cb_offload_done,
> + .rpc_release = nfsd4_cb_offload_release,
> +};
> +
> +void nfsd4_cb_offload(struct nfsd4_cb_offload *offload)
> +{
> + struct nfsd4_callback *cb = &offload->co_callback;
> +
> + cb->cb_op = offload;
> + cb->cb_clp = offload->co_res.wr_stid->sc_client;
> + cb->cb_msg.rpc_proc = &nfs4_cb_procedures[NFSPROC4_CLNT_CB_OFFLOAD];
> + cb->cb_msg.rpc_argp = cb;
> + cb->cb_msg.rpc_resp = cb;
> +
> + cb->cb_ops = &nfsd4_cb_offload_ops;
> +
> + INIT_LIST_HEAD(&cb->cb_per_client);
> + cb->cb_done = true;
> +
> + run_nfsd4_cb(cb);
> +}
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index bc45ed2..d56f7fe 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1028,22 +1028,64 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> return status;
> }
>
> -static __be32
> -nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
> - struct net *net)
> +static void
> +nfsd4_offload_work(struct nfsd4_cb_offload *offload,
> + struct nfsd4_compound_state *cstate,
> + struct nfsd42_write_res *write_res,
> + void (*offload_func)(struct work_struct *))
> +{
> + offload->co_res.wr_stid = nfs4_alloc_offload_stateid(cstate->session->se_client);
> + memcpy(&offload->co_fh, &cstate->current_fh, sizeof(struct knfsd_fh));
> +
> + write_res->wr_stid = offload->co_res.wr_stid;
> + write_res->wr_bytes_written = 0;
> + write_res->wr_stable_how = NFS_UNSTABLE;
> +
> + nfsd4_init_callback(&offload->co_callback);
> + INIT_WORK(&offload->co_work, offload_func);
> + schedule_work(&offload->co_work);
> +}
> +
> +static void
> +nfsd4_write_plus_hole_async(struct work_struct *work)
> {
> + struct nfsd4_cb_offload *offload;
> + struct nfsd4_write_plus *writeplus;
> __be32 status;
>
> - status = nfsd4_vfs_fallocate(file, writeplus->wp_allocated,
> + offload = container_of(work, struct nfsd4_cb_offload, co_work);
> + writeplus = &offload->co_u.write_plus;
> +
> + status = nfsd4_vfs_fallocate(offload->co_file, writeplus->wp_allocated,
> writeplus->wp_offset, writeplus->wp_length);
> if (status == nfs_ok) {
> writeplus->wp_res.wr_stid = NULL;
> writeplus->wp_res.wr_bytes_written = writeplus->wp_length;
> writeplus->wp_res.wr_stable_how = NFS_FILE_SYNC;
> - gen_boot_verifier(&writeplus->wp_res.wr_verifier, net);
> + gen_boot_verifier(&writeplus->wp_res.wr_verifier, offload->co_net);
> }
>
> - return status;
> + fput(offload->co_file);
> + nfsd4_cb_offload(offload);
> +}
> +
> +static __be32
> +nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
> + struct nfsd4_compound_state *cstate, struct net *net)
> +{
> + struct nfsd4_cb_offload *offload;
> +
> + offload = kmalloc(sizeof(struct nfsd4_cb_offload), GFP_KERNEL);
> + if (!offload)
> + return nfserrno(PTR_ERR(offload));
> +
> + memcpy(&offload->co_u.write_plus, writeplus, sizeof(struct nfsd4_write_plus));
> + offload->co_file = get_file(file);
> + offload->co_net = net;
> +
> + nfsd4_offload_work(offload, cstate, &writeplus->wp_res, nfsd4_write_plus_hole_async);
> +
> + return 0;
> }
>
> static __be32
> @@ -1060,7 +1102,7 @@ nfsd4_write_plus(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> return status;
>
> if (writeplus->wp_data_content == NFS4_CONTENT_HOLE)
> - return nfsd4_write_plus_hole(file, writeplus, net);
> + return nfsd4_write_plus_hole(file, writeplus, cstate, net);
> return nfserr_union_notsupp;
> }
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 0874998..6342167 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -364,6 +364,11 @@ static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
> return openlockstateid(nfs4_alloc_stid(clp, stateid_slab));
> }
>
> +struct nfs4_stid *nfs4_alloc_offload_stateid(struct nfs4_client *clp)
> +{
> + return nfs4_alloc_stid(clp, stateid_slab);
> +}
> +
> static struct nfs4_delegation *
> alloc_init_deleg(struct nfs4_client *clp, struct nfs4_ol_stateid *stp, struct svc_fh *current_fh)
> {
> @@ -613,6 +618,12 @@ static void free_generic_stateid(struct nfs4_ol_stateid *stp)
> kmem_cache_free(stateid_slab, stp);
> }
>
> +void nfs4_free_offload_stateid(struct nfs4_stid *stid)
> +{
> + remove_stid(stid);
> + kmem_cache_free(stateid_slab, stid);
> +}
> +
> static void release_lock_stateid(struct nfs4_ol_stateid *stp)
> {
> struct file *file;
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 424d8f5..a40d29a 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -452,6 +452,7 @@ static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s)
> #define WR_STATE 0x00000020
>
> struct nfsd4_compound_state;
> +struct nfsd4_cb_offload;
> struct nfsd_net;
>
> extern __be32 nfs4_preprocess_stateid_op(struct net *net,
> @@ -472,6 +473,7 @@ extern void nfsd4_probe_callback(struct nfs4_client *clp);
> extern void nfsd4_probe_callback_sync(struct nfs4_client *clp);
> extern void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *);
> extern void nfsd4_cb_recall(struct nfs4_delegation *dp);
> +extern void nfsd4_cb_offload(struct nfsd4_cb_offload *);
> extern int nfsd4_create_callback_queue(void);
> extern void nfsd4_destroy_callback_queue(void);
> extern void nfsd4_shutdown_callback(struct nfs4_client *);
> @@ -480,6 +482,8 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
> struct nfsd_net *nn);
> extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
> extern void put_client_renew(struct nfs4_client *clp);
> +extern struct nfs4_stid *nfs4_alloc_offload_stateid(struct nfs4_client *);
> +extern void nfs4_free_offload_stateid(struct nfs4_stid *);
>
> /* nfs4recover operations */
> extern int nfsd4_client_tracking_init(struct net *net);
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index ae9debc..24e2c07 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -451,6 +451,20 @@ struct nfsd4_write_plus {
> struct nfsd42_write_res wp_res;
> };
>
> +struct nfsd4_cb_offload {
> + union {
> + struct nfsd4_write_plus write_plus;
> + } co_u;
> +
> + struct knfsd_fh co_fh;
> + struct file *co_file;
> + struct net *co_net;
> +
> + struct nfsd42_write_res co_res;
> + struct work_struct co_work;
> + struct nfsd4_callback co_callback;
> +};
> +
> struct nfsd4_seek {
> /* request */
> stateid_t seek_stateid;
> diff --git a/fs/nfsd/xdr4cb.h b/fs/nfsd/xdr4cb.h
> index c5c55df..75b0ef7 100644
> --- a/fs/nfsd/xdr4cb.h
> +++ b/fs/nfsd/xdr4cb.h
> @@ -21,3 +21,12 @@
> #define NFS4_dec_cb_recall_sz (cb_compound_dec_hdr_sz + \
> cb_sequence_dec_sz + \
> op_dec_sz)
> +
> +#define NFS4_enc_cb_offload_sz (cb_compound_enc_hdr_sz + \
> + cb_sequence_enc_sz + \
> + 1 + enc_stateid_sz + 2 + 1 + \
> + XDR_QUADLEN(NFS4_VERIFIER_SIZE))
> +
> +#define NFS4_dec_cb_offload_sz (cb_compound_dec_hdr_sz + \
> + cb_sequence_dec_sz + \
> + op_dec_sz)
> --
> 1.8.4.1
>

2013-10-31 17:13:29

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Thu, Oct 31, 2013 at 12:17:30PM -0400, Anna Schumaker wrote:
> On 10/31/2013 12:04 PM, Christoph Hellwig wrote:
> > Btw, I just noticed that current nfs mainline fails xfstests 286 which
> > test SEEK_DATA/SEEK_HOLE. That sounds like a useful test case for your
> > patches.
> >
>
> I read too quickly. I've been using 285 to test against, but I'll see what 286 gets me! Thanks for the tip :)

I actually meant 285. 286 tests SEEK_HOLE as well but passes already
with the dumb default behaviour.

2013-10-29 13:30:11

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Tue, Oct 29, 2013 at 06:07:21AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 29, 2013 at 09:00:48AM -0400, Anna Schumaker wrote:
> > From what I can tell, I think the allocated flag will just tell the
> > clients if all the blocks exist on disk or not. Is there a way to have
> > a hole with allocated blocks? Or maybe it's supposed to represent
> > partially allocated blocks? I checked the draft, and it doesn't
> > actually say what they expect allocated to represent...
>
> You can have preallocated space, which behaves like a hole in that
> reads return zeroes. As far as SEEK_DATA/SEEK_HOLE is concerned we
> tend to treat them as holes when we can as that's what the users expect.
>
> Note that at least the pnfs block rfc has a special state representing
> this kind of preallocated space.

http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-20#section-5.2

E.g. "Hole: A byte range within a Sparse file that contains regions of
all zeroes. For block-based file systems, this could also be an
unallocated region of the file."

So it sounds like a hole can either be allocated or not, which seems to
agree with what you expect?

I haven't read these parts of the spec yet at all....

--b.

2013-10-28 14:57:35

by Anna Schumaker

[permalink] [raw]
Subject: [PATCH 4/4] NFSD: Implement SEEK

This patch adds in the SEEK operation used by clients doing an llseek on
a file to find either hole or data sections in a file. I'm faking the
"allocated" status right now, since I haven't quite figured out how to
tell if a range is allocated on disk yet.

This patch is missing correctly determining the "allocated" status of
the HOLE / DATA range. I expect I'll need to learn all about how fiemap
works before properly setting these values.

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfsd/nfs4proc.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfs4xdr.c | 40 ++++++++++++++++++++++++++++++++++++++--
fs/nfsd/xdr4.h | 15 +++++++++++++++
include/linux/nfs4.h | 2 +-
4 files changed, 98 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 3210c6f..bc45ed2 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1064,6 +1064,45 @@ nfsd4_write_plus(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return nfserr_union_notsupp;
}

+static __be32
+nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ struct nfsd4_seek *seek)
+{
+ struct file *file;
+ loff_t pos, end_pos;
+ __be32 status;
+
+ status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
+ &seek->seek_stateid,
+ RD_STATE | WR_STATE, &file);
+ if (status != nfs_ok)
+ return status;
+
+ if (seek->seek_whence == NFS4_CONTENT_DATA) {
+ pos = vfs_llseek(file, seek->seek_offset, SEEK_DATA);
+ end_pos = vfs_llseek(file, pos, SEEK_HOLE);
+ seek->seek_allocated = true;
+ } else {
+ pos = vfs_llseek(file, seek->seek_offset, SEEK_HOLE);
+ end_pos = vfs_llseek(file, pos, SEEK_DATA);
+ seek->seek_allocated = false;
+ }
+
+ if (pos < 0)
+ return nfserrno(pos);
+ else if (end_pos == -ENXIO) {
+ seek->seek_length = 0;
+ seek->seek_eof = true;
+ } else if (end_pos < 0)
+ return nfserrno(end_pos);
+ else
+ seek->seek_length = end_pos - pos;
+
+ seek->seek_offset = pos;
+
+ return nfs_ok;
+}
+
/* This routine never returns NFS_OK! If there are no other errors, it
* will return NFSERR_SAME or NFSERR_NOT_SAME depending on whether the
* attributes matched. VERIFY is implemented by mapping NFSERR_SAME
@@ -1885,6 +1924,11 @@ static struct nfsd4_operation nfsd4_ops[] = {
.op_rsize_bop = (nfsd4op_rsize)nfsd4_write_rsize,
.op_get_currentstateid = (stateid_getter)nfsd4_get_writestateid,
},
+
+ [OP_SEEK] = {
+ .op_func = (nfsd4op_func)nfsd4_seek,
+ .op_name = "OP_SEEK",
+ },
};

#ifdef NFSD_DEBUG
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 1e4e9af..8434740 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1511,6 +1511,22 @@ nfsd4_decode_write_plus(struct nfsd4_compoundargs *argp,
}

static __be32
+nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek)
+{
+ DECODE_HEAD;
+
+ status = nfsd4_decode_stateid(argp, &seek->seek_stateid);
+ if (status)
+ return status;
+
+ READ_BUF(12);
+ READ64(seek->seek_offset);
+ READ32(seek->seek_whence);
+
+ DECODE_TAIL;
+}
+
+static __be32
nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p)
{
return nfs_ok;
@@ -1693,7 +1709,7 @@ static nfsd4_dec nfsd42_dec_ops[] = {
[OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_write_plus,
[OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
- [OP_SEEK] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_SEEK] = (nfsd4_dec)nfsd4_decode_seek,
[OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp,
};

@@ -3650,6 +3666,26 @@ nfsd4_encode_write_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
}

static __be32
+nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_seek *seek)
+{
+ __be32 *p;
+
+ if (nfserr)
+ return nfserr;
+
+ RESERVE_SPACE(28);
+ WRITE32(seek->seek_eof);
+ WRITE32(seek->seek_whence);
+ WRITE64(seek->seek_offset);
+ WRITE64(seek->seek_length);
+ WRITE32(seek->seek_allocated);
+ ADJUST_ARGS();
+
+ return nfserr;
+}
+
+static __be32
nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p)
{
return nfserr;
@@ -3730,7 +3766,7 @@ static nfsd4_enc nfsd4_enc_ops[] = {
[OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_noop,
[OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_write_plus,
[OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
- [OP_SEEK] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_SEEK] = (nfsd4_enc)nfsd4_encode_seek,
[OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop,
};

diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index aaef9ab..ae9debc 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -451,6 +451,20 @@ struct nfsd4_write_plus {
struct nfsd42_write_res wp_res;
};

+struct nfsd4_seek {
+ /* request */
+ stateid_t seek_stateid;
+
+ /* Shared between request and response */
+ u64 seek_offset;
+ u32 seek_whence;
+
+ /* response */
+ u32 seek_eof;
+ u64 seek_length;
+ u32 seek_allocated;
+};
+
struct nfsd4_op {
int opnum;
__be32 status;
@@ -499,6 +513,7 @@ struct nfsd4_op {

/* NFSv4.2 */
struct nfsd4_write_plus write_plus;
+ struct nfsd4_seek seek;
} u;
struct nfs4_replay * replay;
};
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 55ed991..81d6b09 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -128,7 +128,7 @@ enum nfs_opnum4 {
Needs to be updated if more operations are defined in future.*/

#define FIRST_NFS4_OP OP_ACCESS
-#define LAST_NFS4_OP OP_WRITE_PLUS
+#define LAST_NFS4_OP OP_SEEK

enum nfsstat4 {
NFS4_OK = 0,
--
1.8.4.1


2013-10-28 21:51:27

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Mon, Oct 28, 2013 at 10:57:26AM -0400, Anna Schumaker wrote:
> This patch adds in the SEEK operation used by clients doing an llseek on
> a file to find either hole or data sections in a file. I'm faking the
> "allocated" status right now, since I haven't quite figured out how to
> tell if a range is allocated on disk yet.
>
> This patch is missing correctly determining the "allocated" status of
> the HOLE / DATA range. I expect I'll need to learn all about how fiemap
> works before properly setting these values.
>
> Signed-off-by: Anna Schumaker <[email protected]>
> ---
> fs/nfsd/nfs4proc.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> fs/nfsd/nfs4xdr.c | 40 ++++++++++++++++++++++++++++++++++++++--
> fs/nfsd/xdr4.h | 15 +++++++++++++++
> include/linux/nfs4.h | 2 +-
> 4 files changed, 98 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 3210c6f..bc45ed2 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1064,6 +1064,45 @@ nfsd4_write_plus(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> return nfserr_union_notsupp;
> }
>
> +static __be32
> +nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> + struct nfsd4_seek *seek)
> +{
> + struct file *file;
> + loff_t pos, end_pos;
> + __be32 status;
> +
> + status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
> + &seek->seek_stateid,
> + RD_STATE | WR_STATE, &file);

Note nfs4_preprocess_stateid_op requires the caller to hold the state
lock. Andyou want to either hold it till you're done using "file" or
take a reference on file before dropping it.

--b.

> + if (status != nfs_ok)
> + return status;
> +
> + if (seek->seek_whence == NFS4_CONTENT_DATA) {
> + pos = vfs_llseek(file, seek->seek_offset, SEEK_DATA);
> + end_pos = vfs_llseek(file, pos, SEEK_HOLE);
> + seek->seek_allocated = true;
> + } else {
> + pos = vfs_llseek(file, seek->seek_offset, SEEK_HOLE);
> + end_pos = vfs_llseek(file, pos, SEEK_DATA);
> + seek->seek_allocated = false;
> + }
> +
> + if (pos < 0)
> + return nfserrno(pos);
> + else if (end_pos == -ENXIO) {
> + seek->seek_length = 0;
> + seek->seek_eof = true;
> + } else if (end_pos < 0)
> + return nfserrno(end_pos);
> + else
> + seek->seek_length = end_pos - pos;
> +
> + seek->seek_offset = pos;
> +
> + return nfs_ok;
> +}
> +
> /* This routine never returns NFS_OK! If there are no other errors, it
> * will return NFSERR_SAME or NFSERR_NOT_SAME depending on whether the
> * attributes matched. VERIFY is implemented by mapping NFSERR_SAME
> @@ -1885,6 +1924,11 @@ static struct nfsd4_operation nfsd4_ops[] = {
> .op_rsize_bop = (nfsd4op_rsize)nfsd4_write_rsize,
> .op_get_currentstateid = (stateid_getter)nfsd4_get_writestateid,
> },
> +
> + [OP_SEEK] = {
> + .op_func = (nfsd4op_func)nfsd4_seek,
> + .op_name = "OP_SEEK",
> + },
> };
>
> #ifdef NFSD_DEBUG
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 1e4e9af..8434740 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1511,6 +1511,22 @@ nfsd4_decode_write_plus(struct nfsd4_compoundargs *argp,
> }
>
> static __be32
> +nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek)
> +{
> + DECODE_HEAD;
> +
> + status = nfsd4_decode_stateid(argp, &seek->seek_stateid);
> + if (status)
> + return status;
> +
> + READ_BUF(12);
> + READ64(seek->seek_offset);
> + READ32(seek->seek_whence);
> +
> + DECODE_TAIL;
> +}
> +
> +static __be32
> nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p)
> {
> return nfs_ok;
> @@ -1693,7 +1709,7 @@ static nfsd4_dec nfsd42_dec_ops[] = {
> [OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_notsupp,
> [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_write_plus,
> [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
> - [OP_SEEK] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_SEEK] = (nfsd4_dec)nfsd4_decode_seek,
> [OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp,
> };
>
> @@ -3650,6 +3666,26 @@ nfsd4_encode_write_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> }
>
> static __be32
> +nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr,
> + struct nfsd4_seek *seek)
> +{
> + __be32 *p;
> +
> + if (nfserr)
> + return nfserr;
> +
> + RESERVE_SPACE(28);
> + WRITE32(seek->seek_eof);
> + WRITE32(seek->seek_whence);
> + WRITE64(seek->seek_offset);
> + WRITE64(seek->seek_length);
> + WRITE32(seek->seek_allocated);
> + ADJUST_ARGS();
> +
> + return nfserr;
> +}
> +
> +static __be32
> nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p)
> {
> return nfserr;
> @@ -3730,7 +3766,7 @@ static nfsd4_enc nfsd4_enc_ops[] = {
> [OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_noop,
> [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_write_plus,
> [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
> - [OP_SEEK] = (nfsd4_enc)nfsd4_encode_noop,
> + [OP_SEEK] = (nfsd4_enc)nfsd4_encode_seek,
> [OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop,
> };
>
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index aaef9ab..ae9debc 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -451,6 +451,20 @@ struct nfsd4_write_plus {
> struct nfsd42_write_res wp_res;
> };
>
> +struct nfsd4_seek {
> + /* request */
> + stateid_t seek_stateid;
> +
> + /* Shared between request and response */
> + u64 seek_offset;
> + u32 seek_whence;
> +
> + /* response */
> + u32 seek_eof;
> + u64 seek_length;
> + u32 seek_allocated;
> +};
> +
> struct nfsd4_op {
> int opnum;
> __be32 status;
> @@ -499,6 +513,7 @@ struct nfsd4_op {
>
> /* NFSv4.2 */
> struct nfsd4_write_plus write_plus;
> + struct nfsd4_seek seek;
> } u;
> struct nfs4_replay * replay;
> };
> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> index 55ed991..81d6b09 100644
> --- a/include/linux/nfs4.h
> +++ b/include/linux/nfs4.h
> @@ -128,7 +128,7 @@ enum nfs_opnum4 {
> Needs to be updated if more operations are defined in future.*/
>
> #define FIRST_NFS4_OP OP_ACCESS
> -#define LAST_NFS4_OP OP_WRITE_PLUS
> +#define LAST_NFS4_OP OP_SEEK
>
> enum nfsstat4 {
> NFS4_OK = 0,
> --
> 1.8.4.1
>

2013-10-29 13:11:22

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Tue 29 Oct 2013 09:06:49 AM EDT, J. Bruce Fields wrote:
> On Tue, Oct 29, 2013 at 08:50:56AM -0400, Anna Schumaker wrote:
>> On Mon 28 Oct 2013 05:40:30 PM EDT, J. Bruce Fields wrote:
>>> On Mon, Oct 28, 2013 at 10:57:25AM -0400, Anna Schumaker wrote:
>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>> index 419572f..3210c6f 100644
>>>> --- a/fs/nfsd/nfs4proc.c
>>>> +++ b/fs/nfsd/nfs4proc.c
>>>> @@ -1028,6 +1028,42 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>> return status;
>>>> }
>>>>
>>>> +static __be32
>>>> +nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
>>>> + struct net *net)
>>>> +{
>>>> + __be32 status;
>>>> +
>>>> + status = nfsd4_vfs_fallocate(file, writeplus->wp_allocated,
>>>> + writeplus->wp_offset, writeplus->wp_length);
>>>> + if (status == nfs_ok) {
>>>> + writeplus->wp_res.wr_stid = NULL;
>>>> + writeplus->wp_res.wr_bytes_written = writeplus->wp_length;
>>>> + writeplus->wp_res.wr_stable_how = NFS_FILE_SYNC;
>>>
>>> Do we need to sync?
>>
>> I did the sync in nfsd4_vfs_fallocate (below), but I can move it if
>> that would make more sense.
>
> What I meant was--why are we doing a sync at all, instead of returning
> NFS_UNSTABLE and making the client commit?
>
> Honest question, I haven't thought about which is best.

No reason other than it seemed easier for my initial implementation.

>
> --b.



2013-10-29 13:28:21

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Tue 29 Oct 2013 09:23:19 AM EDT, Myklebust, Trond wrote:
>
> On Oct 29, 2013, at 9:06 AM, J. Bruce Fields <[email protected]> wrote:
>
>> On Tue, Oct 29, 2013 at 08:50:56AM -0400, Anna Schumaker wrote:
>>> On Mon 28 Oct 2013 05:40:30 PM EDT, J. Bruce Fields wrote:
>>>> On Mon, Oct 28, 2013 at 10:57:25AM -0400, Anna Schumaker wrote:
>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>> index 419572f..3210c6f 100644
>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>> @@ -1028,6 +1028,42 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>> return status;
>>>>> }
>>>>>
>>>>> +static __be32
>>>>> +nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
>>>>> + struct net *net)
>>>>> +{
>>>>> + __be32 status;
>>>>> +
>>>>> + status = nfsd4_vfs_fallocate(file, writeplus->wp_allocated,
>>>>> + writeplus->wp_offset, writeplus->wp_length);
>>>>> + if (status == nfs_ok) {
>>>>> + writeplus->wp_res.wr_stid = NULL;
>>>>> + writeplus->wp_res.wr_bytes_written = writeplus->wp_length;
>>>>> + writeplus->wp_res.wr_stable_how = NFS_FILE_SYNC;
>>>>
>>>> Do we need to sync?
>>>
>>> I did the sync in nfsd4_vfs_fallocate (below), but I can move it if
>>> that would make more sense.
>>
>> What I meant was--why are we doing a sync at all, instead of returning
>> NFS_UNSTABLE and making the client commit?
>
> What if the client specifies FILE_SYNC? The spec doesn't allow the server to return UNSTABLE in that situation.

I don't have that check in there either, but I do think it would be
useful if the client were prepared to handle an UNSTABLE reply from the
server...

>
> Trond



2013-10-28 14:57:33

by Anna Schumaker

[permalink] [raw]
Subject: [PATCH 2/4] NFSD: Create nfs v4.2 decode ops

I'm doing this in a separate patch to keep from putting in a lot of
extra code when I go to add operations to the server for real.

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfsd/nfs4xdr.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++-
include/linux/nfs4.h | 11 +++++++
2 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d9454fe..60f5a1f 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1599,6 +1599,78 @@ static nfsd4_dec nfsd41_dec_ops[] = {
[OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete,
};

+static nfsd4_dec nfsd42_dec_ops[] = {
+ [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access,
+ [OP_CLOSE] = (nfsd4_dec)nfsd4_decode_close,
+ [OP_COMMIT] = (nfsd4_dec)nfsd4_decode_commit,
+ [OP_CREATE] = (nfsd4_dec)nfsd4_decode_create,
+ [OP_DELEGPURGE] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_DELEGRETURN] = (nfsd4_dec)nfsd4_decode_delegreturn,
+ [OP_GETATTR] = (nfsd4_dec)nfsd4_decode_getattr,
+ [OP_GETFH] = (nfsd4_dec)nfsd4_decode_noop,
+ [OP_LINK] = (nfsd4_dec)nfsd4_decode_link,
+ [OP_LOCK] = (nfsd4_dec)nfsd4_decode_lock,
+ [OP_LOCKT] = (nfsd4_dec)nfsd4_decode_lockt,
+ [OP_LOCKU] = (nfsd4_dec)nfsd4_decode_locku,
+ [OP_LOOKUP] = (nfsd4_dec)nfsd4_decode_lookup,
+ [OP_LOOKUPP] = (nfsd4_dec)nfsd4_decode_noop,
+ [OP_NVERIFY] = (nfsd4_dec)nfsd4_decode_verify,
+ [OP_OPEN] = (nfsd4_dec)nfsd4_decode_open,
+ [OP_OPENATTR] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_OPEN_CONFIRM] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_OPEN_DOWNGRADE] = (nfsd4_dec)nfsd4_decode_open_downgrade,
+ [OP_PUTFH] = (nfsd4_dec)nfsd4_decode_putfh,
+ [OP_PUTPUBFH] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_PUTROOTFH] = (nfsd4_dec)nfsd4_decode_noop,
+ [OP_READ] = (nfsd4_dec)nfsd4_decode_read,
+ [OP_READDIR] = (nfsd4_dec)nfsd4_decode_readdir,
+ [OP_READLINK] = (nfsd4_dec)nfsd4_decode_noop,
+ [OP_REMOVE] = (nfsd4_dec)nfsd4_decode_remove,
+ [OP_RENAME] = (nfsd4_dec)nfsd4_decode_rename,
+ [OP_RENEW] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_RESTOREFH] = (nfsd4_dec)nfsd4_decode_noop,
+ [OP_SAVEFH] = (nfsd4_dec)nfsd4_decode_noop,
+ [OP_SECINFO] = (nfsd4_dec)nfsd4_decode_secinfo,
+ [OP_SETATTR] = (nfsd4_dec)nfsd4_decode_setattr,
+ [OP_SETCLIENTID] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_SETCLIENTID_CONFIRM]= (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_VERIFY] = (nfsd4_dec)nfsd4_decode_verify,
+ [OP_WRITE] = (nfsd4_dec)nfsd4_decode_write,
+ [OP_RELEASE_LOCKOWNER] = (nfsd4_dec)nfsd4_decode_notsupp,
+
+ /* new operations for NFSv4.1 */
+ [OP_BACKCHANNEL_CTL] = (nfsd4_dec)nfsd4_decode_backchannel_ctl,
+ [OP_BIND_CONN_TO_SESSION]= (nfsd4_dec)nfsd4_decode_bind_conn_to_session,
+ [OP_EXCHANGE_ID] = (nfsd4_dec)nfsd4_decode_exchange_id,
+ [OP_CREATE_SESSION] = (nfsd4_dec)nfsd4_decode_create_session,
+ [OP_DESTROY_SESSION] = (nfsd4_dec)nfsd4_decode_destroy_session,
+ [OP_FREE_STATEID] = (nfsd4_dec)nfsd4_decode_free_stateid,
+ [OP_GET_DIR_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_SECINFO_NO_NAME] = (nfsd4_dec)nfsd4_decode_secinfo_no_name,
+ [OP_SEQUENCE] = (nfsd4_dec)nfsd4_decode_sequence,
+ [OP_SET_SSV] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_TEST_STATEID] = (nfsd4_dec)nfsd4_decode_test_stateid,
+ [OP_WANT_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_DESTROY_CLIENTID] = (nfsd4_dec)nfsd4_decode_destroy_clientid,
+ [OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete,
+
+ /* new operations for NFSv4.2 */
+ [OP_COPY] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_OFFLOAD_ABORT] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_COPY_NOTIFY] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_OFFLOAD_REVOKE] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_SEEK] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp,
+};
+
struct nfsd4_minorversion_ops {
nfsd4_dec *decoders;
int nops;
@@ -1607,7 +1679,7 @@ struct nfsd4_minorversion_ops {
static struct nfsd4_minorversion_ops nfsd4_minorversion[] = {
[0] = { nfsd4_dec_ops, ARRAY_SIZE(nfsd4_dec_ops) },
[1] = { nfsd41_dec_ops, ARRAY_SIZE(nfsd41_dec_ops) },
- [2] = { nfsd41_dec_ops, ARRAY_SIZE(nfsd41_dec_ops) },
+ [2] = { nfsd42_dec_ops, ARRAY_SIZE(nfsd42_dec_ops) },
};

static __be32
@@ -3591,6 +3663,17 @@ static nfsd4_enc nfsd4_enc_ops[] = {
[OP_WANT_DELEGATION] = (nfsd4_enc)nfsd4_encode_noop,
[OP_DESTROY_CLIENTID] = (nfsd4_enc)nfsd4_encode_noop,
[OP_RECLAIM_COMPLETE] = (nfsd4_enc)nfsd4_encode_noop,
+
+ /* NFSv4.2 operations */
+ [OP_COPY] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_OFFLOAD_ABORT] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_COPY_NOTIFY] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_OFFLOAD_REVOKE] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_SEEK] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop,
};

/*
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index a92e065..2bc5217 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -110,6 +110,17 @@ enum nfs_opnum4 {
OP_DESTROY_CLIENTID = 57,
OP_RECLAIM_COMPLETE = 58,

+ /* nfs42 */
+ OP_COPY = 59,
+ OP_OFFLOAD_ABORT = 60,
+ OP_COPY_NOTIFY = 61,
+ OP_OFFLOAD_REVOKE = 62,
+ OP_OFFLOAD_STATUS = 63,
+ OP_WRITE_PLUS = 64,
+ OP_READ_PLUS = 65,
+ OP_SEEK = 66,
+ OP_IO_ADVISE = 67,
+
OP_ILLEGAL = 10044,
};

--
1.8.4.1


2013-10-29 13:00:51

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Tue 29 Oct 2013 03:35:58 AM EDT, Christoph Hellwig wrote:
> On Mon, Oct 28, 2013 at 10:57:26AM -0400, Anna Schumaker wrote:
>> This patch adds in the SEEK operation used by clients doing an llseek on
>> a file to find either hole or data sections in a file. I'm faking the
>> "allocated" status right now, since I haven't quite figured out how to
>> tell if a range is allocated on disk yet.
>>
>> This patch is missing correctly determining the "allocated" status of
>> the HOLE / DATA range. I expect I'll need to learn all about how fiemap
>> works before properly setting these values.
>
> What is the definition of allocated in this context? Specificly how
> does it different from meaning of allocated as used by SEEK_DATA?

From what I can tell, I think the allocated flag will just tell the
clients if all the blocks exist on disk or not. Is there a way to have
a hole with allocated blocks? Or maybe it's supposed to represent
partially allocated blocks? I checked the draft, and it doesn't
actually say what they expect allocated to represent...

>
> Going out to fiemap is something we should absolutely avoid.
>



2013-10-28 21:00:00

by Myklebust, Trond

[permalink] [raw]
Subject: RE: [PATCH 2/4] NFSD: Create nfs v4.2 decode ops

> -----Original Message-----
> From: [email protected] [mailto:linux-nfs-
> [email protected]] On Behalf Of J. Bruce Fields
> Sent: Monday, October 28, 2013 4:54 PM
> To: Schumaker, Bryan
> Cc: [email protected]
> Subject: Re: [PATCH 2/4] NFSD: Create nfs v4.2 decode ops
>
> On Mon, Oct 28, 2013 at 10:57:24AM -0400, Anna Schumaker wrote:
> > I'm doing this in a separate patch to keep from putting in a lot of
> > extra code when I go to add operations to the server for real.
>
> Makes sense.
>
> But: now we're duplicating the list of 4.0 op decoders 3 times and the
> 4.1 ops twice. We'll never need different decoders for different
> minorversions (worst case we can test for the minorversion in the decoder if
> necessary).
>
> I wonder if there's a better way to organize this.... Maybe something more
> like a single array with
>
> [OP_SETCLIENTID] = {
> .op_decode = (nfsd4_dec)nfsd4_decode_access,
> .op_unsupported_since_version = 1,
> }
> ...
> [OP_EXCHANGE_ID] = {
> .op_decode = (nfsd4_dec)nfsd4_decode_exchange_id,
> .op_first_supported_in_version = 1,
> }
>
> ?

Is that really necessary? Why not just have a single array and have nfsd4_decode_clientid itself check the minor version?

Cheers
Trond

2013-10-29 13:34:13

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Tue, Oct 29, 2013 at 08:49:06AM -0400, Anna Schumaker wrote:
> On Mon 28 Oct 2013 05:40:30 PM EDT, J. Bruce Fields wrote:
> > On Mon, Oct 28, 2013 at 10:57:25AM -0400, Anna Schumaker wrote:
> >> This patch only implements DATA_CONTENT_HOLE support, but I tried to
> >> write it so other arms of the discriminated union can be added easily
> >> later.
> >
> > And there's an NFS4ERR_UNION_NOTSUPP to handle that case, OK. (Though
> > the spec language is a little ambiguous:
> >
> > If the client asks for a hole and the server does not support
> > that arm of the discriminated union, but does support one or
> > more additional arms, it can signal to the client that it
> > supports the operation, but not the arm with
> > NFS4ERR_UNION_NOTSUPP.
> >
> > So it's unclear whether we can return this in the case the client is
> > *not* asking for a hole. Are we sure the NFS4_CONTENT_DATA case is
> > optional? The language in 5.3 ("Instead a client should utlize
> > READ_PLUS and WRITE_PLUS") also suggests the CONTENT_DATA case might be
> > meant to be mandatory.)
>
> Fair enough. I wasn't planning on implementing the NFS4_CONTENT_DATA
> arm on the client right now, but I might be able to hack something
> together to test this on the server!

OK. Either way, could you also submit a patch to the spec to make it
explicit whether CONTENT_DATA is mandatory or not?

--b.

>
> >
> >> This patch is missing support for decoding multiple requests on the same
> >> file. The way it's written now only the last range provided by the
> >> client will be written to.
> >
> > But that doesn't seem to be a limitation anticipated by the spec, so I
> > guess we need to fix that.
> >
> > (Why is there an array, anyway? Wouldn't it work just as well to send
> > multiple RPC's, or a single RPC with multiple WRITE_PLUS ops?)
>
> Yeah, I'll fix that up. The Linux client will send requests one at a
> time, so that's what I tested against.
>
> >
> >>
> >> Signed-off-by: Anna Schumaker <[email protected]>
> >> ---
> >> fs/nfsd/nfs4proc.c | 45 ++++++++++++++++++++++++++++++++++++++
> >> fs/nfsd/nfs4xdr.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> >> fs/nfsd/vfs.c | 14 ++++++++++++
> >> fs/nfsd/vfs.h | 1 +
> >> fs/nfsd/xdr4.h | 24 ++++++++++++++++++++
> >> fs/open.c | 1 +
> >> include/linux/nfs4.h | 8 ++++++-
> >> 7 files changed, 152 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >> index 419572f..3210c6f 100644
> >> --- a/fs/nfsd/nfs4proc.c
> >> +++ b/fs/nfsd/nfs4proc.c
> >> @@ -1028,6 +1028,42 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >> return status;
> >> }
> >>
> >> +static __be32
> >> +nfsd4_write_plus_hole(struct file *file, struct nfsd4_write_plus *writeplus,
> >> + struct net *net)
> >> +{
> >> + __be32 status;
> >> +
> >> + status = nfsd4_vfs_fallocate(file, writeplus->wp_allocated,
> >> + writeplus->wp_offset, writeplus->wp_length);
> >> + if (status == nfs_ok) {
> >> + writeplus->wp_res.wr_stid = NULL;
> >> + writeplus->wp_res.wr_bytes_written = writeplus->wp_length;
> >> + writeplus->wp_res.wr_stable_how = NFS_FILE_SYNC;
> >
> > Do we need to sync?
> >
> >> + gen_boot_verifier(&writeplus->wp_res.wr_verifier, net);
> >> + }
> >> +
> >> + return status;
> >
> > Nit, but I like the usual idiom better in most cases:
> >
> > if (status)
> > return status;
> > normal case here...
> > return 0;
> >
> > --b.
> >
> >> +}
> >> +
> >> +static __be32
> >> +nfsd4_write_plus(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >> + struct nfsd4_write_plus *writeplus)
> >> +{
> >> + struct file *file;
> >> + __be32 status;
> >> + struct net *net = SVC_NET(rqstp);
> >> +
> >> + status = nfs4_preprocess_stateid_op(net, cstate, &writeplus->wp_stateid,
> >> + WR_STATE, &file);
> >> + if (status)
> >> + return status;
> >> +
> >> + if (writeplus->wp_data_content == NFS4_CONTENT_HOLE)
> >> + return nfsd4_write_plus_hole(file, writeplus, net);
> >> + return nfserr_union_notsupp;
> >> +}
> >> +
> >> /* This routine never returns NFS_OK! If there are no other errors, it
> >> * will return NFSERR_SAME or NFSERR_NOT_SAME depending on whether the
> >> * attributes matched. VERIFY is implemented by mapping NFSERR_SAME
> >> @@ -1840,6 +1876,15 @@ static struct nfsd4_operation nfsd4_ops[] = {
> >> .op_get_currentstateid = (stateid_getter)nfsd4_get_freestateid,
> >> .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
> >> },
> >> +
> >> + /* NFSv4.2 operations */
> >> + [OP_WRITE_PLUS] = {
> >> + .op_func = (nfsd4op_func)nfsd4_write_plus,
> >> + .op_flags = OP_MODIFIES_SOMETHING | OP_CACHEME,
> >> + .op_name = "OP_WRITE_PLUS",
> >> + .op_rsize_bop = (nfsd4op_rsize)nfsd4_write_rsize,
> >> + .op_get_currentstateid = (stateid_getter)nfsd4_get_writestateid,
> >> + },
> >> };
> >>
> >> #ifdef NFSD_DEBUG
> >> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> >> index 60f5a1f..1e4e9af 100644
> >> --- a/fs/nfsd/nfs4xdr.c
> >> +++ b/fs/nfsd/nfs4xdr.c
> >> @@ -1485,6 +1485,32 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
> >> }
> >>
> >> static __be32
> >> +nfsd4_decode_write_plus(struct nfsd4_compoundargs *argp,
> >> + struct nfsd4_write_plus *writeplus)
> >> +{
> >> + DECODE_HEAD;
> >> + unsigned int i;
> >> +
> >> + status = nfsd4_decode_stateid(argp, &writeplus->wp_stateid);
> >> + if (status)
> >> + return status;
> >> +
> >> + READ_BUF(8);
> >> + READ32(writeplus->wp_stable_how);
> >> + READ32(writeplus->wp_data_length);
> >> +
> >> + for (i = 0; i < writeplus->wp_data_length; i++) {
> >> + READ_BUF(24);
> >> + READ32(writeplus->wp_data_content);
> >> + READ64(writeplus->wp_offset);
> >> + READ64(writeplus->wp_length);
> >> + READ32(writeplus->wp_allocated);
> >> + }
> >> +
> >> + DECODE_TAIL;
> >> +}
> >> +
> >> +static __be32
> >> nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p)
> >> {
> >> return nfs_ok;
> >> @@ -1665,7 +1691,7 @@ static nfsd4_dec nfsd42_dec_ops[] = {
> >> [OP_COPY_NOTIFY] = (nfsd4_dec)nfsd4_decode_notsupp,
> >> [OP_OFFLOAD_REVOKE] = (nfsd4_dec)nfsd4_decode_notsupp,
> >> [OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_notsupp,
> >> - [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
> >> + [OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_write_plus,
> >> [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
> >> [OP_SEEK] = (nfsd4_dec)nfsd4_decode_notsupp,
> >> [OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp,
> >> @@ -3591,6 +3617,38 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
> >> return nfserr;
> >> }
> >>
> >> +static void
> >> +nfsd42_encode_write_res(struct nfsd4_compoundres *resp, struct nfsd42_write_res *write)
> >> +{
> >> + __be32 *p;
> >> +
> >> + RESERVE_SPACE(4);
> >> +
> >> + if (write->wr_stid == NULL) {
> >> + WRITE32(0);
> >> + ADJUST_ARGS();
> >> + } else {
> >> + WRITE32(1);
> >> + ADJUST_ARGS();
> >> + nfsd4_encode_stateid(resp, &write->wr_stid->sc_stateid);
> >> + }
> >> +
> >> + RESERVE_SPACE(12 + NFS4_VERIFIER_SIZE);
> >> + WRITE64(write->wr_bytes_written);
> >> + WRITE32(write->wr_stable_how);
> >> + WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
> >> + ADJUST_ARGS();
> >> +}
> >> +
> >> +static __be32
> >> +nfsd4_encode_write_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> >> + struct nfsd4_write_plus *writeplus)
> >> +{
> >> + if (!nfserr)
> >> + nfsd42_encode_write_res(resp, &writeplus->wp_res);
> >> + return nfserr;
> >> +}
> >> +
> >> static __be32
> >> nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p)
> >> {
> >> @@ -3670,7 +3728,7 @@ static nfsd4_enc nfsd4_enc_ops[] = {
> >> [OP_COPY_NOTIFY] = (nfsd4_enc)nfsd4_encode_noop,
> >> [OP_OFFLOAD_REVOKE] = (nfsd4_enc)nfsd4_encode_noop,
> >> [OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_noop,
> >> - [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
> >> + [OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_write_plus,
> >> [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
> >> [OP_SEEK] = (nfsd4_enc)nfsd4_encode_noop,
> >> [OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop,
> >> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >> index c827acb..7fec087 100644
> >> --- a/fs/nfsd/vfs.c
> >> +++ b/fs/nfsd/vfs.c
> >> @@ -16,6 +16,7 @@
> >> #include <linux/fs.h>
> >> #include <linux/file.h>
> >> #include <linux/splice.h>
> >> +#include <linux/falloc.h>
> >> #include <linux/fcntl.h>
> >> #include <linux/namei.h>
> >> #include <linux/delay.h>
> >> @@ -649,6 +650,19 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >> }
> >> #endif
> >>
> >> +__be32 nfsd4_vfs_fallocate(struct file *file, bool allocated, loff_t offset, loff_t len)
> >> +{
> >> + int error, mode = 0;
> >> +
> >> + if (allocated == false)
> >> + mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
> >> +
> >> + error = do_fallocate(file, mode, offset, len);
> >> + if (error == 0)
> >> + error = vfs_fsync_range(file, offset, offset + len, 0);
> >> + return nfserrno(error);
> >> +}
> >> +
> >> #endif /* defined(CONFIG_NFSD_V4) */
> >>
> >> #ifdef CONFIG_NFSD_V3
> >> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> >> index a4be2e3..187eb95 100644
> >> --- a/fs/nfsd/vfs.h
> >> +++ b/fs/nfsd/vfs.h
> >> @@ -57,6 +57,7 @@ __be32 nfsd4_set_nfs4_acl(struct svc_rqst *, struct svc_fh *,
> >> int nfsd4_get_nfs4_acl(struct svc_rqst *, struct dentry *, struct nfs4_acl **);
> >> __be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *,
> >> struct xdr_netobj *);
> >> +__be32 nfsd4_vfs_fallocate(struct file *, bool, loff_t, loff_t);
> >> #endif /* CONFIG_NFSD_V4 */
> >> __be32 nfsd_create(struct svc_rqst *, struct svc_fh *,
> >> char *name, int len, struct iattr *attrs,
> >> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> >> index b3ed644..aaef9ab 100644
> >> --- a/fs/nfsd/xdr4.h
> >> +++ b/fs/nfsd/xdr4.h
> >> @@ -430,6 +430,27 @@ struct nfsd4_reclaim_complete {
> >> u32 rca_one_fs;
> >> };
> >>
> >> +struct nfsd42_write_res {
> >> + struct nfs4_stid *wr_stid;
> >> + u64 wr_bytes_written;
> >> + u32 wr_stable_how;
> >> + nfs4_verifier wr_verifier;
> >> +};
> >> +
> >> +struct nfsd4_write_plus {
> >> + /* request */
> >> + stateid_t wp_stateid;
> >> + u32 wp_stable_how;
> >> + u32 wp_data_length;
> >> + u32 wp_data_content;
> >> + u64 wp_offset;
> >> + u64 wp_length;
> >> + u32 wp_allocated;
> >> +
> >> + /* response */
> >> + struct nfsd42_write_res wp_res;
> >> +};
> >> +
> >> struct nfsd4_op {
> >> int opnum;
> >> __be32 status;
> >> @@ -475,6 +496,9 @@ struct nfsd4_op {
> >> struct nfsd4_reclaim_complete reclaim_complete;
> >> struct nfsd4_test_stateid test_stateid;
> >> struct nfsd4_free_stateid free_stateid;
> >> +
> >> + /* NFSv4.2 */
> >> + struct nfsd4_write_plus write_plus;
> >> } u;
> >> struct nfs4_replay * replay;
> >> };
> >> diff --git a/fs/open.c b/fs/open.c
> >> index d420331..09db2d5 100644
> >> --- a/fs/open.c
> >> +++ b/fs/open.c
> >> @@ -278,6 +278,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >> sb_end_write(inode->i_sb);
> >> return ret;
> >> }
> >> +EXPORT_SYMBOL_GPL(do_fallocate);
> >>
> >> SYSCALL_DEFINE4(fallocate, int, fd, int, mode, loff_t, offset, loff_t, len)
> >> {
> >> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> >> index 2bc5217..55ed991 100644
> >> --- a/include/linux/nfs4.h
> >> +++ b/include/linux/nfs4.h
> >> @@ -128,7 +128,7 @@ enum nfs_opnum4 {
> >> Needs to be updated if more operations are defined in future.*/
> >>
> >> #define FIRST_NFS4_OP OP_ACCESS
> >> -#define LAST_NFS4_OP OP_RECLAIM_COMPLETE
> >> +#define LAST_NFS4_OP OP_WRITE_PLUS
> >>
> >> enum nfsstat4 {
> >> NFS4_OK = 0,
> >> @@ -552,4 +552,10 @@ struct nfs4_deviceid {
> >> char data[NFS4_DEVICEID4_SIZE];
> >> };
> >>
> >> +enum data_content4 {
> >> + NFS4_CONTENT_DATA = 0,
> >> + NFS4_CONTENT_APP_DATA_HOLE = 1,
> >> + NFS4_CONTENT_HOLE = 2,
> >> +};
> >> +
> >> #endif
> >> --
> >> 1.8.4.1
> >>
>
>

2013-10-29 15:11:56

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC 5/4] NFSD: Add basic CB_OFFLOAD support

On Tue, Oct 29, 2013 at 09:53:21AM -0400, J. Bruce Fields wrote:
> Your complaint as I understand it is that quick and long-running
> operations were combined into one one operation when they would have
> better been separated. I agree.

Not just quick and long running but semantically different -
preallocating space and writing zeroes over every byte are fundamentally
different operations.

> But I also don't understand why the long-running operations need an
> async option. Maybe they do, I just don't understand why.

I didn't even bother questioning that, but it's not a bad question
either.


2013-11-02 14:37:40

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Sat, Nov 02, 2013 at 06:48:37AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 29, 2013 at 09:30:06AM -0400, J. Bruce Fields wrote:
> > I haven't read these parts of the spec yet at all....
>
> Btw, is there a way do to something like a git blame for the spec? I'd
> really love to confront people personally with some of the odd bits they
> came up with.

It *is* in git, so you can literally run git blame:

https://github.com/loghyr/NFSv4.2

But it'll probably credit almost everything to Tom and he's typically
just the messenger....

Just direct gripes to [email protected]. They also take patches.

--b.

2013-11-04 16:41:20

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On 11/02/2013 09:52 AM, Christoph Hellwig wrote:
>> +__be32 nfsd4_vfs_fallocate(struct file *file, bool allocated, loff_t offset, loff_t len)
>> +{
>> + int error, mode = 0;
>> +
>> + if (allocated == false)
>> + mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
>> +
>> + error = do_fallocate(file, mode, offset, len);
>> + if (error == 0)
>> + error = vfs_fsync_range(file, offset, offset + len, 0);
>
> What are you trying to do with the fsync_range here? If it's just to
> make sure the metadata operation is stable on disk please use commit_metadata()
> from fs/nfsd/vfs.c. As there's no data involved here I can't really see
> what else it would be for.

My understanding is that if I call fallocate with a mode of "0" it will write a range of 0s to the file. I'll move the fsync() call here and only do it if the client doesn't ask for NFS_UNSTABLE.

>
>
> Also the allocated flag doesn't make any sense for people not taking the
> same drugs as the spec authors, please add a comment what it means.

Sure

>
> Btw, how did anyone come up with the name WRITE PLUS for something that
> doesn't actually involve any writes?
>

Write plus does write more than just holes, I just didn't implement that part of the spec. I'm implementing the NFS4_CONTENT_HOLE arm, but NFS4_CONTENT_DATA is intended to eventually replace the WRITE operation.


2013-11-04 17:05:10

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Mon, Nov 04, 2013 at 11:46:58AM -0500, J. Bruce Fields wrote:
> Imagine it was me, if it helps.
>
> The important thing is that we understand what should be fixed.

The important part is how someone got the idea for the pattern writing
and the weird ADBs from and why they believe they should overload a
single command for all of them. And apparently reqular data writes as
well, even if I haven't found that part of the spec yet.


2013-11-04 18:57:05

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On 11/04/2013 01:53 PM, Christoph Hellwig wrote:
> On Mon, Nov 04, 2013 at 12:23:27PM -0500, Anna Schumaker wrote:
>>> state" that will return zeroes when read, even if those zeroes didn't
>>> make it to disk.
>>
>> And that's all done through metadata? Doing the commit_metadata() call makes a bit more sense now.
>
> Yes. The only filesystem in tree that actually seems to write zeroes
> is gfs2, but even then it doesn't do so through the usual mechanisms.
>
>>> Also what does the data arm buy us over good old WRITE?
>>>
>>
>> I've been working off of draft #21, the most recent commit is 2b3ab1740b1ea843faa59566fb4213a42d8c724a from Aug 19. The eventual goal is to phase out WRITE and replace it with WRITE_PLUS, but that won't happen until at least 4.3. I don't know why it's being done this way instead of just adding an FALLOCATE operation.
>
> Where can I find draft 21? The newest document in the git repo I was
> pointed to earlier is draft-ietf-nfsv4-minorversion2-13.txt, and the
> newest I could find on tools.ietf.org is draft 20.

You may need to run `make` in the NFSv4.2 directory.

>
> I still don't understand why anyone would phase out WRITE in favour of
> something that doesn't actually add any value for the write case.
>
> I'll try to take it up directly with the working group, but my post to
> the list yesterday in your SEEK thread didn't seem to have made it
> trough. I also tried to research how they came up with this idiotic
> design, but the mailing list archives tell very little. Could it be
> that most of these decisions are actually made in a smokey backroom and
> not on the list?
>


2013-11-02 13:52:39

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

> +__be32 nfsd4_vfs_fallocate(struct file *file, bool allocated, loff_t offset, loff_t len)
> +{
> + int error, mode = 0;
> +
> + if (allocated == false)
> + mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
> +
> + error = do_fallocate(file, mode, offset, len);
> + if (error == 0)
> + error = vfs_fsync_range(file, offset, offset + len, 0);

What are you trying to do with the fsync_range here? If it's just to
make sure the metadata operation is stable on disk please use commit_metadata()
from fs/nfsd/vfs.c. As there's no data involved here I can't really see
what else it would be for.


Also the allocated flag doesn't make any sense for people not taking the
same drugs as the spec authors, please add a comment what it means.

Btw, how did anyone come up with the name WRITE PLUS for something that
doesn't actually involve any writes?


2013-11-02 13:48:40

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Tue, Oct 29, 2013 at 09:30:06AM -0400, J. Bruce Fields wrote:
> I haven't read these parts of the spec yet at all....

Btw, is there a way do to something like a git blame for the spec? I'd
really love to confront people personally with some of the odd bits they
came up with.


2013-11-05 09:40:16

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

> +{
> + int error, mode = 0;
> +
> + if (allocated == false)
> + mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
> +
> + error = do_fallocate(file, mode, offset, len);
> + if (error == 0)
> + error = vfs_fsync_range(file, offset, offset + len, 0);
> + return nfserrno(error);

Same problem with the falloc vs WRITE_PLUS mismatch here. For XFS you
could make the allocated case all XFS_IOC_ZERO_RANGE, but the proposed
VFS equivalent never seemed to have made it in.

Also no need for the FALLOC_FL_KEEP_SIZE in the hole punch case, it is
ignored.


2013-11-05 08:23:08

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Tue, Nov 05, 2013 at 02:07:15AM +0000, Haynes, Tom wrote:
> We did not want many new operators and also wanted the operators to
> be extensible. With this approach, you can define a new arm of the
> discriminated union, not have to implement it, and not burn an
> operator.

That seems very much like a non-argument. If saving operators was a
good argument the NFS operations should be

MULTIPLEX1 with many sub opcodes, followed by MULTIPLEX2 once it fills
up.

> Some of the history is captured here:
>
> http://www.ietf.org/mail-archive/web/nfsv4/current/msg11235.html

That mail seems to draw the wrong conclusion that a hole punching
or a preallocation are equivalent to a server side copy from /dev/zero.

Treating a pattern write as a server side copy is fine and I'd fully
support that. Hole punching and preallocation on the other hand are
primarily metadata operations that reserve or free space. They only
happen to zero out the range as zero is the most convenient well known
pattern to avoid stale data exposure.

> http://www.ietf.org/mail-archive/web/nfsv4/current/msg11470.html

I think Chuck's reply summarizes very well why a pattern initialization
should not be mixed with an actual data write.

> http://www.ietf.org/proceedings/84/slides/slides-84-nfsv4-1.pdf (slide 6)

That side seems to inadvertently sum up a lot of what's wrong with merging
hole punching and preallocations into some form of super write:

- COMMIT doesn't really apply to pure metadata operations like a hole
punch and preallocation, so fitting it into a WRITE that expects
COMMIT causes all kinds of problems (as we saw in the thread about
Annas implementation).
- requiring the server to be able to handle offloads for these
operations does not make any sense, because they are again very
quick metadata operation, and not long running operation like
a pattern initialization on the server.

Note that the arbitrary pattern initialization over multiple blocks is
a very different operation from a space allocation even if the latter
happens to also zero the range as a side effect.


>
> It doesn't capture the intent of NFS4ERR_UNION_NOTSUPP in
> this decision.
>
> 11.1.1.1. NFS4ERR_UNION_NOTSUPP (Error Code 10090)
>
> One of the arguments to the operation is a discriminated union and
> while the server supports the given operation, it does not support
> the selected arm of the discriminated union. For an example, see
> READ_PLUS (Section 14.10).

Btw, there is an odd use of this error in 14.7.3.:

"WRITE_PLUS has to support all of the errors which are returned by
WRITE plus NFS4ERR_UNION_NOTSUPP. If the client asks for a hole and
the server does not support that arm of the discriminated union, but
does support one or more additional arms, it can signal to the
client that it supports the operation, but not the arm with
NFS4ERR_UNION_NOTSUPP."

This does not specicly writes but appears to assume hole punching
is the only optional arm. On the other hand to me it appears the only
interesting arm, with the data arm buying nothing over WRITE in 4.2 and
thus being entirely superflous, and ADHs being a complicated map to Unix
filesystems on the backend.

2013-11-04 17:22:47

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK


On Nov 4, 2013, at 9:05 AM, Christoph Hellwig <[email protected]> wrote:

> On Mon, Nov 04, 2013 at 11:46:58AM -0500, J. Bruce Fields wrote:
>> Imagine it was me, if it helps.
>>
>> The important thing is that we understand what should be fixed.
>
> The important part is how someone got the idea for the pattern writing
> and the weird ADBs from

The original proposal is

http://tools.ietf.org/html/draft-eisler-nfsv4-enterprise-apps-01

and has been somewhat modified in its expression in NFSv4.2.

> and why they believe they should overload a
> single command for all of them.

I believe that file initialization is closer in semantics to COPY_OFFLOAD than it is to WRITE, and I find it awkward to use the WRITE_PLUS operation for initialization. But I could never make a strong technical argument why they should not be joined, other than "Clutter!".

> And apparently reqular data writes as
> well, even if I haven't found that part of the spec yet.

http://www.ietf.org/id/draft-ietf-nfsv4-minorversion2-20.txt

is the latest version I can find.

The usual process is to explain what needs to be fixed (ie, make a problem statement) as Bruce said. That is often accompanied by an alternate design proposal that addresses your issues.

It is appropriate to address your comments to the group, and not to a single person. As in most online communities, ad hominem and confrontation are not appreciated.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2013-11-04 20:03:58

by Haynes, Tom

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches


On Nov 4, 2013, at 11:50 AM, Chuck Lever <[email protected]> wrote:

>
> On Nov 4, 2013, at 11:19 AM, Christoph Hellwig <[email protected]> wrote:
>
>> On Mon, Nov 04, 2013 at 11:16:13AM -0800, Chuck Lever wrote:
>>>> I still don't understand why anyone would phase out WRITE in favour of
>>>> something that doesn't actually add any value for the write case.
>>>
>>> Protocol extensibility. WRITE_PLUS adds a discriminated union of data types that can be extended easily to include initialization patterns, integrity metadata, and other things like holes. It's entirely another question as to whether any particular extension is tasteful.
>>
>> Maybe I'm missing something, but what does multiplexing entirely
>> different operation actually buy you? It is different operation after
>> all.


Do you have an issue with READ_PLUS?

I.e., are you okay with that operation because the answer is not known beforehand?


>
> I think a successful argument could be made that initialization or hole punching may not belong in a WRITE_PLUS operation. For example, using a COMMIT for a long-running server-performed file initialization doesn't make sense to me. The infrastructure we introduced for COPY_OFFLOAD seems better suited for file initialization.

WRITE_PLUS does utilize COPY_OFFLOAD for such actions.


>
>> It's not like adding new operations to NFS is all that hard.
>
> Actually, it is harder than you think. For various reasons, it takes five or more years to create a new NFS protocol specification, something we hope to address.
>
>>> The mailing list is not the only place where they are discussed. We also use conference calls, and there are face-to-face meetings too. Not all of that content is distilled into a public record. Very much like Linux kernel development.
>>
>> Linux development strives very hard documenting rationales, something I
>> haven't found on the NFS list or in the repository.


There was a strong consensus in the biweekly calls to merge all of the operations together into WRITE_PLUS.
I think we also voted on this at one of the meetings.

But in any event, the document is still open and I'm glad Anna is funneling implementation experience into
it.




2013-11-04 19:19:09

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Mon, Nov 04, 2013 at 11:16:13AM -0800, Chuck Lever wrote:
> > I still don't understand why anyone would phase out WRITE in favour of
> > something that doesn't actually add any value for the write case.
>
> Protocol extensibility. WRITE_PLUS adds a discriminated union of data types that can be extended easily to include initialization patterns, integrity metadata, and other things like holes. It's entirely another question as to whether any particular extension is tasteful.

Maybe I'm missing something, but what does multiplexing entirely
different operation actually buy you? It is different operation after
all. It's not like adding new operations to NFS is all that hard.

> The mailing list is not the only place where they are discussed. We also use conference calls, and there are face-to-face meetings too. Not all of that content is distilled into a public record. Very much like Linux kernel development.

Linux development strives very hard documenting rationales, something I
haven't found on the NFS list or in the repository.


2013-11-05 01:03:34

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Mon, Nov 04, 2013 at 01:56:39PM -0800, Thomas Haynes wrote:
> Nah, most likely me, no imagination needed.
>
> We were also looking at bringing larger WRITEs during 4.3 and I wanted
> to have WRITE_PLUS already in place for that.

What exactly does a WRITE PLUS with various non-related overloads in 4.2
buy you for introducing large writes in 4.3?

2013-11-04 17:23:29

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On 11/04/2013 12:03 PM, Christoph Hellwig wrote:
> On Mon, Nov 04, 2013 at 11:41:07AM -0500, Anna Schumaker wrote:
>>> what else it would be for.
>>
>> My understanding is that if I call fallocate with a mode of "0" it will write a range of 0s to the file.
>
> While writing zeroes to the file would be a correct implementation,
> that's now how the operation is defined or usually implemented. The
> defintion is that the blocks in the range will be allocated, and reads
> from it will return zero (and the file size will be updated if needed).
>
> The usual way to implement it is to created extents in an "unwritten
> state" that will return zeroes when read, even if those zeroes didn't
> make it to disk.

And that's all done through metadata? Doing the commit_metadata() call makes a bit more sense now.

>
>> I'll move the fsync() call here and only do it if the client doesn't ask for NFS_UNSTABLE.
>
> As said the right way to handle it would be to use commit_metadata, as
> the cache flushes a fsync does would be unessecary and too expensive for
> many workloads typically using fallocate.

Okay, sorry for me not understanding earlier!

>
>>>
>>> Btw, how did anyone come up with the name WRITE PLUS for something that
>>> doesn't actually involve any writes?
>>>
>>
>> Write plus does write more than just holes, I just didn't implement that part of the spec. I'm implementing the NFS4_CONTENT_HOLE arm, but NFS4_CONTENT_DATA is intended to eventually replace the WRITE operation.
>
> Where is WRITE PLUS defined? I the various
> draft-ietf-nfsv4-minorversion2-*.txt I can only find what appears to be
> the predecessor, but that doesn't have a data arm.
>
> Also what does the data arm buy us over good old WRITE?
>

I've been working off of draft #21, the most recent commit is 2b3ab1740b1ea843faa59566fb4213a42d8c724a from Aug 19. The eventual goal is to phase out WRITE and replace it with WRITE_PLUS, but that won't happen until at least 4.3. I don't know why it's being done this way instead of just adding an FALLOCATE operation.


2013-11-04 16:47:07

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Sat, Nov 02, 2013 at 07:41:07AM -0700, Christoph Hellwig wrote:
> On Sat, Nov 02, 2013 at 10:37:29AM -0400, J. Bruce Fields wrote:
> > On Sat, Nov 02, 2013 at 06:48:37AM -0700, Christoph Hellwig wrote:
> > > On Tue, Oct 29, 2013 at 09:30:06AM -0400, J. Bruce Fields wrote:
> > > > I haven't read these parts of the spec yet at all....
> > >
> > > Btw, is there a way do to something like a git blame for the spec? I'd
> > > really love to confront people personally with some of the odd bits they
> > > came up with.
> >
> > It *is* in git, so you can literally run git blame:
> >
> > https://github.com/loghyr/NFSv4.2
> >
> > But it'll probably credit almost everything to Tom and he's typically
> > just the messenger....
>
> Indeed, a quick try attributes everything to him. No good way to figure
> out who is to blame for the whole WRITE PLUS idiocy.

Imagine it was me, if it helps.

The important thing is that we understand what should be fixed.

--b.

2013-11-02 14:44:42

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Sat, Nov 02, 2013 at 06:54:09AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 29, 2013 at 09:06:49AM -0400, J. Bruce Fields wrote:
> > What I meant was--why are we doing a sync at all, instead of returning
> > NFS_UNSTABLE and making the client commit?
>
> Did NFSv4.2 introduce a concept of unstable metadata operations?

No, but I think WRITE_PLUS does have a stable/unstable bit so I think we
could choose not to do the sync if that'd make sense.

--b.

2013-11-04 18:53:12

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Mon, Nov 04, 2013 at 12:23:27PM -0500, Anna Schumaker wrote:
> > state" that will return zeroes when read, even if those zeroes didn't
> > make it to disk.
>
> And that's all done through metadata? Doing the commit_metadata() call makes a bit more sense now.

Yes. The only filesystem in tree that actually seems to write zeroes
is gfs2, but even then it doesn't do so through the usual mechanisms.

> > Also what does the data arm buy us over good old WRITE?
> >
>
> I've been working off of draft #21, the most recent commit is 2b3ab1740b1ea843faa59566fb4213a42d8c724a from Aug 19. The eventual goal is to phase out WRITE and replace it with WRITE_PLUS, but that won't happen until at least 4.3. I don't know why it's being done this way instead of just adding an FALLOCATE operation.

Where can I find draft 21? The newest document in the git repo I was
pointed to earlier is draft-ietf-nfsv4-minorversion2-13.txt, and the
newest I could find on tools.ietf.org is draft 20.

I still don't understand why anyone would phase out WRITE in favour of
something that doesn't actually add any value for the write case.

I'll try to take it up directly with the working group, but my post to
the list yesterday in your SEEK thread didn't seem to have made it
trough. I also tried to research how they came up with this idiotic
design, but the mailing list archives tell very little. Could it be
that most of these decisions are actually made in a smokey backroom and
not on the list?

2013-11-02 15:02:37

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches


On Nov 2, 2013, at 10:51, Christoph Hellwig <[email protected]> wrote:

> On Sat, Nov 02, 2013 at 10:44:31AM -0400, J. Bruce Fields wrote:
>> On Sat, Nov 02, 2013 at 06:54:09AM -0700, Christoph Hellwig wrote:
>>> On Tue, Oct 29, 2013 at 09:06:49AM -0400, J. Bruce Fields wrote:
>>>> What I meant was--why are we doing a sync at all, instead of returning
>>>> NFS_UNSTABLE and making the client commit?
>>>
>>> Did NFSv4.2 introduce a concept of unstable metadata operations?
>>
>> No, but I think WRITE_PLUS does have a stable/unstable bit so I think we
>> could choose not to do the sync if that'd make sense.
>
> Both operations are idempotent, so supporting it shouldn't be a major
> obstactle. But suddenly having some metadata operations that can be
> unstable seems like a major wart in the spec.

COMMIT has always had a metadata part to it, though. It guarantees stability of the ctime/mime and change attributes in addition to the file size and data.

IOW: it really is more akin to fsync() than to fdatasync().

Cheers
Trond

2013-11-04 19:16:44

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches


On Nov 4, 2013, at 10:53 AM, Christoph Hellwig <[email protected]> wrote:

> On Mon, Nov 04, 2013 at 12:23:27PM -0500, Anna Schumaker wrote:
>>> state" that will return zeroes when read, even if those zeroes didn't
>>> make it to disk.
>>
>> And that's all done through metadata? Doing the commit_metadata() call makes a bit more sense now.
>
> Yes. The only filesystem in tree that actually seems to write zeroes
> is gfs2, but even then it doesn't do so through the usual mechanisms.
>
>>> Also what does the data arm buy us over good old WRITE?
>>>
>>
>> I've been working off of draft #21, the most recent commit is 2b3ab1740b1ea843faa59566fb4213a42d8c724a from Aug 19. The eventual goal is to phase out WRITE and replace it with WRITE_PLUS, but that won't happen until at least 4.3. I don't know why it's being done this way instead of just adding an FALLOCATE operation.
>
> Where can I find draft 21? The newest document in the git repo I was
> pointed to earlier is draft-ietf-nfsv4-minorversion2-13.txt, and the
> newest I could find on tools.ietf.org is draft 20.
>
> I still don't understand why anyone would phase out WRITE in favour of
> something that doesn't actually add any value for the write case.

Protocol extensibility. WRITE_PLUS adds a discriminated union of data types that can be extended easily to include initialization patterns, integrity metadata, and other things like holes. It's entirely another question as to whether any particular extension is tasteful.

> I'll try to take it up directly with the working group, but my post to
> the list yesterday in your SEEK thread didn't seem to have made it
> trough. I also tried to research how they came up with this idiotic
> design, but the mailing list archives tell very little. Could it be
> that most of these decisions are actually made in a smokey backroom and
> not on the list?

The documents themselves count as public proposals. That's why they are referred to as "Requests For Comments".

The mailing list is not the only place where they are discussed. We also use conference calls, and there are face-to-face meetings too. Not all of that content is distilled into a public record. Very much like Linux kernel development.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2013-11-04 20:13:23

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Mon, Nov 04, 2013 at 08:03:58PM +0000, Haynes, Tom wrote:
> Do you have an issue with READ_PLUS?
>
> I.e., are you okay with that operation because the answer is not known beforehand?

The hole vs data part of it looks reasonable. The whole ADH complex
looks a little fishy to me, but I will have to dig deeper into before
writing down a detailed critism.

But one very obvious question already comes up: what data is supposed
to be returned when a client that doesn't suport 4.2 at all are doesn't
support ADHs reads a file that contains them. From a quick glance it
seems a mapping from ADHs to flat blocks is possible, but such a mapping
should be specified in the standard.

> There was a strong consensus in the biweekly calls to merge all of the operations together into WRITE_PLUS.

Do you remember any arguments for it? Also any rational for adding the
plain old data write to it?


2013-11-02 14:41:08

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Sat, Nov 02, 2013 at 10:37:29AM -0400, J. Bruce Fields wrote:
> On Sat, Nov 02, 2013 at 06:48:37AM -0700, Christoph Hellwig wrote:
> > On Tue, Oct 29, 2013 at 09:30:06AM -0400, J. Bruce Fields wrote:
> > > I haven't read these parts of the spec yet at all....
> >
> > Btw, is there a way do to something like a git blame for the spec? I'd
> > really love to confront people personally with some of the odd bits they
> > came up with.
>
> It *is* in git, so you can literally run git blame:
>
> https://github.com/loghyr/NFSv4.2
>
> But it'll probably credit almost everything to Tom and he's typically
> just the messenger....

Indeed, a quick try attributes everything to him. No good way to figure
out who is to blame for the whole WRITE PLUS idiocy.


2013-11-04 21:56:47

by Thomas Haynes

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK


On Nov 4, 2013, at 8:46 AM, J. Bruce Fields <[email protected]> wrote:

> On Sat, Nov 02, 2013 at 07:41:07AM -0700, Christoph Hellwig wrote:
>> On Sat, Nov 02, 2013 at 10:37:29AM -0400, J. Bruce Fields wrote:
>>> On Sat, Nov 02, 2013 at 06:48:37AM -0700, Christoph Hellwig wrote:
>>>> On Tue, Oct 29, 2013 at 09:30:06AM -0400, J. Bruce Fields wrote:
>>>>> I haven't read these parts of the spec yet at all....
>>>>
>>>> Btw, is there a way do to something like a git blame for the spec? I'd
>>>> really love to confront people personally with some of the odd bits they
>>>> came up with.
>>>
>>> It *is* in git, so you can literally run git blame:
>>>
>>> https://github.com/loghyr/NFSv4.2
>>>
>>> But it'll probably credit almost everything to Tom and he's typically
>>> just the messenger....
>>
>> Indeed, a quick try attributes everything to him. No good way to figure
>> out who is to blame for the whole WRITE PLUS idiocy.
>
> Imagine it was me, if it helps.
>

Nah, most likely me, no imagination needed.

We were also looking at bringing larger WRITEs during 4.3 and I wanted
to have WRITE_PLUS already in place for that.


> The important thing is that we understand what should be fixed.
>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2013-11-04 17:03:46

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Mon, Nov 04, 2013 at 11:41:07AM -0500, Anna Schumaker wrote:
> > what else it would be for.
>
> My understanding is that if I call fallocate with a mode of "0" it will write a range of 0s to the file.

While writing zeroes to the file would be a correct implementation,
that's now how the operation is defined or usually implemented. The
defintion is that the blocks in the range will be allocated, and reads
from it will return zero (and the file size will be updated if needed).

The usual way to implement it is to created extents in an "unwritten
state" that will return zeroes when read, even if those zeroes didn't
make it to disk.

>I'll move the fsync() call here and only do it if the client doesn't ask for NFS_UNSTABLE.

As said the right way to handle it would be to use commit_metadata, as
the cache flushes a fsync does would be unessecary and too expensive for
many workloads typically using fallocate.

> >
> > Btw, how did anyone come up with the name WRITE PLUS for something that
> > doesn't actually involve any writes?
> >
>
> Write plus does write more than just holes, I just didn't implement that part of the spec. I'm implementing the NFS4_CONTENT_HOLE arm, but NFS4_CONTENT_DATA is intended to eventually replace the WRITE operation.

Where is WRITE PLUS defined? I the various
draft-ietf-nfsv4-minorversion2-*.txt I can only find what appears to be
the predecessor, but that doesn't have a data arm.

Also what does the data arm buy us over good old WRITE?


2013-11-02 14:51:08

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Sat, Nov 02, 2013 at 10:44:31AM -0400, J. Bruce Fields wrote:
> On Sat, Nov 02, 2013 at 06:54:09AM -0700, Christoph Hellwig wrote:
> > On Tue, Oct 29, 2013 at 09:06:49AM -0400, J. Bruce Fields wrote:
> > > What I meant was--why are we doing a sync at all, instead of returning
> > > NFS_UNSTABLE and making the client commit?
> >
> > Did NFSv4.2 introduce a concept of unstable metadata operations?
>
> No, but I think WRITE_PLUS does have a stable/unstable bit so I think we
> could choose not to do the sync if that'd make sense.

Both operations are idempotent, so supporting it shouldn't be a major
obstactle. But suddenly having some metadata operations that can be
unstable seems like a major wart in the spec.


2013-11-05 15:11:58

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Tue, Nov 05, 2013 at 09:23:59AM -0500, Anna Schumaker wrote:
> It's not ignored in do_fallocate() (fs/open.c) where it returns -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set without FALLOC_FL_KEEP_SIZE :)

Oops, looks like we decided to enforce it, despite hole punches never
having an effect on file size. I stand corrected.

2013-11-02 15:07:38

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Sat, Nov 02, 2013 at 03:02:29PM +0000, Myklebust, Trond wrote:
> COMMIT has always had a metadata part to it, though. It guarantees stability of the ctime/mime and change attributes in addition to the file size and data.
>
> IOW: it really is more akin to fsync() than to fdatasync().

That's fine, but so far it wasn't used for any purely metadata
operation. E.g. if we allow it for a preallocation we should also
allow it for updating the size put the current file size.

And diverting from the spec to your implementation: we'd need a lot more
generic infrastructure to deal with unstable random non-data operations.

2013-11-02 13:54:10

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Tue, Oct 29, 2013 at 09:06:49AM -0400, J. Bruce Fields wrote:
> What I meant was--why are we doing a sync at all, instead of returning
> NFS_UNSTABLE and making the client commit?

Did NFSv4.2 introduce a concept of unstable metadata operations?


2013-11-05 14:24:04

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On 11/05/2013 04:40 AM, Christoph Hellwig wrote:
>> +{
>> + int error, mode = 0;
>> +
>> + if (allocated == false)
>> + mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
>> +
>> + error = do_fallocate(file, mode, offset, len);
>> + if (error == 0)
>> + error = vfs_fsync_range(file, offset, offset + len, 0);
>> + return nfserrno(error);
>
> Same problem with the falloc vs WRITE_PLUS mismatch here. For XFS you
> could make the allocated case all XFS_IOC_ZERO_RANGE, but the proposed
> VFS equivalent never seemed to have made it in.
>
> Also no need for the FALLOC_FL_KEEP_SIZE in the hole punch case, it is
> ignored.
>

It's not ignored in do_fallocate() (fs/open.c) where it returns -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set without FALLOC_FL_KEEP_SIZE :)

2013-11-02 15:20:29

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Mon, Oct 28, 2013 at 05:40:30PM -0400, J. Bruce Fields wrote:
> On Mon, Oct 28, 2013 at 10:57:25AM -0400, Anna Schumaker wrote:
> > This patch only implements DATA_CONTENT_HOLE support, but I tried to
> > write it so other arms of the discriminated union can be added easily
> > later.
>
> And there's an NFS4ERR_UNION_NOTSUPP to handle that case, OK. (Though
> the spec language is a little ambiguous:
>
> If the client asks for a hole and the server does not support
> that arm of the discriminated union, but does support one or
> more additional arms, it can signal to the client that it
> supports the operation, but not the arm with
> NFS4ERR_UNION_NOTSUPP.

Where is this language coming from? Seems like the latest document in
NFSv4.2/ repo still calls WRITE_PLUS INITIALIZE and doesn't have
anything content related, just a weird "ADB" concept.


2013-11-04 19:51:22

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches


On Nov 4, 2013, at 11:19 AM, Christoph Hellwig <[email protected]> wrote:

> On Mon, Nov 04, 2013 at 11:16:13AM -0800, Chuck Lever wrote:
>>> I still don't understand why anyone would phase out WRITE in favour of
>>> something that doesn't actually add any value for the write case.
>>
>> Protocol extensibility. WRITE_PLUS adds a discriminated union of data types that can be extended easily to include initialization patterns, integrity metadata, and other things like holes. It's entirely another question as to whether any particular extension is tasteful.
>
> Maybe I'm missing something, but what does multiplexing entirely
> different operation actually buy you? It is different operation after
> all.

I think a successful argument could be made that initialization or hole punching may not belong in a WRITE_PLUS operation. For example, using a COMMIT for a long-running server-performed file initialization doesn't make sense to me. The infrastructure we introduced for COPY_OFFLOAD seems better suited for file initialization.

> It's not like adding new operations to NFS is all that hard.

Actually, it is harder than you think. For various reasons, it takes five or more years to create a new NFS protocol specification, something we hope to address.

>> The mailing list is not the only place where they are discussed. We also use conference calls, and there are face-to-face meetings too. Not all of that content is distilled into a public record. Very much like Linux kernel development.
>
> Linux development strives very hard documenting rationales, something I
> haven't found on the NFS list or in the repository.

There are plenty of times where I find the kernel git log or e-mail archives entirely unhelpful in understanding the why's and wherefor's. It's a human endeavor, and like all such things, is imperfect, and is often a matter of knowing who to ask rather than where to look.

By the way, e-mail from non-members to [email protected] is moderated. The moderator may be traveling or otherwise unavailable, as IETF 88 is happening this week. If you subscribe to the list, you should be able to post without restriction.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


2013-11-05 02:07:16

by Haynes, Tom

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK


On Nov 4, 2013, at 5:03 PM, Christoph Hellwig <[email protected]> wrote:

> On Mon, Nov 04, 2013 at 01:56:39PM -0800, Thomas Haynes wrote:
>> Nah, most likely me, no imagination needed.
>>
>> We were also looking at bringing larger WRITEs during 4.3 and I wanted
>> to have WRITE_PLUS already in place for that.
>
> What exactly does a WRITE PLUS with various non-related overloads in 4.2
> buy you for introducing large writes in 4.3?

In 4.3, READ_PLUS would be extended for large reads. We would at that
time need a WRITE_PLUS to be extended for large writes.

The relationship falls from having WRITE_PLUS handle all of the variants
handled by READ_PLUS. It needs to accept all of them because it
does not know what will be returned.

We did not want many new operators and also wanted the operators to
be extensible. With this approach, you can define a new arm of the
discriminated union, not have to implement it, and not burn an
operator.

Some of the history is captured here:

http://www.ietf.org/mail-archive/web/nfsv4/current/msg11235.html
http://www.ietf.org/mail-archive/web/nfsv4/current/msg11470.html
http://www.ietf.org/proceedings/84/slides/slides-84-nfsv4-1.pdf (slide 6)

It doesn't capture the intent of NFS4ERR_UNION_NOTSUPP in
this decision.

11.1.1.1. NFS4ERR_UNION_NOTSUPP (Error Code 10090)

One of the arguments to the operation is a discriminated union and
while the server supports the given operation, it does not support
the selected arm of the discriminated union. For an example, see
READ_PLUS (Section 14.10).

2013-11-05 12:33:15

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFSD: Implement SEEK

On Mon, Nov 04, 2013 at 09:22:11AM -0800, Chuck Lever wrote:
> The original proposal is
>
> http://tools.ietf.org/html/draft-eisler-nfsv4-enterprise-apps-01
>
> and has been somewhat modified in its expression in NFSv4.2.

The INITIALIZE operation in that proposal look like a sane translation of
the SCSI WRITE SAME commands to NFS, but I have no idea
how that eveloved into the monster of ADHs in the NFS4.2 draft.


2013-11-04 19:54:12

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Mon, Nov 04, 2013 at 11:50:51AM -0800, Chuck Lever wrote:
> I think a successful argument could be made that initialization or hole punching may not belong in a WRITE_PLUS operation. For example, using a COMMIT for a long-running server-performed file initialization doesn't make sense to me. The infrastructure we introduced for COPY_OFFLOAD seems better suited for file initialization.

I've been going through the latest draft and will submit a counter
proposal in a few days time.

> > It's not like adding new operations to NFS is all that hard.
>
> Actually, it is harder than you think. For various reasons, it takes five or more years to create a new NFS protocol specification, something we hope to address.

I see the pain in getting new minor versions out. But that pain is
shared between adding a new operation and a new arm to an union.

> By the way, e-mail from non-members to [email protected] is moderated. The moderator may be traveling or otherwise unavailable, as IETF 88 is happening this week. If you subscribe to the list, you should be able to post without restriction.

I have been subscribed longer than I've been posting to it.


2013-11-04 19:55:38

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 3/4] NFSD: Add WRITE_PLUS support for hole punches

On Mon, Nov 04, 2013 at 11:50:51AM -0800, Chuck Lever wrote:
>
> On Nov 4, 2013, at 11:19 AM, Christoph Hellwig <[email protected]> wrote:
>
> > On Mon, Nov 04, 2013 at 11:16:13AM -0800, Chuck Lever wrote:
> >>> I still don't understand why anyone would phase out WRITE in favour of
> >>> something that doesn't actually add any value for the write case.
> >>
> >> Protocol extensibility. WRITE_PLUS adds a discriminated union of data types that can be extended easily to include initialization patterns, integrity metadata, and other things like holes. It's entirely another question as to whether any particular extension is tasteful.
> >
> > Maybe I'm missing something, but what does multiplexing entirely
> > different operation actually buy you? It is different operation after
> > all.
>
> I think a successful argument could be made that initialization or hole punching may not belong in a WRITE_PLUS operation. For example, using a COMMIT for a long-running server-performed file initialization doesn't make sense to me. The infrastructure we introduced for COPY_OFFLOAD seems better suited for file initialization.
>
> > It's not like adding new operations to NFS is all that hard.
>
> Actually, it is harder than you think. For various reasons, it takes five or more years to create a new NFS protocol specification, something we hope to address.

Well, but that's not really a function of the number of operations.
It's not as though implementing hole punching will take more or less
time depending on whether we implement something as a compound op or a
union.

--b.