The following patchset implements an extension to nfsd
providing a complete minimal pnfs server exporting
DLM-based clustered file systems such as GFS2 or OCFS2.
The pNFS operations that are implemented are
GETDEVICELIST and GETDEVICEINFO,
LAYOUTGET and LAYOUTRETURN.
The server does the bookkeeping of the outstanding layout
state in response to layout get and return.
Also, the implementation cleans up the client layout state
upon client expiry and on CLOSE when the return_on_close
flag is set on the LAYOUTGET response. The latter is the
default behavior until layout recalls are implemented,
which will let the server reclaim its resources when
a client holds layout state after closing its files.
The patchset is based on v3.12-rc2 and is also available online here:
git://linux-nfs.org/~bhalevy/linux-pnfs.git pnfsd-dlm-3.12-rc2-2013-09-26
Benny
General infrastructure:
[PATCH RFC v0 01/49] pnfsd: Define CONFIG_PNFSD
[PATCH RFC v0 02/49] pnfsd: define NFSDDBG_PNFS
[PATCH RFC v0 03/49] pnfsd: return pnfs flags on exchange_id
[PATCH RFC v0 04/49] pnfsd: don't set up back channel on create_session for ds
[PATCH RFC v0 05/49] pnfsd: introduce pnfsd header files
[PATCH RFC v0 06/49] pnfsd: define pnfs_export_operations
[PATCH RFC v0 07/49] pnfsd: add pnfs export option
[PATCH RFC v0 08/49] pnfsd: layout verify
[PATCH RFC v0 09/49] pnfsd: initial stub
Device ops:
[PATCH RFC v0 10/49] pnfsd: use sbid hash table to map super_blocks to devid major identifiers
[PATCH RFC v0 11/49] NFSD: introduce exp_xdr.h
[PATCH RFC v0 12/49] pnfsd: get device list/info
[PATCH RFC v0 13/49] pnfsd: filelayout: get device list/info
[PATCH RFC v0 14/49] pnfsd: provide helper for xdr encoding of deviceid
[PATCH RFC v0 15/49] pnfsd: add helper functions for identifying DS filehandles
[PATCH RFC v0 16/49] pnfsd: accept all ds stateids
layout get:
[PATCH RFC v0 17/49] DEBUG: nfsd: more client_lock asserts
[PATCH RFC v0 18/49] pnfsd: nfs4_assert_state_locked
[PATCH RFC v0 19/49] pnfsd: layout get
[PATCH RFC v0 20/49] pnfsd: filelayout: layout encoding
layout state handling for layout get:
[PATCH RFC v0 21/49] nfsd: no need to unhash_stid before free
[PATCH RFC v0 22/49] nfsd: cleanup free_stid
[PATCH RFC v0 23/49] pnfsd: layout state allocation
[PATCH RFC v0 24/49] pnfsd: process the layout stateid
[PATCH RFC v0 25/49] pnfsd: layout state per client tracking
[PATCH RFC v0 26/49] pnfsd: layout state per file tracking
[PATCH RFC v0 27/49] pnfsd: hash layouts on layout state
[PATCH RFC v0 28/49] pnfsd: support layout segment merging
pnfs attributes:
[PATCH RFC v0 29/49] pnfsd: support layout_type attribute
[PATCH RFC v0 30/49] pnfsd: make pnfs server return layout_blksize when the client asks for it
[PATCH RFC v0 31/49] pnfsd: add support for per-file layout_types attribute
pnfsd over dlm:
[PATCH RFC v0 32/49] pnfsd: per block device dlm data server list cache
[PATCH RFC v0 33/49] pnfsd: Add IP address validation to nfsd4_set_pnfs_dlm_device()
[PATCH RFC v0 34/49] pnfsd: new nfsd filesystem file: pnfs_dlm_device
[PATCH RFC v0 35/49] pnfsd: nfsd4_pnfs_dlm_getdeviter
[PATCH RFC v0 36/49] pnfsd: nfsd4_pnfs_dlm_getdevinfo
[PATCH RFC v0 37/49] pnfsd: make /proc/fs/nfsd/pnfs_dlm_device report dlm device list.
[PATCH RFC v0 38/49] pnfsd: nfsd4_pnfs_dlm_layoutget
[PATCH RFC v0 39/49] pnfsd: DLM file layout only support read iomode layouts
[PATCH RFC v0 40/49] pnfsd: add dlm file layout layout-type
[PATCH RFC v0 41/49] pnfsd: dlm pnfs_export_operations
[PATCH RFC v0 42/49] pnfsd: gfs2: use generic file layout pnfs operations vector
layout return / expire / return_on_close:
[PATCH RFC v0 43/49] pnfsd: release state lock around iput in put_nfs4_file
[PATCH RFC v0 44/49] posix_acl: resolve compile dependency in posix_acl.h
[PATCH RFC v0 45/49] nfs: resolve compile dependency in nfs_xdr.h
[PATCH RFC v0 46/49] pnfsd: layout return generic implementation
[PATCH RFC v0 47/49] pnfsd: pnfs_expire_client
[PATCH RFC v0 48/49] pnfsd: return on close
[PATCH RFC v0 49/49] pnfsd: dlm set return_on_close to true
From: Andy Adamson <[email protected]>
Export nfsd4_pnfs_dlm_layouttype for use by dlm cluster file systems.
Signed-off-by: Andy Adamson <[email protected]>
Acked-by: Steven Whitehouse <[email protected]>
[pnfsd: dlm: fixup LAYOUT_NFSV4_1_FILES]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsdlm.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/nfsd/nfs4pnfsdlm.c b/fs/nfsd/nfs4pnfsdlm.c
index acc1f91..de1af22 100644
--- a/fs/nfsd/nfs4pnfsdlm.c
+++ b/fs/nfsd/nfs4pnfsdlm.c
@@ -446,8 +446,15 @@ static enum nfsstat4 nfsd4_pnfs_dlm_layoutget(struct inode *inode,
goto exit;
}
+static int
+nfsd4_pnfs_dlm_layouttype(struct super_block *sb)
+{
+ return LAYOUT_NFSV4_1_FILES;
+}
+
/* For use by DLM cluster file systems exported by pNFSD */
const struct pnfs_export_operations pnfs_dlm_export_ops = {
+ .layout_type = nfsd4_pnfs_dlm_layouttype,
.get_device_info = nfsd4_pnfs_dlm_getdevinfo,
.get_device_iter = nfsd4_pnfs_dlm_getdeviter,
.layout_get = nfsd4_pnfs_dlm_layoutget,
--
1.8.3.1
From: Benny Halevy <[email protected]>
struct pnfs_export_operations defines the VFS level API for pNFS,
not including callbacks. A pnfs-exportable filesystem sets
a pointer to its pnfs export vector in its struct super_block.s_pnfs_op.
The file system provides the per-superblock layout_type method that
determines if it supports pnfs for the filesystem identified by
the superblock, and if so, with which layout type (only one per-sb is
supported).
Device ops:
get_device_iter is used to fill-in the device list for GETDEVICELIST
and get_device_info is used to encode the device info for GETDEVICEINFO.
Layout ops:
layout_get, layout_commit, and layout_return implement the file system- and
layout type- specific parts of their respective protocol operations: LAYOUTGET,
LAYOUTCOMMIT, and LAYOUTRETURN.
The following methods are mandatory:
layout_type, get_device_info, and layout_get.
Note: this patch defines the pnfs export operations in stub form only.
The actual operations are defined along with their usage.
[pnfsd: provide default no-op operations]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: compile fixes for pnfsd branch]
Signed-off-by: Fred Isaman <[email protected]>
[gfs2: set pnfs_dlm_export_ops only for CONFIG_PNFSD]
[pnfsd: handle s_pnfs_op==NULL]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/export.c | 2 +-
include/linux/fs.h | 2 ++
include/linux/nfsd/nfsd4_pnfs.h | 14 ++++++++++++++
3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index 5f38ea3..f26b0b9 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -16,7 +16,7 @@
#include <linux/module.h>
#include <linux/exportfs.h>
#include <linux/sunrpc/svc_xprt.h>
-
+#include <linux/nfsd/nfsd4_pnfs.h>
#include <net/ipv6.h>
#include "nfsd.h"
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3f40547..d9186a4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -34,6 +34,7 @@
#include <uapi/linux/fs.h>
struct export_operations;
+struct pnfs_export_operations;
struct hd_geometry;
struct iovec;
struct nameidata;
@@ -1251,6 +1252,7 @@ struct super_block {
const struct dquot_operations *dq_op;
const struct quotactl_ops *s_qcop;
const struct export_operations *s_export_op;
+ const struct pnfs_export_operations *s_pnfs_op;
unsigned long s_flags;
unsigned long s_magic;
struct dentry *s_root;
diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
index 9e7d95e..ff6613e 100644
--- a/include/linux/nfsd/nfsd4_pnfs.h
+++ b/include/linux/nfsd/nfsd4_pnfs.h
@@ -34,4 +34,18 @@
#ifndef _LINUX_NFSD_NFSD4_PNFS_H
#define _LINUX_NFSD_NFSD4_PNFS_H
+/*
+ * pNFS export operations vector.
+ *
+ * The filesystem must implement the following methods:
+ * layout_type
+ * get_device_info
+ * layout_get
+ *
+ * All other methods are optional and can be set to NULL if not implemented.
+ */
+struct pnfs_export_operations {
+ /* stub */
+};
+
#endif /* _LINUX_NFSD_NFSD4_PNFS_H */
--
1.8.3.1
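To illustrate the contract described in the commit message above (a NULL
s_pnfs_op means the file system is not pNFS-exportable, and layout_type,
get_device_info and layout_get are mandatory), here is a minimal userspace
sketch. The method signatures, the helper sb_pnfs_layout_type() and the
demo_* functions are assumptions made for illustration only; the patch
itself defines an empty stub struct and the real signatures arrive in
later patches.

```c
#include <assert.h>
#include <stddef.h>

#define LAYOUT_NFSV4_1_FILES 1	/* LAYOUT4_NFSV4_1_FILES in RFC 5661 */

struct super_block;

/* Simplified stand-in for the kernel struct; signatures are hypothetical. */
struct pnfs_export_operations {
	int (*layout_type)(struct super_block *sb);	/* mandatory */
	int (*get_device_info)(struct super_block *sb);	/* mandatory */
	int (*layout_get)(struct super_block *sb);	/* mandatory */
	int (*layout_return)(struct super_block *sb);	/* optional, may be NULL */
};

struct super_block {
	const struct pnfs_export_operations *s_pnfs_op;	/* NULL: no pNFS */
};

/*
 * Hypothetical helper: return the layout type for sb, or 0 when the
 * file system cannot be exported over pNFS.
 */
static int sb_pnfs_layout_type(struct super_block *sb)
{
	const struct pnfs_export_operations *ops;

	assert(sb != NULL);
	ops = sb->s_pnfs_op;
	if (!ops || !ops->layout_type || !ops->get_device_info ||
	    !ops->layout_get)
		return 0;
	return ops->layout_type(sb);
}

/* Demo methods standing in for a real file system's implementation. */
static int demo_layout_type(struct super_block *sb)
{
	(void)sb;
	return LAYOUT_NFSV4_1_FILES;
}

static int demo_stub(struct super_block *sb)
{
	(void)sb;
	return 0;
}
```

A caller would set sb->s_pnfs_op at mount time and test the result of
sb_pnfs_layout_type() before advertising any pNFS capability.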
On Thu, Sep 26, 2013 at 02:40:02PM -0400, Benny Halevy wrote:
> From: Benny Halevy <[email protected]>
>
> Set the cl_exchange_flags to be non_pnfs if we do not set
> either pnfs or ds (in the plain old nfs41 case).
>
> Note that we always set both MDS and DS exchangeid capability flags
> when CONFIG_PNFSD is enabled.
> The client needs to remember what the session is used for
> if it cares to distinguish between DSs and MDSs.
>
> EXCHGID4_FLAG_USE_NON_PNFS should be set when the server does not support
> operations (e.g. LAYOUTGET) or attributes that pertain to pNFS.
Minor nit: since we don't actually support those operations yet, this
patch should probably come later in the series.
--b.
>
> [extracted from pnfsd: Initial pNFS server implementation.]
> Signed-off-by: Benny Halevy <[email protected]>
> [pnfsd: Fixup nfsd4_set_ex_flags.]
> Signed-off-by: Dean Hildebrand <[email protected]>
> [pnfsd: set EXCHGID4_FLAG_USE_NON_PNFS when !CONFIG_PNFSD]
> [pnfsd: fix compiler warning in nfsd4_set_ex_flags when CONFIG_PNFSD is not defined]
> [pnfsd: always set both MDS and DS exchangeid capability flags]
> Signed-off-by: Benny Halevy <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 57a0340..21c15fc 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1614,8 +1614,12 @@ static bool clp_used_exchangeid(struct nfs4_client *clp)
> static void
> nfsd4_set_ex_flags(struct nfs4_client *new, struct nfsd4_exchange_id *clid)
> {
> - /* pNFS is not supported */
> +#if defined(CONFIG_PNFSD)
> + new->cl_exchange_flags |= EXCHGID4_FLAG_USE_PNFS_MDS |
> + EXCHGID4_FLAG_USE_PNFS_DS;
> +#else /* CONFIG_PNFSD */
> new->cl_exchange_flags |= EXCHGID4_FLAG_USE_NON_PNFS;
> +#endif /* CONFIG_PNFSD */
>
> /* Referrals are supported, Migration is not. */
> new->cl_exchange_flags |= EXCHGID4_FLAG_SUPP_MOVED_REFER;
> --
> 1.8.3.1
>
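For reference, the flag selection in the quoted hunk can be modelled in
userspace as below. The flag values are the ones assigned to EXCHANGE_ID
in RFC 5661; the function name set_ex_flags() and the runtime
pnfsd_enabled argument are assumptions of this sketch (the patch makes the
choice at compile time with CONFIG_PNFSD).

```c
#include <assert.h>
#include <stdint.h>

/* EXCHANGE_ID flag values as assigned in RFC 5661. */
#define EXCHGID4_FLAG_SUPP_MOVED_REFER	0x00000001u
#define EXCHGID4_FLAG_USE_NON_PNFS	0x00010000u
#define EXCHGID4_FLAG_USE_PNFS_MDS	0x00020000u
#define EXCHGID4_FLAG_USE_PNFS_DS	0x00040000u

/*
 * Hypothetical model of nfsd4_set_ex_flags(): a pNFS-capable server
 * advertises both the MDS and DS roles, otherwise plain NFSv4.1.
 */
static uint32_t set_ex_flags(uint32_t flags, int pnfsd_enabled)
{
	if (pnfsd_enabled)
		flags |= EXCHGID4_FLAG_USE_PNFS_MDS |
			 EXCHGID4_FLAG_USE_PNFS_DS;
	else
		flags |= EXCHGID4_FLAG_USE_NON_PNFS;

	/* Referrals are supported, migration is not. */
	flags |= EXCHGID4_FLAG_SUPP_MOVED_REFER;
	return flags;
}
```

Note that the non-pNFS flag is never combined with the MDS or DS flags,
which matches the intent stated in the commit message.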
From: Tao Guo <[email protected]>
Signed-off-by: Tao Guo <[email protected]>
[pnfsd: FATTR4_WORD2_LAYOUT_BLKSIZE is supported only under CONFIG_PNFSD]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4xdr.c | 6 ++++++
fs/nfsd/nfsd.h | 6 ++++++
2 files changed, 12 insertions(+)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 83f7147..6781a33 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2583,6 +2583,12 @@ static int get_parent_attributes(struct svc_export *exp, struct kstat *stat)
} else
WRITE32(0); /* length */
}
+
+ if (bmval2 & FATTR4_WORD2_LAYOUT_BLKSIZE) {
+ if ((buflen -= 4) < 0)
+ goto out_resource;
+ WRITE32(stat.blksize);
+ }
#endif /* CONFIG_PNFSD */
if (bmval2 & FATTR4_WORD2_SECURITY_LABEL) {
status = nfsd4_encode_security_label(rqstp, context,
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index f49fb0b..d81db6e 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -329,8 +329,14 @@ static inline void nfs4_reset_lease(time_t leasetime) { }
NFSD4_SUPPORTED_ATTRS_WORD1
#endif /* CONFIG_PNFSD */
+#if defined(CONFIG_PNFSD)
+#define NFSD4_1_SUPPORTED_ATTRS_WORD2 \
+ (NFSD4_SUPPORTED_ATTRS_WORD2 | FATTR4_WORD2_SUPPATTR_EXCLCREAT | \
+ FATTR4_WORD2_LAYOUT_BLKSIZE)
+#else /* CONFIG_PNFSD */
#define NFSD4_1_SUPPORTED_ATTRS_WORD2 \
(NFSD4_SUPPORTED_ATTRS_WORD2 | FATTR4_WORD2_SUPPATTR_EXCLCREAT)
+#endif /* CONFIG_PNFSD */
#ifdef CONFIG_NFSD_V4_SECURITY_LABEL
#define NFSD4_2_SUPPORTED_ATTRS_WORD2 \
--
1.8.3.1
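The space check in the nfs4xdr.c hunk above follows the usual
"reserve, then write" pattern of the attribute encoder. A minimal
userspace sketch of that pattern follows; encode_u32_checked() is a
hypothetical helper, not a kernel function, and it stands in for the
combination of the buflen test and WRITE32().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reserve 4 bytes from *buflen, then write v in big-endian (XDR) order.
 * Returns 0 on success, or -1 when fewer than 4 bytes remain. As in the
 * patch, *buflen is decremented before the check.
 */
static int encode_u32_checked(uint8_t **pp, int *buflen, uint32_t v)
{
	uint8_t *p;

	assert(pp && *pp && buflen);
	if ((*buflen -= 4) < 0)
		return -1;	/* the kernel code jumps to out_resource here */

	p = *pp;
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	*pp = p + 4;
	return 0;
}
```

With a 6-byte budget, one block size fits and a second attempt fails
without advancing the write pointer.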
On 09/29/2013 05:17 AM, Christoph Hellwig wrote:
> Seems like layout_type should just be a field in the export ops instead
> of a method.
No! This field does not make any sense at all; it should just be
removed.
There is a pNFS inquiry a client sends that asks for an array
of all the types supported by this mount point. So this method
should be returning an array, if at all.
The layout_type is just an input to some operations and is of
no concern to NFSD. This is not a yes/no flag for pNFS; the
ops vector should be the flag.
Cheers
Boaz
From: Andy Adamson <[email protected]>
Common function for LAYOUTGET and LAYOUTRETURN layout stateid processing.
The 'first open, delegation, or lock stateid' presented by the client is
looked up for verification.
Both initial and non-initial parallel LAYOUTGET operations and parallel
LAYOUTRETURN operations are supported.
Note: layout stateid seqid checking is more lax than that specified in
draft-ietf-nfsv4-minorversion1-22 for Connectathon.
Take a reference count whenever the pointer to the layout state
is kept, in particular when the layout structure is listed on the
state's ls_layouts. On dequeue_layout the layout state is put,
and its reference count drops to zero if the list empties,
unless someone is holding a reference transiently within the scope
of the calling function, in which case that reference is dropped
before the function exits.
Note: the layout stateid must be updated by layout get only
on success, upon changing the actual state; otherwise,
a parallel layout_recall will send the wrong stateid.
Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: nfs4_process_layout_stateid print result stateid conditionally]
[pnfsd: use STATEID_FMT and STATEID_VAL for printing stateids]
[pnfsd: debug print layout stateid before putting the layout_state]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: fix layout state reference count]
Signed-off-by: Benny Halevy <[email protected]>
[used nfs4_check_stateid in nfs4_process_layout_stateid]
[Moved pnfsd code from nfs4state.c to nfs4pnfsd.c]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: use a spinlock for layout state]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: Move pnfsd code out of nfs4state.c/h]
Signed-off-by: Boaz Harrosh <[email protected]>
[moved defs back into state.h]
[verify_stateid's return status is __be32]
[update layout stateid properly]
[convert to using 3.2 layout state infrastructure]
[squashed Helper functions for layout stateid processing]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: Update the reference of nfs4_layout_state properly]
Signed-off-by: Yanchuan Nian <[email protected]>
[pnfsd: use nfsd_net for layoutget starting v3.8]
[do not hang layouts on lo_state yet]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: LAYOUTGET layout stateid processing]
Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: nfs4_process_layout_stateid: replace do_alloc with typemask]
Signed-off-by: Nadav Shemer <[email protected]>
Massaged-by: Lev Solomonov <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsd.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfs4state.c | 14 +++++------
fs/nfsd/nfs4xdr.c | 2 +-
fs/nfsd/state.h | 2 ++
4 files changed, 79 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index 82b6a7d..e28c396 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -164,6 +164,64 @@ struct sbid_tracker {
kref_put(&ls->ls_ref, destroy_layout_state);
}
+/*
+ * We have looked up the nfs4_file corresponding to the current_fh, and
+ * confirmed the clientid. Pull the few tests from nfs4_preprocess_stateid_op()
+ * that make sense with a layout stateid.
+ *
+ * If the layout state was found in cache, grab a reference count on it;
+ * otherwise, allocate a new layout state on first use of the layout.
+ *
+ * Called with the state_lock held
+ * Returns zero and stateid is updated, or error.
+ */
+static __be32
+nfs4_process_layout_stateid(struct nfs4_client *clp, struct nfs4_file *fp,
+ stateid_t *stateid, unsigned char typemask,
+ struct nfs4_layout_state **lsp)
+{
+ struct nfs4_layout_state *ls = NULL;
+ __be32 status = 0;
+ struct nfs4_stid *stid;
+
+ dprintk("--> %s clp %p fp %p operation stateid=" STATEID_FMT "\n",
+ __func__, clp, fp, STATEID_VAL(stateid));
+
+ nfs4_assert_state_locked();
+ status = nfsd4_lookup_stateid(stateid, typemask, &stid, true,
+ net_generic(clp->net, nfsd_net_id));
+ if (status)
+ goto out;
+
+ /* Is this the first use of this layout ? */
+ if (stid->sc_type != NFS4_LAYOUT_STID) {
+ ls = alloc_init_layout_state(clp, stateid);
+ if (!ls) {
+ status = nfserr_jukebox;
+ goto out;
+ }
+ } else {
+ ls = container_of(stid, struct nfs4_layout_state, ls_stid);
+
+ /* BAD STATEID */
+ if (stateid->si_generation > ls->ls_stid.sc_stateid.si_generation) {
+ dprintk("%s bad stateid 1\n", __func__);
+ status = nfserr_bad_stateid;
+ goto out;
+ }
+ get_layout_state(ls);
+ }
+ status = 0;
+
+ *lsp = ls;
+ dprintk("%s: layout stateid=" STATEID_FMT " ref=%d\n", __func__,
+ STATEID_VAL(&ls->ls_stid.sc_stateid), atomic_read(&ls->ls_ref.refcount));
+out:
+ dprintk("<-- %s status %d\n", __func__, htonl(status));
+
+ return status;
+}
+
static struct nfs4_layout *
alloc_layout(void)
{
@@ -275,6 +333,7 @@ struct super_block *
struct nfs4_file *fp;
struct nfs4_client *clp;
struct nfs4_layout *lp = NULL;
+ struct nfs4_layout_state *ls = NULL;
struct nfsd4_pnfs_layoutget_arg args = {
.lg_minlength = lgp->lg_minlength,
.lg_fh = &lgp->lg_fhp->fh_handle,
@@ -320,6 +379,14 @@ struct super_block *
goto out_unlock;
}
+ /* Check decoded layout stateid */
+ nfserr = nfs4_process_layout_stateid(clp, fp, &lgp->lg_sid,
+ (NFS4_OPEN_STID | NFS4_LOCK_STID |
+ NFS4_DELEG_STID | NFS4_LAYOUT_STID),
+ &ls);
+ if (nfserr)
+ goto out_unlock;
+
lp = alloc_layout();
if (!lp) {
nfserr = nfserr_layouttrylater;
@@ -378,6 +445,8 @@ struct super_block *
init_layout(lp, &res.lg_seg);
out_unlock:
+ if (ls)
+ put_layout_state(ls);
nfs4_unlock_state();
if (fp)
put_nfs4_file(fp);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6e251fb..fa292bb 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1313,7 +1313,7 @@ static void gen_confirm(struct nfs4_client *clp)
memcpy(clp->cl_confirm.data, verf, sizeof(clp->cl_confirm.data));
}
-static struct nfs4_stid *find_stateid(struct nfs4_client *cl, stateid_t *t)
+struct nfs4_stid *nfsd4_find_stateid(struct nfs4_client *cl, stateid_t *t)
{
struct nfs4_stid *ret;
@@ -1328,7 +1328,7 @@ static struct nfs4_stid *find_stateid_by_type(struct nfs4_client *cl, stateid_t
{
struct nfs4_stid *s;
- s = find_stateid(cl, t);
+ s = nfsd4_find_stateid(cl, t);
if (!s)
return NULL;
if (typemask & s->sc_type)
@@ -3617,7 +3617,7 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
"with incorrect client ID\n", addr_str);
return nfserr_bad_stateid;
}
- s = find_stateid(cl, stateid);
+ s = nfsd4_find_stateid(cl, stateid);
if (!s)
return nfserr_bad_stateid;
status = check_stateid_generation(stateid, &s->sc_stateid, 1);
@@ -3643,9 +3643,9 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
}
}
-static __be32 nfsd4_lookup_stateid(stateid_t *stateid, unsigned char typemask,
- struct nfs4_stid **s, bool sessions,
- struct nfsd_net *nn)
+__be32 nfsd4_lookup_stateid(stateid_t *stateid, unsigned char typemask,
+ struct nfs4_stid **s, bool sessions,
+ struct nfsd_net *nn)
{
struct nfs4_client *cl;
__be32 status;
@@ -3779,7 +3779,7 @@ static __be32 nfsd4_lookup_stateid(stateid_t *stateid, unsigned char typemask,
__be32 ret = nfserr_bad_stateid;
nfs4_lock_state();
- s = find_stateid(cl, stateid);
+ s = nfsd4_find_stateid(cl, stateid);
if (!s)
goto out;
switch (s->sc_type) {
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 1cc19cd..b9c4417 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3829,7 +3829,7 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
if (xdr.end - xdr.p > exp_xdr_qwords(maxcount & ~3))
xdr.end = xdr.p + exp_xdr_qwords(maxcount & ~3);
- /* Retrieve, encode, and merge layout */
+ /* Retrieve, encode, and merge layout; process stateid */
nfserr = nfs4_pnfs_get_layout(resp->rqstp, lgp, &xdr);
if (nfserr)
goto err;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 18a64c4..8c6e097 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -490,6 +490,8 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
extern struct nfs4_stid *nfsd4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *slab);
extern void nfsd4_free_stid(struct kmem_cache *slab, struct nfs4_stid *s);
extern void nfsd4_remove_stid(struct nfs4_stid *s);
+extern struct nfs4_stid *nfsd4_find_stateid(struct nfs4_client *, stateid_t *);
+extern __be32 nfsd4_lookup_stateid(stateid_t *, unsigned char typemask, struct nfs4_stid **, bool sessions, struct nfsd_net *);
#if defined(CONFIG_PNFSD)
extern int nfsd4_init_pnfs_slabs(void);
--
1.8.3.1
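The decision made by nfs4_process_layout_stateid() in the patch above can
be sketched in userspace as follows. This is a simplified model under
stated assumptions: struct stid, the STID_* and ERR_* constants and both
functions are hypothetical stand-ins, a plain integer replaces the kref,
and no locking is shown (the kernel code runs under the state lock).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { STID_OPEN = 1, STID_LAYOUT = 2 };		/* hypothetical tags */
enum { OK = 0, ERR_BAD_STATEID = 1, ERR_JUKEBOX = 2 };	/* hypothetical */

struct stid {
	int type;
	uint32_t generation;
	int refcount;
};

/*
 * found:      the state looked up from the stateid the client presented
 * client_gen: the generation (seqid) in the presented stateid
 * On success *out holds a reference the caller must put.
 */
static int process_layout_stateid(struct stid *found, uint32_t client_gen,
				  struct stid **out)
{
	struct stid *ls;

	assert(found && out);
	if (found->type != STID_LAYOUT) {
		/* First use: the client sent an open/lock/deleg stateid. */
		ls = calloc(1, sizeof(*ls));
		if (!ls)
			return ERR_JUKEBOX;
		ls->type = STID_LAYOUT;
		ls->refcount = 1;
	} else {
		/* A seqid from the future is a bad stateid (lax check). */
		if (client_gen > found->generation)
			return ERR_BAD_STATEID;
		ls = found;
		ls->refcount++;
	}
	*out = ls;
	return OK;
}

/* Drop one reference; free the layout state when the last one goes. */
static void put_layout_state(struct stid *ls)
{
	assert(ls && ls->refcount > 0);
	if (--ls->refcount == 0)
		free(ls);
}
```

Every successful call is paired with a put_layout_state(), which mirrors
the put added on the out_unlock path of nfs4_pnfs_get_layout().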
Signed-off-by: Benny Halevy <[email protected]>
---
include/linux/nfsd/nfsd4_pnfs.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
index 53a0bb7..28f9daa 100644
--- a/include/linux/nfsd/nfsd4_pnfs.h
+++ b/include/linux/nfsd/nfsd4_pnfs.h
@@ -42,6 +42,13 @@ struct nfsd4_pnfs_deviceid {
u64 devid; /* filesystem-wide unique device ID */
};
+static inline __be32 *nfsd4_encode_deviceid(__be32 *p,
+ const struct nfsd4_pnfs_deviceid *dp)
+{
+ p = exp_xdr_encode_u64(p, dp->sbid);
+ return exp_xdr_encode_u64(p, dp->devid);
+}
+
struct nfsd4_pnfs_dev_iter_res {
u64 gd_cookie; /* request/repsonse */
u64 gd_verf; /* request/repsonse */
--
1.8.3.1
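As an illustration of what the helper above produces on the wire, here is
a userspace sketch that encodes the two 64-bit halves of a device ID in
XDR (big-endian) order. The byte-wise xdr_encode_u64() below is an
assumption standing in for exp_xdr_encode_u64(), whose implementation is
not shown in this patch, and struct pnfs_deviceid is a simplified copy of
struct nfsd4_pnfs_deviceid.

```c
#include <assert.h>
#include <stdint.h>

struct pnfs_deviceid {
	uint64_t sbid;	/* per-superblock identifier */
	uint64_t devid;	/* filesystem-wide unique device ID */
};

/* Write v in big-endian byte order and return the next free byte. */
static uint8_t *xdr_encode_u64(uint8_t *p, uint64_t v)
{
	int i;

	for (i = 7; i >= 0; i--)
		*p++ = (uint8_t)(v >> (8 * i));
	return p;
}

/* Encode sbid followed by devid: 16 bytes in total. */
static uint8_t *encode_deviceid(uint8_t *p, const struct pnfs_deviceid *dp)
{
	assert(p && dp);
	p = xdr_encode_u64(p, dp->sbid);
	return xdr_encode_u64(p, dp->devid);
}
```

The 16 bytes written match the fixed size of an NFSv4.1 deviceid4.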
Seems like layout_type should just be a field in the export ops instead
of a method.
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b80807c..68b6f7a 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -168,6 +168,7 @@ static __be32 get_client_locked(struct nfs4_client *clp)
{
struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+ lockdep_assert_held(&nn->client_lock);
if (is_client_expired(clp)) {
WARN_ON(1);
printk("%s: client (clientid %08x/%08x) already expired\n",
@@ -1018,6 +1019,7 @@ static void init_session(struct svc_rqst *rqstp, struct nfsd4_session *new, stru
int idx;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ lockdep_assert_held(&nn->client_lock);
dump_sessionid(__func__, sessionid);
idx = hash_sessionid(sessionid);
/* Search in the appropriate list */
--
1.8.3.1
From: Andy Adamson <[email protected]>
Declare a global pnfs_export_operations struct for use with DLM cluster
file systems that wish to be exported over pNFS.
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: define dlm export ops for the !CONFIG_PNFSD case]
[gfs2: set pnfs_dlm_export_ops only for CONFIG_PNFSD]
Signed-off-by: Benny Halevy <[email protected]>
Acked-by: Steven Whitehouse <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
include/linux/nfsd/nfs4pnfsdlm.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/linux/nfsd/nfs4pnfsdlm.h b/include/linux/nfsd/nfs4pnfsdlm.h
index a4f3477..eb31123 100644
--- a/include/linux/nfsd/nfs4pnfsdlm.h
+++ b/include/linux/nfsd/nfs4pnfsdlm.h
@@ -35,6 +35,9 @@
#ifdef CONFIG_PNFSD
+/* For use by DLM cluster file systems exported by pNFSD */
+extern const struct pnfs_export_operations pnfs_dlm_export_ops;
+
int nfsd4_set_pnfs_dlm_device(char *pnfs_dlm_device, int len);
void nfsd4_pnfs_dlm_shutdown(void);
--
1.8.3.1
On 09/26/2013 02:36 PM, Benny Halevy wrote:
> The following patchset implements an extension to nfsd
> providing a complete minimal pnfs server exporting
> DLM-based clustered file systems such as GFS2 or OCFS2.
>
> The pNFS operations that are implemented are
> GETDEVICELIST and GETDEVICEINFO,
> LAYOUTGET and LAYOUTRETURN.
>
> The server does the bookkeeping of the outstanding layout
> state in response to layout get and return.
>
> Also, the implementation cleans up the client layout state
> upon client expiry and on CLOSE when the return_on_close
> flag is set on the LAYOUTGET response. The latter is the
> default behavior until layout recalls are implemented,
> which will let the server reclaim its resources when
> a client holds layout state after closing its files.
>
> The patchset is based on v3.12-rc2 and is also available online here:
> git://linux-nfs.org/~bhalevy/linux-pnfs.git pnfsd-dlm-3.12-rc2-2013-09-26
>
I thought we said that the exofs server is going in first. What happened?
Cheers
Boaz
From: Benny Halevy <[email protected]>
[extracted from pnfsd: Initial pNFS server implementation.]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
include/uapi/linux/nfsd/debug.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/nfsd/debug.h b/include/uapi/linux/nfsd/debug.h
index a6f453c..168f3a3 100644
--- a/include/uapi/linux/nfsd/debug.h
+++ b/include/uapi/linux/nfsd/debug.h
@@ -32,6 +32,7 @@
#define NFSDDBG_REPCACHE 0x0080
#define NFSDDBG_XDR 0x0100
#define NFSDDBG_LOCKD 0x0200
+#define NFSDDBG_PNFS 0x0400
#define NFSDDBG_ALL 0x7FFF
#define NFSDDBG_NOCHANGE 0xFFFF
--
1.8.3.1
The actual XDR encoding has no business being under fs/exportfs
and should be in the NFSD code itself.
Currently, just return the same per-fs layout types.
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4xdr.c | 3 ++-
fs/nfsd/nfsd.h | 2 +-
include/linux/nfs4.h | 1 +
3 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 6781a33..1a50467 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2561,7 +2561,8 @@ static int get_parent_attributes(struct svc_export *exp, struct kstat *stat)
WRITE64(stat.ino);
}
#if defined(CONFIG_PNFSD)
- if (bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) {
+ if ((bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) ||
+ (bmval2 & FATTR4_WORD2_LAYOUT_TYPES)) {
struct super_block *sb = dentry->d_inode->i_sb;
int type = 0;
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index d81db6e..87aa3aa 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -332,7 +332,7 @@ static inline void nfs4_reset_lease(time_t leasetime) { }
#if defined(CONFIG_PNFSD)
#define NFSD4_1_SUPPORTED_ATTRS_WORD2 \
(NFSD4_SUPPORTED_ATTRS_WORD2 | FATTR4_WORD2_SUPPATTR_EXCLCREAT | \
- FATTR4_WORD2_LAYOUT_BLKSIZE)
+ FATTR4_WORD2_LAYOUT_TYPES | FATTR4_WORD2_LAYOUT_BLKSIZE)
#else /* CONFIG_PNFSD */
#define NFSD4_1_SUPPORTED_ATTRS_WORD2 \
(NFSD4_SUPPORTED_ATTRS_WORD2 | FATTR4_WORD2_SUPPATTR_EXCLCREAT)
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 2c3aa9f..7f6e548 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -398,6 +398,7 @@ enum lock_type4 {
#define FATTR4_WORD1_TIME_MODIFY_SET (1UL << 22)
#define FATTR4_WORD1_MOUNTED_ON_FILEID (1UL << 23)
#define FATTR4_WORD1_FS_LAYOUT_TYPES (1UL << 30)
+#define FATTR4_WORD2_LAYOUT_TYPES (1UL << 0)
#define FATTR4_WORD2_LAYOUT_BLKSIZE (1UL << 1)
#define FATTR4_WORD2_MDSTHRESHOLD (1UL << 4)
#define FATTR4_WORD2_SECURITY_LABEL (1UL << 17)
--
1.8.3.1
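The new FATTR4_WORD2_LAYOUT_TYPES define follows from the way NFSv4
attribute numbers map onto 32-bit bitmap words. The sketch below shows
that mapping in userspace; the attribute numbers (62, 64, 65) are the ones
assigned in RFC 5661, while the enum names and helper functions are
assumptions of this example rather than kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute numbers from RFC 5661; the names are local to this sketch. */
enum {
	ATTR_FS_LAYOUT_TYPES = 62,	/* per file system */
	ATTR_LAYOUT_TYPES    = 64,	/* per file */
	ATTR_LAYOUT_BLKSIZE  = 65,
};

/* Attribute n lives in bitmap word n / 32, at bit n % 32. */
static unsigned int attr_word(unsigned int attr)
{
	return attr / 32;
}

static uint32_t attr_mask(unsigned int attr)
{
	return UINT32_C(1) << (attr % 32);
}

static int attr_requested(const uint32_t bmval[3], unsigned int attr)
{
	assert(attr_word(attr) < 3);
	return (bmval[attr_word(attr)] & attr_mask(attr)) != 0;
}

/* Same condition as the patched test in nfs4xdr.c. */
static int wants_layout_types(const uint32_t bmval[3])
{
	return attr_requested(bmval, ATTR_FS_LAYOUT_TYPES) ||
	       attr_requested(bmval, ATTR_LAYOUT_TYPES);
}
```

Attribute 64 therefore lands in word 2 at bit 0, and attribute 62 in
word 1 at bit 30, which agrees with the defines in include/linux/nfs4.h.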
From: Andy Adamson <[email protected]>
Signed-off-by: Andy Adamson <[email protected]>
[gfs2: set pnfs_dlm_export_ops only for CONFIG_PNFSD]
Signed-off-by: Benny Halevy <[email protected]>
Acked-by: Steven Whitehouse <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/gfs2/ops_fstype.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 19ff5e8..d16a6e6 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -21,6 +21,7 @@
#include <linux/quotaops.h>
#include <linux/lockdep.h>
#include <linux/module.h>
+#include <linux/nfsd/nfs4pnfsdlm.h>
#include "gfs2.h"
#include "incore.h"
@@ -1154,6 +1155,9 @@ static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent
sb->s_op = &gfs2_super_ops;
sb->s_d_op = &gfs2_dops;
sb->s_export_op = &gfs2_export_ops;
+#if defined(CONFIG_PNFSD)
+ sb->s_pnfs_op = &pnfs_dlm_export_ops;
+#endif /* CONFIG_PNFSD */
sb->s_xattr = gfs2_xattr_handlers;
sb->s_qcop = &gfs2_quotactl_ops;
sb_dqopt(sb)->flags |= DQUOT_QUOTA_SYS_FILE;
--
1.8.3.1
From: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: alloc_sid should kmalloc a object not a pointer]
Signed-off-by: Bian Naimeng <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsd.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/pnfsd.h | 2 +
2 files changed, 122 insertions(+)
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index cb28207..9a7cbc9 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -25,3 +25,123 @@
#define NFSDDBG_FACILITY NFSDDBG_PNFS
+static DEFINE_SPINLOCK(layout_lock);
+
+/* hash table for nfsd4_pnfs_deviceid.sbid */
+#define SBID_HASH_BITS 8
+#define SBID_HASH_SIZE (1 << SBID_HASH_BITS)
+#define SBID_HASH_MASK (SBID_HASH_SIZE - 1)
+
+struct sbid_tracker {
+ u64 id;
+ struct super_block *sb;
+ struct list_head hash;
+};
+
+static u64 current_sbid;
+static struct list_head sbid_hashtbl[SBID_HASH_SIZE];
+
+static unsigned long
+sbid_hashval(struct super_block *sb)
+{
+ return hash_ptr(sb, SBID_HASH_BITS);
+}
+
+static struct sbid_tracker *
+alloc_sbid(void)
+{
+ return kmalloc(sizeof(struct sbid_tracker), GFP_KERNEL);
+}
+
+static void
+destroy_sbid(struct sbid_tracker *sbid)
+{
+ spin_lock(&layout_lock);
+ list_del(&sbid->hash);
+ spin_unlock(&layout_lock);
+ kfree(sbid);
+}
+
+void
+nfsd4_free_pnfs_slabs(void)
+{
+ int i;
+ struct sbid_tracker *sbid;
+
+ for (i = 0; i < SBID_HASH_SIZE; i++) {
+ while (!list_empty(&sbid_hashtbl[i])) {
+ sbid = list_first_entry(&sbid_hashtbl[i],
+ struct sbid_tracker,
+ hash);
+ destroy_sbid(sbid);
+ }
+ }
+}
+
+int
+nfsd4_init_pnfs_slabs(void)
+{
+ int i;
+
+ for (i = 0; i < SBID_HASH_SIZE; i++)
+ INIT_LIST_HEAD(&sbid_hashtbl[i]);
+
+ return 0;
+}
+
+static u64
+alloc_init_sbid(struct super_block *sb)
+{
+ struct sbid_tracker *sbid;
+ struct sbid_tracker *new = alloc_sbid();
+ unsigned long hash_idx = sbid_hashval(sb);
+ u64 id = 0;
+
+ if (likely(new)) {
+ spin_lock(&layout_lock);
+ id = ++current_sbid;
+ new->id = (id << SBID_HASH_BITS) | (hash_idx & SBID_HASH_MASK);
+ id = new->id;
+ BUG_ON(id == 0);
+ new->sb = sb;
+
+ list_for_each_entry (sbid, &sbid_hashtbl[hash_idx], hash)
+ if (sbid->sb == sb) {
+ kfree(new);
+ id = sbid->id;
+ spin_unlock(&layout_lock);
+ return id;
+ }
+ list_add(&new->hash, &sbid_hashtbl[hash_idx]);
+ spin_unlock(&layout_lock);
+ }
+ return id;
+}
+
+static u64
+find_create_sbid(struct super_block *sb)
+{
+ struct sbid_tracker *sbid;
+ unsigned long hash_idx = sbid_hashval(sb);
+ int pos = 0;
+ u64 id = 0;
+
+ spin_lock(&layout_lock);
+ list_for_each_entry (sbid, &sbid_hashtbl[hash_idx], hash) {
+ pos++;
+ if (sbid->sb != sb)
+ continue;
+ if (pos > 1) {
+ list_del(&sbid->hash);
+ list_add(&sbid->hash, &sbid_hashtbl[hash_idx]);
+ }
+ id = sbid->id;
+ break;
+ }
+ spin_unlock(&layout_lock);
+
+ if (!id)
+ id = alloc_init_sbid(sb);
+
+ return id;
+}
diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
index 7c46791..29ea2e7 100644
--- a/fs/nfsd/pnfsd.h
+++ b/fs/nfsd/pnfsd.h
@@ -36,4 +36,6 @@
#include <linux/nfsd/nfsd4_pnfs.h>
+#include "xdr4.h"
+
#endif /* LINUX_NFSD_PNFSD_H */
--
1.8.3.1
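For readers following the sbid patch above, here is a minimal userspace sketch of how alloc_init_sbid() packs an identifier: the running counter goes in the high bits and the hash bucket index in the low SBID_HASH_BITS bits. The helper names (sbid_compose, sbid_bucket, sbid_seq) are hypothetical and do not appear in the patch; only the three SBID_* constants are taken from it.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the patch. */
#define SBID_HASH_BITS 8
#define SBID_HASH_SIZE (1 << SBID_HASH_BITS)
#define SBID_HASH_MASK (SBID_HASH_SIZE - 1)

/* Hypothetical helper: pack a sequence number and a bucket index the
 * same way alloc_init_sbid() computes new->id. */
static inline uint64_t sbid_compose(uint64_t seq, unsigned long hash_idx)
{
	return (seq << SBID_HASH_BITS) | (hash_idx & SBID_HASH_MASK);
}

/* Hypothetical helper: the low bits of an id are the bucket index. */
static inline unsigned long sbid_bucket(uint64_t id)
{
	return (unsigned long)(id & SBID_HASH_MASK);
}

/* Hypothetical helper: the high bits of an id are the sequence number. */
static inline uint64_t sbid_seq(uint64_t id)
{
	return id >> SBID_HASH_BITS;
}
```

Because the counter is pre-incremented before use, the sequence part starts at 1, so a composed id is never 0 until the counter wraps. That matches the patch's use of 0 as the "not found" value in find_create_sbid().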
From: Benny Halevy <[email protected]>
Include headers in nfs_xdr.h required for
struct rpc_task, nfs4_verifier, nfs4_stateid
Cc: Trond Myklebust <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
[add more includes needed since v3.4-rc]
Signed-off-by: Benny Halevy <[email protected]>
---
include/linux/nfs_xdr.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 01fd84b..93cbda7 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -3,6 +3,10 @@
#include <linux/nfsacl.h>
#include <linux/sunrpc/gss_api.h>
+#include <linux/nfs.h>
+#include <linux/nfs3.h>
+#include <linux/nfs4.h>
+#include <linux/sunrpc/sched.h>
/*
* To change the maximum rsize and wsize supported by the NFS client, adjust
--
1.8.3.1
Make it symmetric to alloc_stid
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 214e42d..099976e 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -411,11 +411,16 @@ static void remove_stid(struct nfs4_stid *s)
idr_remove(stateids, s->sc_stateid.si_opaque.so_id);
}
+static void free_stid(struct kmem_cache *slab, struct nfs4_stid *s)
+{
+ kmem_cache_free(slab, s);
+}
+
void
nfs4_put_delegation(struct nfs4_delegation *dp)
{
if (atomic_dec_and_test(&dp->dl_count)) {
- kmem_cache_free(deleg_slab, dp);
+ free_stid(deleg_slab, &dp->dl_stid);
num_delegations--;
}
}
@@ -619,7 +624,7 @@ static void close_generic_stateid(struct nfs4_ol_stateid *stp)
static void free_generic_stateid(struct nfs4_ol_stateid *stp)
{
remove_stid(&stp->st_stid);
- kmem_cache_free(stateid_slab, stp);
+ free_stid(stateid_slab, &stp->st_stid);
}
static void release_lock_stateid(struct nfs4_ol_stateid *stp)
--
1.8.3.1
On Fri, Sep 27, 2013 at 12:37 PM, Boaz Harrosh <[email protected]> wrote:
> On 09/27/2013 09:34 AM, Benny Halevy wrote:
>>> I thought that we said that exofs server is going in first. What happened?
>>
>> exofs requires much more functionality.
>> To help review the code we need to go through this milestone in any case.
>>
>
> That is not true. Look at the way I staged the pnfsd-exofs patches. After
> the LO_GET, LO_COMMIT and LO_RETURN patches you have a fully functioning
> git cloning exofs. (BTW exofs does not need DEVICELIST)
>
> So OK your patches do not have LO_COMMIT but this code path is trivial
> and what is that contraption of returning "no-layout" for writes and
> then not having the LO_COMMIT support. This is plain hacky and not
> in accord with the pNFS philosophy of things.
>
> And we can further split my original set to do read-only without LO_COMMIT
> and add a simple LO_COMMIT stage with enable of write LAYOUTs, easily.
> Which is what you have with much less code.
>
> The recall comes in at a different patch that can be staged later and is
> effectively not needed for normal operations.
>
> Actually the whole code, including the first stage of the exofs patches, is
> smaller and simpler than the DLM contraption. And it only touches exofs code which
> does not involve other sensitive subsystems.
>
> I have a deja vu about this. Why won't you talk to me before working on such
> DLM crap that is not at all pnfs, but a hack that demonstrates nothing?
>
> Please do the right thing, since you are already putting all this effort. And I can
> help as well with the pnfsd-exofs patches part.
>
> BTW: Thank you for doing this, it is about time someone should put some mainline love
> to the pNFS server
>
> Thanks
> Boaz
Boaz, sorry, but the files layout went to production first on the client side
in all major enterprise distributions, so it doesn't make sense to submit
exofs first.
As for your patch series, I respect the work you did on it but
a. as you said it is your patch series, not mine
b. the forward port from 3.10 on changed the layout state handling
radically (for the better I hope :) solving numerous correctness issues.
The motivation behind the dlm based implementation is to have a minimal
useful pnfs implementation that folks can use and test the client against.
On this basis, write layouts can be added, and further on, exofs support
can be submitted as the next stage.
Benny
On 2013-09-29 14:42, Christoph Hellwig wrote:
> On Thu, Sep 26, 2013 at 02:36:16PM -0400, Benny Halevy wrote:
>> The following patchset implements an extension to nfsd
>> providing a complete minimal pnfs server exporting
>> DLM-based clustered file systems such as GFS2 or OCFS2.
>
> Does this actually buy us anything by now? Last time I saw numbers for
> this implementation it was slower than an active/passive setup over
> those filesystem due to the way their cluster locking works.
>
Was this for write sharing or read only?
As far as I understand, the share locks do not cause arbitration
and they do provide bandwidth scalability via multiple nodes.
That said, it is worth measuring.
Benny
On 09/29/2013 05:16 AM, Christoph Hellwig wrote:
> The actual XDR encoding doesn't have business being under fs/exportfs
> and should be in the NFSD code itself.
>
This is so FSs will not depend on NFSD. Though the actual implementation
could be done via a vector that gets set at NFSD load.
Though I would like to keep it here, because I have a patchset
which implements pNFS without NFSD at all. It enables a set
of IOCTLs or syscalls and uses the same exact FS interface
introduced here but sends the info to a user mode server.
Though again it can be just its own library without regard
to exportfs at all, linked in by any user code that needs
it.
Cheers
Boaz
On 2013-09-29 14:43, Christoph Hellwig wrote:
> Empty headers are pretty useless.
Right. This was useful early on while we re-ordered the patches that followed.
No need for that now so I'll squash this patch accordingly.
> Also why would you want a header
> outside fs/nfsd/ ?
This header contains the file system interface.
Benny
From: Andy Adamson <[email protected]>
In a DLM cluster, writing to a node other than the node where the open call
occurred (where meta data is cached) will have performance implications when
the write causes meta data changes that need to be propagated to the open call
node.
DLM clusters support only LAYOUTIOMODE4_READ layouts. Writes will go through
the MDS.
Return NFS4ERR_BADIOMODE for LAYOUTGET requests with LAYOUTIOMODE4_RW iomode.
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: fixup DLM layout_get return type to u32]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsdlm.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/nfsd/nfs4pnfsdlm.c b/fs/nfsd/nfs4pnfsdlm.c
index 7ed8156..acc1f91 100644
--- a/fs/nfsd/nfs4pnfsdlm.c
+++ b/fs/nfsd/nfs4pnfsdlm.c
@@ -390,6 +390,10 @@ static enum nfsstat4 nfsd4_pnfs_dlm_layoutget(struct inode *inode,
dprintk("%s: LAYOUT_GET\n", __func__);
+ /* DLM exported file systems only support layouts for READ */
+ if (res->lg_seg.iomode == IOMODE_RW)
+ return NFS4ERR_BADIOMODE;
+
index = dlm_ino_hash(inode);
dprintk("%s first stripe index %d i_ino %lu\n", __func__, index,
inode->i_ino);
--
1.8.3.1
From: Benny Halevy <[email protected]>
[pnfsd: define and use FSID_MAX in enum nfsd_fsid]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: add helper functions for identifying DS stateids.]
Signed-off-by: David M. Richter <[email protected]>
[define a no-op version of pnfs_fh_is_ds for !CONFIG_PNFSD]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfsfh.c | 7 +++++--
fs/nfsd/nfsfh.h | 39 +++++++++++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 814afaa..71f9470 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -10,6 +10,7 @@
#include <linux/exportfs.h>
#include <linux/sunrpc/svcauth_gss.h>
+#include <linux/nfsd/nfsd4_pnfs.h>
#include "nfsd.h"
#include "vfs.h"
#include "auth.h"
@@ -136,6 +137,7 @@ static inline __be32 check_pseudo_root(struct svc_rqst *rqstp,
static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
{
struct knfsd_fh *fh = &fhp->fh_handle;
+ int fsid_type;
struct fid *fid = NULL, sfid;
struct svc_export *exp;
struct dentry *dentry;
@@ -156,7 +158,8 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
return error;
if (fh->fh_auth_type != 0)
return error;
- len = key_len(fh->fh_fsid_type) / 4;
+ fsid_type = pnfs_fh_fsid_type(fh);
+ len = key_len(fsid_type) / 4;
if (len == 0)
return error;
if (fh->fh_fsid_type == FSID_MAJOR_MINOR) {
@@ -169,7 +172,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
data_left -= len;
if (data_left < 0)
return error;
- exp = rqst_exp_find(rqstp, fh->fh_fsid_type, fh->fh_auth);
+ exp = rqst_exp_find(rqstp, fsid_type, fh->fh_auth);
fid = (struct fid *)(fh->fh_auth + len);
} else {
__u32 tfh[2];
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index e5e6707..2563e88 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -14,6 +14,7 @@ enum nfsd_fsid {
FSID_UUID8,
FSID_UUID16,
FSID_UUID16_INUM,
+ FSID_MAX
};
enum fsid_source {
@@ -203,4 +204,42 @@ static inline int key_len(int type)
}
}
+#if defined(CONFIG_PNFSD)
+
+/*
+ * fh_fsid_type is overloaded to indicate whether a filehandle was one supplied
+ * to a DS by LAYOUTGET. nfs4_preprocess_stateid_op() uses this to decide how
+ * to handle a given stateid.
+ */
+static inline int pnfs_fh_is_ds(struct knfsd_fh *fh)
+{
+ return fh->fh_fsid_type >= FSID_MAX;
+}
+
+static inline void pnfs_fh_mark_ds(struct knfsd_fh *fh)
+{
+ BUG_ON(fh->fh_version != 1);
+ BUG_ON(pnfs_fh_is_ds(fh));
+ fh->fh_fsid_type += FSID_MAX;
+}
+
+#else /* CONFIG_PNFSD */
+
+static inline int pnfs_fh_is_ds(struct knfsd_fh *fh)
+{
+ return 0;
+}
+
+#endif /* CONFIG_PNFSD */
+
+/* allows fh_verify() to check the real fsid_type (i.e., not overloaded). */
+static inline int pnfs_fh_fsid_type(struct knfsd_fh *fh)
+{
+ int fsid_type = fh->fh_fsid_type;
+
+ if (pnfs_fh_is_ds(fh))
+ return fsid_type - FSID_MAX;
+ return fsid_type;
+}
+
#endif /* _LINUX_NFSD_FH_INT_H */
--
1.8.3.1
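The overloading of fh_fsid_type in the patch above can be sketched in userspace as follows. This is an illustration only: the enum and struct are stand-ins with made-up sk_/SK_ names, and the only property assumed from the kernel is that FSID_MAX is the first value past the valid fsid types.

```c
#include <assert.h>

/* Stand-in for enum nfsd_fsid; only the ordering matters here. */
enum sketch_fsid {
	SK_FSID_DEV = 0,
	SK_FSID_NUM,
	SK_FSID_MAJOR_MINOR,
	SK_FSID_ENCODE_DEV,
	SK_FSID_UUID4_INUM,
	SK_FSID_UUID8,
	SK_FSID_UUID16,
	SK_FSID_UUID16_INUM,
	SK_FSID_MAX
};

/* Stand-in for the one knfsd_fh field the helpers touch. */
struct sketch_fh {
	unsigned char fsid_type;
};

/* A filehandle is a DS filehandle if its type was shifted past the max. */
static inline int sk_fh_is_ds(const struct sketch_fh *fh)
{
	return fh->fsid_type >= SK_FSID_MAX;
}

/* Mark a filehandle as DS by shifting its type; marking twice is a bug. */
static inline void sk_fh_mark_ds(struct sketch_fh *fh)
{
	assert(!sk_fh_is_ds(fh));
	fh->fsid_type += SK_FSID_MAX;
}

/* Recover the real fsid type whether or not the handle is marked. */
static inline int sk_fh_fsid_type(const struct sketch_fh *fh)
{
	if (sk_fh_is_ds(fh))
		return fh->fsid_type - SK_FSID_MAX;
	return fh->fsid_type;
}
```

The point of routing every consumer through the "real type" accessor is that key_len() and the export lookup keep seeing the original fsid type, while the marked value travels to the client inside the opaque filehandle.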
From: Andy Adamson <[email protected]>
Until a stateid protocol is implemented, remove all checking on
file layout data server stateids.
These are identified by the current fh type.
Signed-off-by: Andy Adamson <[email protected]>
[remove #ifdef around pnfs_fh_is_ds]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 2c973e6..b80807c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3648,6 +3648,9 @@ static __be32 nfsd4_lookup_stateid(stateid_t *stateid, unsigned char typemask,
if (grace_disallows_io(net, ino))
return nfserr_grace;
+ if (pnfs_fh_is_ds(&current_fh->fh_handle))
+ return 0;
+
if (ZERO_STATEID(stateid) || ONE_STATEID(stateid))
return check_special_stateids(net, current_fh, stateid, flags);
--
1.8.3.1
On 2013-09-27 17:36, J. Bruce Fields wrote:
> Is this really necessary? What would we lose if we just used pnfs
> automatically when the filesystem supports it and all other necessary
> configuration is in place?
We can do that and let the client decide whether to use pnfs or not.
Though I think that to deal with client interoperability issues, one
would want to control that. If we don't provide a way to enable/disable
pnfs at the export level every file system that supports pnfs would probably
need a mount option to control pnfs exportability...
Benny
>
> On Thu, Sep 26, 2013 at 02:40:19PM -0400, Benny Halevy wrote:
>> From: Andy Adamson <[email protected]>
>>
>> This is a boolean for now. When more layouttypes are supported, this can
>> change to "pnfs=", similar to "sec=".
>>
>> The ctl interface is not enhanced.
>>
>> Note: Export option strings are not guaranteed to be present in every call to
>> svc_export_parse. For example, nfs-utils-1.1.2 exportfs validates the export
>> with a test call that does not include the 'pnfs' export option even though
>> it is set in /etc/exports.
>>
>> nfsd4_layout_verify() checks if ex_pnfs is set so the ex_pnfs check in
>> check_export is not needed.
>>
>> Furthermore, the pnfs_export_operations super block pointer should not be
>> changed because a) it is a const and b) the exports options can be changed
>> while the file system is mounted.
>>
>> Remove the ex_pnfs check from check_export to prevent the pnfs_export_operations
>> superblock pointer from being set to NULL.
>
> This patch doesn't touch check_export. Is this describing a change from
> a previous version of the patch? If so, either drop this comment or
> write it in a way that will make sense to someone who hasn't seen the
> previous version.
>
> --b.
>
>
>>
>> Signed-off-by: Andy Adamson <[email protected]>
>> [pnfsd: fix cosmetic checkpatch warnings]
>> [pnfsd: test pnfs export option in check_export]
>> Signed-off-by: Benny Halevy <[email protected]>
>> [pnfsd: fix ex_pnfs check_export bug]
>> Signed-off-by: Andy Adamson <[email protected]>
>> Signed-off-by: Benny Halevy <[email protected]>
>> Signed-off-by: Benny Halevy <[email protected]>
>> ---
>> fs/nfsd/export.c | 6 ++++++
>> include/linux/nfsd/export.h | 1 +
>> 2 files changed, 7 insertions(+)
>>
>> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
>> index f26b0b9..7730dfd 100644
>> --- a/fs/nfsd/export.c
>> +++ b/fs/nfsd/export.c
>> @@ -567,6 +567,8 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
>> if (exp.ex_uuid == NULL)
>> err = -ENOMEM;
>> }
>> + } else if (strcmp(buf, "pnfs") == 0) {
>> + exp.ex_pnfs = 1;
>> } else if (strcmp(buf, "secinfo") == 0)
>> err = secinfo_parse(&mesg, buf, &exp);
>> else
>> @@ -639,6 +641,8 @@ static int svc_export_show(struct seq_file *m,
>> seq_printf(m, "%02x", exp->ex_uuid[i]);
>> }
>> }
>> + if (exp->ex_pnfs)
>> + seq_puts(m, ",pnfs");
>> show_secinfo(m, exp);
>> }
>> seq_puts(m, ")\n");
>> @@ -666,6 +670,7 @@ static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
>> new->ex_fslocs.locations_count = 0;
>> new->ex_fslocs.migrated = 0;
>> new->ex_uuid = NULL;
>> + new->ex_pnfs = 0;
>> new->cd = item->cd;
>> }
>>
>> @@ -679,6 +684,7 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
>> new->ex_anon_uid = item->ex_anon_uid;
>> new->ex_anon_gid = item->ex_anon_gid;
>> new->ex_fsid = item->ex_fsid;
>> + new->ex_pnfs = item->ex_pnfs;
>> new->ex_uuid = item->ex_uuid;
>> item->ex_uuid = NULL;
>> new->ex_fslocs.locations = item->ex_fslocs.locations;
>> diff --git a/include/linux/nfsd/export.h b/include/linux/nfsd/export.h
>> index 7898c99..b03ceee 100644
>> --- a/include/linux/nfsd/export.h
>> +++ b/include/linux/nfsd/export.h
>> @@ -52,6 +52,7 @@ struct svc_export {
>> kuid_t ex_anon_uid;
>> kgid_t ex_anon_gid;
>> int ex_fsid;
>> + int ex_pnfs;
>> unsigned char * ex_uuid; /* 16 byte fsid */
>> struct nfsd4_fs_locations ex_fslocs;
>> int ex_nflavors;
>> --
>> 1.8.3.1
>>
From: Andy Adamson <[email protected]>
This is a boolean for now. When more layouttypes are supported, this can
change to "pnfs=", similar to "sec=".
The ctl interface is not enhanced.
Note: Export option strings are not guaranteed to be present in every call to
svc_export_parse. For example, nfs-utils-1.1.2 exportfs validates the export
with a test call that does not include the 'pnfs' export option even though
it is set in /etc/exports.
nfsd4_layout_verify() checks if ex_pnfs is set so the ex_pnfs check in
check_export is not needed.
Furthermore, the pnfs_export_operations super block pointer should not be
changed because a) it is a const and b) the exports options can be changed
while the file system is mounted.
Remove the ex_pnfs check from check_export to prevent the pnfs_export_operations
superblock pointer from being set to NULL.
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: fix cosmetic checkpatch warnings]
[pnfsd: test pnfs export option in check_export]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: fix ex_pnfs check_export bug]
Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/export.c | 6 ++++++
include/linux/nfsd/export.h | 1 +
2 files changed, 7 insertions(+)
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index f26b0b9..7730dfd 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -567,6 +567,8 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
if (exp.ex_uuid == NULL)
err = -ENOMEM;
}
+ } else if (strcmp(buf, "pnfs") == 0) {
+ exp.ex_pnfs = 1;
} else if (strcmp(buf, "secinfo") == 0)
err = secinfo_parse(&mesg, buf, &exp);
else
@@ -639,6 +641,8 @@ static int svc_export_show(struct seq_file *m,
seq_printf(m, "%02x", exp->ex_uuid[i]);
}
}
+ if (exp->ex_pnfs)
+ seq_puts(m, ",pnfs");
show_secinfo(m, exp);
}
seq_puts(m, ")\n");
@@ -666,6 +670,7 @@ static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
new->ex_fslocs.locations_count = 0;
new->ex_fslocs.migrated = 0;
new->ex_uuid = NULL;
+ new->ex_pnfs = 0;
new->cd = item->cd;
}
@@ -679,6 +684,7 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
new->ex_anon_uid = item->ex_anon_uid;
new->ex_anon_gid = item->ex_anon_gid;
new->ex_fsid = item->ex_fsid;
+ new->ex_pnfs = item->ex_pnfs;
new->ex_uuid = item->ex_uuid;
item->ex_uuid = NULL;
new->ex_fslocs.locations = item->ex_fslocs.locations;
diff --git a/include/linux/nfsd/export.h b/include/linux/nfsd/export.h
index 7898c99..b03ceee 100644
--- a/include/linux/nfsd/export.h
+++ b/include/linux/nfsd/export.h
@@ -52,6 +52,7 @@ struct svc_export {
kuid_t ex_anon_uid;
kgid_t ex_anon_gid;
int ex_fsid;
+ int ex_pnfs;
unsigned char * ex_uuid; /* 16 byte fsid */
struct nfsd4_fs_locations ex_fslocs;
int ex_nflavors;
--
1.8.3.1
From: Andy Adamson <[email protected]>
Export nfsd4_pnfs_dlm_layoutget for dlm cluster file system use.
Use the number of data servers as a hash mask and hash inode i_ino
to choose the layout's first_stripe_index.
Always give out whole file layouts.
Always give out IOMODE_READ layouts. DLM locking semantics want to stripe
only READs with WRITEs going through the MDS.
[was pnfsd: hardwire DLM file layout layoutget]
[was pnfs-gfs2: initial LAYOUT* work for pNFS/GFS2 integration]
Frank Filz's work on the layout_type() and layout_get() export operations,
with stubs for layout_commit() and layout_return(). Tested at Connectathon.
Signed-off-by: Frank Filz <[email protected]>
Signed-off-by: David M. Richter <[email protected]>
[pnfs-gfs2: convert to using new pnfs export api]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: gfs2 layout_type interface]
Signed-off-by: Marc Eshel <[email protected]>
[Since GFS2 only uses a stripe of one, changed lg_commit_through_mds from
true to false.]
[pnfsd: move and rename nfsd4_pnfs_fl_layoutget]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: get rid of layout encoding function vector]
Signed-off-by: Benny Halevy <[email protected]>
Acked-by: Steven Whitehouse <[email protected]>
[pnfsd: rename deviceid_t struct pnfs_deviceid]
[pnfsd: clean up layoutget export API]
[add required headers]
[pnfsd: rename device fsid member to sbid]
[pnfsd: fixup DLM layout_get return type to u32]
[pnfsd: DLM file layout return only nfs errors on layout_get]
[pnfsd: files layout: change layout_get return type to enum nfsstat4]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: fix DLM file layout no device return]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: dlm: fixup LAYOUT_NFSV4_1_FILES]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsdlm.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 92 insertions(+)
diff --git a/fs/nfsd/nfs4pnfsdlm.c b/fs/nfsd/nfs4pnfsdlm.c
index 4c2ab87..7ed8156 100644
--- a/fs/nfsd/nfs4pnfsdlm.c
+++ b/fs/nfsd/nfs4pnfsdlm.c
@@ -28,6 +28,7 @@
#include <linux/nfsd/nfs4layoutxdr.h>
#include <linux/sunrpc/addr.h>
+#include "nfsfh.h"
#include "nfsd.h"
#define NFSDDBG_FACILITY NFSDDBG_FILELAYOUT
@@ -351,9 +352,100 @@ static int nfsd4_pnfs_dlm_getdevinfo(struct super_block *sb,
return err;
}
+static int get_stripe_unit(int blocksize)
+{
+ if (blocksize >= NFSSVC_MAXBLKSIZE)
+ return blocksize;
+ return NFSSVC_MAXBLKSIZE - (NFSSVC_MAXBLKSIZE % blocksize);
+}
+
+/*
+ * Look up inode block device in pnfs_dlm_device list.
+ * Hash on the inode->i_ino and number of data servers.
+ */
+static int dlm_ino_hash(struct inode *ino)
+{
+ struct dlm_device_entry *de;
+ u32 hash_mask = 0;
+
+ /* If we can't find the inode block device in the pnfs_dlm_device list
+ * then don't hand out a layout
+ */
+ de = nfsd4_find_pnfs_dlm_device(ino->i_sb);
+ if (!de)
+ return -1;
+ hash_mask = de->num_ds - 1;
+ return ino->i_ino & hash_mask;
+}
+
+static enum nfsstat4 nfsd4_pnfs_dlm_layoutget(struct inode *inode,
+ struct exp_xdr_stream *xdr,
+ const struct nfsd4_pnfs_layoutget_arg *args,
+ struct nfsd4_pnfs_layoutget_res *res)
+{
+ struct pnfs_filelayout_layout *layout = NULL;
+ struct knfsd_fh *fhp = NULL;
+ int index;
+ enum nfsstat4 rc = NFS4_OK;
+
+ dprintk("%s: LAYOUT_GET\n", __func__);
+
+ index = dlm_ino_hash(inode);
+ dprintk("%s first stripe index %d i_ino %lu\n", __func__, index,
+ inode->i_ino);
+ if (index < 0)
+ return NFS4ERR_LAYOUTUNAVAILABLE;
+
+ res->lg_seg.layout_type = LAYOUT_NFSV4_1_FILES;
+ /* Always give out whole file layouts */
+ res->lg_seg.offset = 0;
+ res->lg_seg.length = NFS4_MAX_UINT64;
+ /* Always give out READ ONLY layouts */
+ res->lg_seg.iomode = IOMODE_READ;
+
+ layout = kzalloc(sizeof(*layout), GFP_KERNEL);
+ if (layout == NULL) {
+ rc = NFS4ERR_LAYOUTTRYLATER;
+ goto error;
+ }
+
+ /* Set file layout response args */
+ layout->lg_layout_type = LAYOUT_NFSV4_1_FILES;
+ layout->lg_stripe_type = STRIPE_SPARSE;
+ layout->lg_commit_through_mds = false;
+ layout->lg_stripe_unit = get_stripe_unit(inode->i_sb->s_blocksize);
+ layout->lg_fh_length = 1;
+ layout->device_id.sbid = args->lg_sbid;
+ layout->device_id.devid = 1; /*FSFTEMP*/
+ layout->lg_first_stripe_index = index; /*FSFTEMP*/
+ layout->lg_pattern_offset = 0;
+
+ fhp = kmalloc(sizeof(*fhp), GFP_KERNEL);
+ if (fhp == NULL) {
+ rc = NFS4ERR_LAYOUTTRYLATER;
+ goto error;
+ }
+
+ memcpy(fhp, args->lg_fh, sizeof(*fhp));
+ pnfs_fh_mark_ds(fhp);
+ layout->lg_fh_list = fhp;
+
+ /* Call nfsd to encode layout */
+ rc = filelayout_encode_layout(xdr, layout);
+exit:
+ kfree(layout);
+ kfree(fhp);
+ return rc;
+
+error:
+ res->lg_seg.length = 0;
+ goto exit;
+}
+
/* For use by DLM cluster file systems exported by pNFSD */
const struct pnfs_export_operations pnfs_dlm_export_ops = {
.get_device_info = nfsd4_pnfs_dlm_getdevinfo,
.get_device_iter = nfsd4_pnfs_dlm_getdeviter,
+ .layout_get = nfsd4_pnfs_dlm_layoutget,
};
EXPORT_SYMBOL(pnfs_dlm_export_ops);
--
1.8.3.1
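The two small computations in the layoutget patch above (stripe unit and first stripe index) can be tried in userspace with the sketch below. The sk_ names are hypothetical, and SK_MAXBLKSIZE is assumed to be 1 MiB as a stand-in for NFSSVC_MAXBLKSIZE; the real value depends on the kernel configuration. A num_ds of 0 stands in here for the "device not found" case, which the patch signals by returning -1.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for NFSSVC_MAXBLKSIZE (1 MiB). */
#define SK_MAXBLKSIZE (1024u * 1024u)

/* Largest multiple of blocksize that fits in the maximum block size,
 * or blocksize itself if it is already at least that large. */
static inline uint32_t sk_stripe_unit(uint32_t blocksize)
{
	if (blocksize >= SK_MAXBLKSIZE)
		return blocksize;
	return SK_MAXBLKSIZE - (SK_MAXBLKSIZE % blocksize);
}

/* Pick the first stripe index by masking the inode number with
 * num_ds - 1, as dlm_ino_hash() does. Returns -1 if there is no device. */
static inline int sk_first_stripe_index(uint64_t ino, uint32_t num_ds)
{
	if (num_ds == 0)
		return -1;
	return (int)(ino & (num_ds - 1));
}
```

One observation from playing with this: the mask only spreads inodes over all data servers when num_ds is a power of two. With num_ds = 3 the mask is 2, so only indexes 0 and 2 are ever produced. Whether the patchset guarantees a power-of-two device count elsewhere is not visible in this part of the thread.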
From: Benny Halevy <[email protected]>
[extracted from: pnfsd: Initial pNFS server implementation.]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: update pNFS server ops to draft 13]
Signed-off-by: Marc Eshel <[email protected]>
[pnfsd: Check for dense layout in layout encode.]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: Fix server GETDEVICELIST to comply with NFSv4.1 Draft 13]
Signed-off-by: Ricardo Labiaga <[email protected]>
[pnfsd: Fix file layout layoutget export op for d13]
[pnfsd: Simplify layout get export interface.]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: improve nfs4_pnfs_get_layout dprintks]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: initialize layoutget return_on_close]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: Use 128 bit deviceid on server]
[pnfsd: update server layout xdr for draft 19.]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: filelayout: use nfsd4_compoundres pointer in pnfs_xdr_info]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: filelayout: get rid of xdr encoding macros for file layout xdr]
[pnfsd: get rid of layout encoding function vector]
[pnfsd: filelayout: strictly define filelayout_encode_layout]
[pnfsd: filelayout: convert to using exp_xdr]
[include nfsd4_pnfs.h from nfs4layoutxdr.h for deviceid_t]
[pnfsd: rename deviceid_t struct pnfs_deviceid]
[pnfsd: fix cosmetic checkpatch warnings]
[pnfsd: rename device fsid member to sbid]
[pnfsd: fixup filelayout_encode_layout return type to u32]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: nfsd4_pnfs_dlm_layoutget]
Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/exportfs/nfs4filelayoutxdr.c | 85 ++++++++++++++++++++++++++++++++++++++
include/linux/exportfs.h | 4 +-
include/linux/nfsd/nfs4layoutxdr.h | 18 ++++++++
3 files changed, 105 insertions(+), 2 deletions(-)
diff --git a/fs/exportfs/nfs4filelayoutxdr.c b/fs/exportfs/nfs4filelayoutxdr.c
index 4801bfe..f63c311 100644
--- a/fs/exportfs/nfs4filelayoutxdr.c
+++ b/fs/exportfs/nfs4filelayoutxdr.c
@@ -31,6 +31,8 @@
*/
#include <linux/exp_xdr.h>
#include <linux/module.h>
+#include <linux/nfs4.h>
+#include <linux/nfsd/nfsfh.h>
#include <linux/nfsd/nfs4layoutxdr.h>
/* We do our-own dprintk so filesystems are not dependent on sunrpc */
@@ -131,3 +133,86 @@ static int fl_devinfo_xdr_words(const struct pnfs_filelayout_device *fdev)
return error;
}
EXPORT_SYMBOL(filelayout_encode_devinfo);
+
+/* Encodes the loc_body structure from draft 13
+ * on the response stream.
+ * Returns an NFSv4 status code (enum nfsstat4), which the file system
+ * passes back to nfsd unchanged.
+ */
+enum nfsstat4
+filelayout_encode_layout(struct exp_xdr_stream *xdr,
+ const struct pnfs_filelayout_layout *flp)
+{
+ u32 len = 0, nfl_util, fhlen, i;
+ u32 *layoutlen_p;
+ enum nfsstat4 nfserr;
+ __be32 *p;
+
+ dprintk("%s: device_id %llx:%llx fsi %u, numfh %u\n",
+ __func__,
+ flp->device_id.sbid,
+ flp->device_id.devid,
+ flp->lg_first_stripe_index,
+ flp->lg_fh_length);
+
+ /* Ensure file system added at least one file handle */
+ if (flp->lg_fh_length <= 0) {
+ dprintk("%s: File Layout has no file handles!!\n", __func__);
+ nfserr = NFS4ERR_LAYOUTUNAVAILABLE;
+ goto out;
+ }
+
+ /* Ensure room for len, devid, util, first_stripe_index,
+ * pattern_offset, number of filehandles */
+ p = layoutlen_p = exp_xdr_reserve_qwords(xdr, 1+2+2+1+1+2+1);
+ if (!p) {
+ nfserr = NFS4ERR_TOOSMALL;
+ goto out;
+ }
+
+ /* save spot for opaque file layout length, fill-in later*/
+ p++;
+
+ /* encode device id */
+ p = exp_xdr_encode_u64(p, flp->device_id.sbid);
+ p = exp_xdr_encode_u64(p, flp->device_id.devid);
+
+ /* set and encode flags */
+ nfl_util = flp->lg_stripe_unit;
+ if (flp->lg_commit_through_mds)
+ nfl_util |= NFL4_UFLG_COMMIT_THRU_MDS;
+ if (flp->lg_stripe_type == STRIPE_DENSE)
+ nfl_util |= NFL4_UFLG_DENSE;
+ p = exp_xdr_encode_u32(p, nfl_util);
+
+ /* encode first stripe index */
+ p = exp_xdr_encode_u32(p, flp->lg_first_stripe_index);
+
+ /* encode striping pattern start */
+ p = exp_xdr_encode_u64(p, flp->lg_pattern_offset);
+
+ /* encode number of file handles */
+ p = exp_xdr_encode_u32(p, flp->lg_fh_length);
+
+ /* encode file handles */
+ for (i = 0; i < flp->lg_fh_length; i++) {
+ fhlen = flp->lg_fh_list[i].fh_size;
+ p = exp_xdr_reserve_space(xdr, 4 + fhlen);
+ if (!p) {
+ nfserr = NFS4ERR_TOOSMALL;
+ goto out;
+ }
+ p = exp_xdr_encode_opaque(p, &flp->lg_fh_list[i].fh_base, fhlen);
+ }
+
+ /* Set number of bytes encoded = total_bytes_encoded - length var */
+ len = (char *)p - (char *)layoutlen_p;
+ exp_xdr_encode_u32(layoutlen_p, len - 4);
+
+ nfserr = NFS4_OK;
+out:
+ dprintk("%s: End err %u xdrlen %d\n",
+ __func__, nfserr, len);
+ return nfserr;
+}
+EXPORT_SYMBOL(filelayout_encode_layout);
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 017f1753..8e8b6a7 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -218,7 +218,7 @@ extern struct dentry *generic_fh_to_parent(struct super_block *sb,
extern int filelayout_encode_devinfo(struct exp_xdr_stream *xdr,
const struct pnfs_filelayout_device *fdev);
-extern int filelayout_encode_layout(struct exp_xdr_stream *xdr,
- const struct pnfs_filelayout_layout *flp);
+extern enum nfsstat4 filelayout_encode_layout(struct exp_xdr_stream *xdr,
+ const struct pnfs_filelayout_layout *flp);
#endif /* defined(CONFIG_EXPORTFS_FILE_LAYOUT) */
#endif /* LINUX_EXPORTFS_H */
diff --git a/include/linux/nfsd/nfs4layoutxdr.h b/include/linux/nfsd/nfs4layoutxdr.h
index 752055f..dc7831a 100644
--- a/include/linux/nfsd/nfs4layoutxdr.h
+++ b/include/linux/nfsd/nfs4layoutxdr.h
@@ -35,6 +35,7 @@
#define NFSD_NFS4LAYOUTXDR_H
#include <linux/sunrpc/xdr.h>
+#include <linux/nfsd/nfsd4_pnfs.h>
/* the nfsd4_pnfs_devlist dev_addr for the file layout type */
struct pnfs_filelayout_devaddr {
@@ -55,4 +56,21 @@ struct pnfs_filelayout_device {
struct pnfs_filelayout_multipath *fl_device_list;
};
+struct pnfs_filelayout_layout {
+ u32 lg_layout_type; /* response */
+ u32 lg_stripe_type; /* response */
+ u32 lg_commit_through_mds; /* response */
+ u64 lg_stripe_unit; /* response */
+ u64 lg_pattern_offset; /* response */
+ u32 lg_first_stripe_index; /* response */
+ struct nfsd4_pnfs_deviceid device_id; /* response */
+ u32 lg_fh_length; /* response */
+ struct knfsd_fh *lg_fh_list; /* response */
+};
+
+enum stripetype4 {
+ STRIPE_SPARSE = 1,
+ STRIPE_DENSE = 2
+};
+
#endif /* NFSD_NFS4LAYOUTXDR_H */
--
1.8.3.1
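The tail of filelayout_encode_layout() above back-patches the opaque length: it remembers where the length word lives, encodes the body, and only then writes "bytes encoded minus the 4-byte length word". A minimal user-space sketch of that pattern follows. The names put_u32(), get_u32() and encode_body_with_len() are hypothetical stand-ins, not part of the patch, and the body here is just two u32 values rather than a real file layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for exp_xdr_encode_u32(): store val big-endian. */
static uint32_t *put_u32(uint32_t *p, uint32_t val)
{
	unsigned char *b = (unsigned char *)p;

	b[0] = (unsigned char)(val >> 24);
	b[1] = (unsigned char)(val >> 16);
	b[2] = (unsigned char)(val >> 8);
	b[3] = (unsigned char)val;
	return p + 1;
}

/* Read back a big-endian 32-bit word (only used to inspect the result). */
static uint32_t get_u32(const uint32_t *p)
{
	const unsigned char *b = (const unsigned char *)p;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/*
 * Encode a length-prefixed body: skip the length word, encode the body,
 * then go back and fill in the number of body bytes.
 * The caller must provide room for three 32-bit words.
 * Returns the pointer just past the encoded data.
 */
static uint32_t *encode_body_with_len(uint32_t *p, uint32_t a, uint32_t b)
{
	uint32_t *lenp = p;	/* remember where the length goes */
	uint32_t len;

	assert(p != NULL);
	p++;			/* leave room for the length word */
	p = put_u32(p, a);
	p = put_u32(p, b);

	/* total bytes written, including the length word itself */
	len = (uint32_t)((char *)p - (char *)lenp);
	put_u32(lenp, len - 4);	/* the length excludes its own 4 bytes */
	return p;
}
```

Encoding the values 7 and 9 this way uses three words, and the first word holds 8, the size of the body alone.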
From: Benny Halevy <[email protected]>
Containing xdr encoding helpers to be used by the layout type library functions
or by the file system to encode/decode layout-type specific device and layout
information.
Cc: J. Bruce Fields <[email protected]>
[nfsd: fix exp_xdr_encode_u64 parameter type]
Reported-by: J. Bruce Fields <[email protected]>
[exportfs: exp_xdr.h: Use #include <linux/string.h> instead of <asm/string.h>]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
include/linux/exp_xdr.h | 141 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 141 insertions(+)
create mode 100644 include/linux/exp_xdr.h
diff --git a/include/linux/exp_xdr.h b/include/linux/exp_xdr.h
new file mode 100644
index 0000000..b69c309
--- /dev/null
+++ b/include/linux/exp_xdr.h
@@ -0,0 +1,141 @@
+#ifndef _LINUX_EXP_XDR_H
+#define _LINUX_EXP_XDR_H
+
+#include <asm/byteorder.h>
+#include <asm/unaligned.h>
+#include <linux/string.h>
+
+struct exp_xdr_stream {
+ __be32 *p;
+ __be32 *end;
+};
+
+/**
+ * exp_xdr_qwords - Calculate the number of quad-words holding nbytes
+ * @nbytes: number of bytes to encode
+ */
+static inline size_t
+exp_xdr_qwords(__u32 nbytes)
+{
+ return DIV_ROUND_UP(nbytes, 4);
+}
+
+/**
+ * exp_xdr_qbytes - Calculate the number of bytes holding qwords
+ * @qwords: number of quad-words to encode
+ */
+static inline size_t
+exp_xdr_qbytes(size_t qwords)
+{
+ return qwords << 2;
+}
+
+/**
+ * exp_xdr_reserve_space - Reserve buffer space for sending
+ * @xdr: pointer to exp_xdr_stream
+ * @nbytes: number of bytes to reserve
+ *
+ * Checks that we have enough buffer space to encode 'nbytes' more
+ * bytes of data. If so, update the xdr stream.
+ */
+static inline __be32 *
+exp_xdr_reserve_space(struct exp_xdr_stream *xdr, size_t nbytes)
+{
+ __be32 *p = xdr->p;
+ __be32 *q;
+
+ /* align nbytes on the next 32-bit boundary */
+ q = p + exp_xdr_qwords(nbytes);
+ if (unlikely(q > xdr->end || q < p))
+ return NULL;
+ xdr->p = q;
+ return p;
+}
+
+/**
+ * exp_xdr_reserve_qwords - Reserve buffer space for sending
+ * @xdr: pointer to exp_xdr_stream
+ * @qwords: number of quad words (u32's) to reserve
+ */
+static inline __be32 *
+exp_xdr_reserve_qwords(struct exp_xdr_stream *xdr, size_t qwords)
+{
+ return exp_xdr_reserve_space(xdr, exp_xdr_qbytes(qwords));
+}
+
+/**
+ * exp_xdr_encode_u32 - Encode an unsigned 32-bit value onto a xdr stream
+ * @p: pointer to encoding destination
+ * @val: value to encode
+ */
+static inline __be32 *
+exp_xdr_encode_u32(__be32 *p, __u32 val)
+{
+ *p = cpu_to_be32(val);
+ return p + 1;
+}
+
+/**
+ * exp_xdr_encode_u64 - Encode an unsigned 64-bit value onto a xdr stream
+ * @p: pointer to encoding destination
+ * @val: value to encode
+ */
+static inline __be32 *
+exp_xdr_encode_u64(__be32 *p, __u64 val)
+{
+ put_unaligned_be64(val, p);
+ return p + 2;
+}
+
+/**
+ * exp_xdr_encode_bytes - Encode an array of bytes onto a xdr stream
+ * @p: pointer to encoding destination
+ * @ptr: pointer to the array of bytes
+ * @nbytes: number of bytes to encode
+ */
+static inline __be32 *
+exp_xdr_encode_bytes(__be32 *p, const void *ptr, __u32 nbytes)
+{
+ if (likely(nbytes != 0)) {
+ unsigned int qwords = exp_xdr_qwords(nbytes);
+ unsigned int padding = exp_xdr_qbytes(qwords) - nbytes;
+
+ memcpy(p, ptr, nbytes);
+ if (padding != 0)
+ memset((char *)p + nbytes, 0, padding);
+ p += qwords;
+ }
+ return p;
+}
+
+/**
+ * exp_xdr_encode_opaque - Encode an opaque type onto a xdr stream
+ * @p: pointer to encoding destination
+ * @ptr: pointer to the opaque array
+ * @nbytes: number of bytes to encode
+ *
+ * Encodes the 32-bit opaque size in bytes followed by the opaque value.
+ */
+static inline __be32 *
+exp_xdr_encode_opaque(__be32 *p, const void *ptr, __u32 nbytes)
+{
+ p = exp_xdr_encode_u32(p, nbytes);
+ return exp_xdr_encode_bytes(p, ptr, nbytes);
+}
+
+/**
+ * exp_xdr_encode_opaque_len - Encode the opaque length onto a xdr stream
+ * @lenp: pointer to the opaque length destination
+ * @endp: pointer to the end of the opaque array
+ *
+ * Encodes the 32-bit opaque size in bytes given the start and end pointers
+ */
+static inline __be32 *
+exp_xdr_encode_opaque_len(__be32 *lenp, const void *endp)
+{
+ size_t nbytes = (char *)endp - (char *)(lenp + 1);
+
+ exp_xdr_encode_u32(lenp, nbytes);
+ return lenp + 1 + exp_xdr_qwords(nbytes);
+}
+#endif /* _LINUX_EXP_XDR_H */
--
1.8.3.1
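For readers who want to try the reserve/encode pattern from exp_xdr.h outside the kernel, here is a rough user-space sketch. It is not the kernel header: struct xdr_sketch, sketch_reserve() and sketch_encode_opaque() are hypothetical names, and the bounds check compares word counts instead of pointers. It shows the two properties the helpers rely on: space is reserved in whole 32-bit words, and an opaque is a 4-byte big-endian length followed by the data, zero-padded to the next word boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space analogue of struct exp_xdr_stream. */
struct xdr_sketch {
	uint32_t *p;	/* next free word */
	uint32_t *end;	/* one past the last usable word */
};

/* Number of 32-bit words needed to hold nbytes (no overflow on large input). */
static size_t sketch_words(size_t nbytes)
{
	return nbytes / 4 + (nbytes % 4 != 0);
}

/* Reserve nbytes rounded up to whole words; NULL if the buffer is too small. */
static uint32_t *sketch_reserve(struct xdr_sketch *xdr, size_t nbytes)
{
	size_t words = sketch_words(nbytes);
	uint32_t *p = xdr->p;

	if (words > (size_t)(xdr->end - xdr->p))
		return NULL;
	xdr->p += words;
	return p;
}

/* Encode a big-endian length word, then the data, zero-padded to a word. */
static uint32_t *sketch_encode_opaque(uint32_t *p, const void *data,
				      uint32_t nbytes)
{
	unsigned char *b = (unsigned char *)p;
	size_t words = sketch_words(nbytes);

	b[0] = (unsigned char)(nbytes >> 24);
	b[1] = (unsigned char)(nbytes >> 16);
	b[2] = (unsigned char)(nbytes >> 8);
	b[3] = (unsigned char)nbytes;
	memcpy(b + 4, data, nbytes);
	memset(b + 4 + nbytes, 0, words * 4 - nbytes);
	return p + 1 + words;
}
```

As in the file handle loop of filelayout_encode_layout(), a caller would reserve 4 + nbytes first and only encode if the reservation succeeded.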
On Thu, Sep 26, 2013 at 02:40:35PM -0400, Benny Halevy wrote:
> From: Benny Halevy <[email protected]>
>
> Containing xdr encoding helpers to be used by the layout type library functions
> or by the file system to encode/decode layout-type specific device and layout
> information.
Any reason why pnfs would use this, but not the rest of nfsd? Maybe
this should go in as part of a separate series cleaning up the xdr
encoding and decoding?
On 09/29/2013 05:20 AM, Benny Halevy wrote:
> Makes sense. Thanks.
>
> Bruce - are you ok with moving the pnfs interface definitions to
> include/linux/exportfs.h along with struct export_operations?
>
I disagree; this is bloat. It is big enough as it is. For code clarity
and maintenance we should split. Yes, they are related, but then
lots of things are related and we still want to keep them separate, compact
and readable. Why put everything in the same file? It does not make
any sense.
You have a library, exportfs, that exports a few related but different
interfaces. In fact this header is not exported from exportfs; it is
the type system that defines the pnfs operations. They are so
big and verbose they call for a separate header.
> In fact we can actually extend struct export_operations rather
> than adding pnfs_export_operations...
>
This is a great waste of space. Any FS that does not support
pnfs, which is currently all but exofs, will have 7 NULLs instead
of one.
And putting a struct pnfs_export_operations pointer inside
struct export_operations gives you nothing but an extra dereference
and funny-looking code. The current system is just the simplest and
most efficient. Why the extra complexity?
> Benny
Cheers
Boaz
On 09/29/2013 05:35 AM, Christoph Hellwig wrote:
> On Sun, Sep 29, 2013 at 05:21:30AM -0700, Christoph Hellwig wrote:
>>> Bruce - are you ok with moving the pnfs interface definitions to
>>> include/linux/exportfs.h along with struct export_operations?
>>>
>>> In fact we can actually extend struct export_operations rather
>>> than adding pnfs_export_operations...
>>
>> Yes, it probably should go into the export ops, although the actual
>> method signatures might need to be made a little less nfs-specific for
>> that.
>
> I just took a brief look over the diff for the whole series in the git
> tree and the old tree that still had block and exofs servers and have
> revised my opinion a little bit:
>
>
> - there should be a layout_type field in struct export_operations,
> indicating that a filesystem supports some sort of pnfs-like export.
The pnfs protocol allows, and people have plans for, multi-typed
layouts from the same super-block. It is a per-file attribute.
It even allows multi-protocol access to the same file.
The only flag should be the presence of the layout_get vector,
which should indicate support or lack of it.
(In fact I would remove the layout_type field completely; its place
is only as an input to LO_GET and DEVICE_INFO.)
> - there should be a struct pnfs_operations, but it should be confined
> to fs/nfsd: each layout can be a separate loadable module and gets
> registered there. For the initial file layout that module is
> self-contained, but for e.g. block or objects it would have to
> call into the filesystem through export_ops, although way lower level
> than the NFS XDR level, e.g. for block there would be one to get
> the extent map, and one to allocate an extent.
>
No! This does not make any sense. What you say does not fit any model of any
cluster filesystem today.
- Again the FS can support any protocol.
- Only the FS understands the structure and layout of the file access. Any
other model is a specific implementation and breaks abstraction. The only true
abstraction is LO_GET, LO_RETURN, LO_COMMIT, DEVICE_INFO and LO_CB_RECALL. Anything
else is making assumptions.
There is a pnfs vector and it is at this abstraction level exactly.
> This way we also avoid the dependency on nfsd in the filesystems that the
> current version introduces.
There is no "dependency on nfsd in the filesystems"
The only dependency the FS has is an import of some library routines
at exportfs that take an abstract layout and device descriptions and encode
them into an XDR buffer. But the FS knows nothing of the XDR and the
NFSD is free to unload at any moment without forcing the FS to unload
first or at all.
This is actually tested; in fact I do this all the time when I want to
start fresh and have NFSD close all resources on the FS.
Nothing changed: the FS is independent and NFSD is dependent on the FS,
but in an abstract way via an exports vector.
Where did you see such a dependency?
Cheers
Boaz
On 09/30/2013 05:23 PM, Boaz Harrosh wrote:
> But the fact is that no
> one cares for a files-layout open-source server.
>
Actually I was wrong about this. The Ganesha project has
3-4 open-source implementations of real pnfs.
Cheers
Boaz
List the layout state on the respective file and client structures and
list all layout segments associated with the layout state on the respective
layout state structure.
Use the layout state list (lo_layouts) for looking up the layout.
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsd.c | 40 +++++++++++++++++++++++++++++++++++++---
fs/nfsd/nfs4state.c | 3 +++
fs/nfsd/pnfsd.h | 4 ++++
fs/nfsd/state.h | 3 +++
4 files changed, 47 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index e28c396..2d5ddf7 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -130,6 +130,11 @@ struct sbid_tracker {
return new;
kref_init(&new->ls_ref);
new->ls_stid.sc_type = NFS4_LAYOUT_STID;
+ INIT_LIST_HEAD(&new->ls_perclnt);
+ new->ls_client = clp;
+ spin_lock(&layout_lock);
+ list_add(&new->ls_perclnt, &clp->cl_lo_states);
+ spin_unlock(&layout_lock);
return new;
}
@@ -139,6 +144,13 @@ struct sbid_tracker {
kref_get(&ls->ls_ref);
}
+static void
+unhash_layout_state(struct nfs4_layout_state *ls)
+{
+ ASSERT_LAYOUT_LOCKED();
+ list_del_init(&ls->ls_perclnt);
+}
+
/*
* Note: must be called under the state lock
*/
@@ -149,6 +161,11 @@ struct sbid_tracker {
container_of(kref, struct nfs4_layout_state, ls_ref);
nfsd4_remove_stid(&ls->ls_stid);
+ if (!list_empty(&ls->ls_perclnt)) {
+ spin_lock(&layout_lock);
+ unhash_layout_state(ls);
+ spin_unlock(&layout_lock);
+ }
nfsd4_free_stid(layout_state_slab, &ls->ls_stid);
}
@@ -234,13 +251,30 @@ struct sbid_tracker {
kmem_cache_free(pnfs_layout_slab, lp);
}
+static void update_layout_stateid_locked(struct nfs4_layout_state *ls, stateid_t *sid)
+{
+ update_stateid(&(ls)->ls_stid.sc_stateid);
+ memcpy((sid), &(ls)->ls_stid.sc_stateid, sizeof(stateid_t));
+ dprintk("%s Updated ls_stid to %d on layoutstate %p\n",
+ __func__, sid->si_generation, ls);
+}
+
static void
init_layout(struct nfs4_layout *lp,
- struct nfsd4_layout_seg *seg)
+ struct nfs4_layout_state *ls,
+ struct svc_fh *current_fh,
+ struct nfsd4_layout_seg *seg,
+ stateid_t *stateid)
{
- dprintk("pNFS %s: lp %p\n", __func__, lp);
+ dprintk("pNFS %s: lp %p ls %p\n", __func__,
+ lp, ls);
memcpy(&lp->lo_seg, seg, sizeof(lp->lo_seg));
+ get_layout_state(ls); /* put on destroy_layout */
+ lp->lo_state = ls;
+ spin_lock(&layout_lock);
+ update_layout_stateid_locked(ls, stateid);
+ spin_unlock(&layout_lock);
dprintk("pNFS %s end\n", __func__);
}
@@ -443,7 +477,7 @@ struct super_block *
lgp->lg_seg = res.lg_seg;
lgp->lg_roc = res.lg_return_on_close;
- init_layout(lp, &res.lg_seg);
+ init_layout(lp, ls, lgp->lg_fhp, &res.lg_seg, &lgp->lg_sid);
out_unlock:
if (ls)
put_layout_state(ls);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fa292bb..0e2266f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1363,6 +1363,9 @@ static struct nfs4_client *create_client(struct xdr_netobj name,
INIT_LIST_HEAD(&clp->cl_idhash);
INIT_LIST_HEAD(&clp->cl_openowners);
INIT_LIST_HEAD(&clp->cl_delegations);
+#if defined(CONFIG_PNFSD)
+ INIT_LIST_HEAD(&clp->cl_lo_states);
+#endif /* CONFIG_PNFSD */
INIT_LIST_HEAD(&clp->cl_lru);
INIT_LIST_HEAD(&clp->cl_callbacks);
INIT_LIST_HEAD(&clp->cl_revoked);
diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
index c2360e4..a0363a7 100644
--- a/fs/nfsd/pnfsd.h
+++ b/fs/nfsd/pnfsd.h
@@ -44,10 +44,14 @@
struct nfs4_layout_state {
struct nfs4_stid ls_stid; /* must be first field */
struct kref ls_ref;
+ struct list_head ls_perclnt;
+ struct nfs4_client *ls_client;
};
/* outstanding layout */
struct nfs4_layout {
+ struct list_head lo_perstate;
+ struct nfs4_layout_state *lo_state;
struct nfsd4_layout_seg lo_seg;
};
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 8c6e097..187d169 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -291,6 +291,9 @@ struct nfs4_client {
struct rpc_wait_queue cl_cb_waitq; /* backchannel callers may */
/* wait here for slots */
struct net *net;
+#if defined(CONFIG_PNFSD)
+ struct list_head cl_lo_states; /* outstanding layout states */
+#endif /* CONFIG_PNFSD */
};
/* struct nfs4_client_reset
--
1.8.3.1
On 09/27/2013 01:19 PM, Benny Halevy wrote:
> On Fri, Sep 27, 2013 at 12:37 PM, Boaz Harrosh <[email protected]> wrote:
>
> Boaz, sorry but the files layout went first to production on the
> client side in all major
> enterprise distributions so it doesn't make sense to submit exofs first.
> As for your patch series, I respect the work you did on it but
> a. as you said it is your patch series, not mine
This is not my patch series; it is the patch series of all of us. I do
not have a different one than yours. Everyone did some work
in his area. So I wrote the exofs part as well as lots of core
parts. But we always kept one tree. Our tree.
> b. the forward port from 3.10 on changed the layout state handling
> radically (for the better I hope :)
> solving numerous correctness issues.
Cool, good is good, right? Do you mean that exofs would not work now?
Why would it be broken?
> The motivation behind the dlm based implementation is to have a
> minimal useful pnfs implementation
> that folks can use and test the client against.
What kind of dumb test is DLM, without any write support? It is
plainly not pNFS; it is a freak. There is nothing to test. A READONLY
file system, don't you see the joke in that?
If this is your motivation, testing, then at least put pnfs-exp
as the reference implementation for some real client testing.
> On this basis, writes layout can be added,
What writes layout in DLM? No hand-waving please.
> and further on, exofs
> support can submitted as the next stage.
>
You are doing the work, what can I say. We decided this before;
I think it was even Bruce's idea, not mine. So you are changing that decision?
For me the DLM is a joke and a bad face for the 6 years of effort
I put into this thing. This is not pNFS and will do more harm than good
to my cause. If you need it just for testing, why do you need it in
mainline? Mainline is for real users and benefits, no?
I think I agree with Christoph: better to wait for a real open-source
pNFS server implementation before putting any of this in the Kernel.
Just leave it out-of-tree as it is now. The only real open-source
pNFS server implementation out there today that can demonstrate 10G
saturation and scalability of up to 40G in the 40-node setup I had is exofs.
So it is the only one that can justify such a big piece added to the Kernel.
Real sorry for the inconvenience of it being objects and not files.
If it would matter to someone so much as it did for me then perhaps
he would sit on his "thing" and implement one. But the fact is that no
one cares for a files-layout open-source server.
And you are off the hook; this is the last I will comment on this.
> Benny
>
Thanks
Boaz
On Sun, Sep 29, 2013 at 03:12:41PM +0300, Benny Halevy wrote:
> > Also why would you want a header
> > outside fs/nfsd/ ?
>
> This header contains the file system interface.
Any interface for the filesystem should be part of exportfs.h, not
something nfs-specific.
idr_remove is about to be called before kmem_cache_free so unhashing it
is redundant
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a8a18d4..214e42d 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -677,7 +677,6 @@ static void unhash_open_stateid(struct nfs4_ol_stateid *stp)
static void release_open_stateid(struct nfs4_ol_stateid *stp)
{
unhash_open_stateid(stp);
- unhash_stid(&stp->st_stid);
free_generic_stateid(stp);
}
@@ -699,7 +698,6 @@ static void release_last_closed_stateid(struct nfs4_openowner *oo)
struct nfs4_ol_stateid *s = oo->oo_last_closed_stid;
if (s) {
- unhash_stid(&s->st_stid);
free_generic_stateid(s);
oo->oo_last_closed_stid = NULL;
}
@@ -4043,10 +4041,9 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
nfsd4_close_open_stateid(stp);
- if (cstate->minorversion) {
- unhash_stid(&stp->st_stid);
+ if (cstate->minorversion)
free_generic_stateid(stp);
- } else
+ else
oo->oo_last_closed_stid = stp;
if (list_empty(&oo->oo_owner.so_stateids)) {
--
1.8.3.1
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsd.c | 2 ++
fs/nfsd/pnfsd.h | 1 +
2 files changed, 3 insertions(+)
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index 2c5d30d..386afa3 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -133,6 +133,7 @@ struct sbid_tracker {
new->ls_stid.sc_type = NFS4_LAYOUT_STID;
INIT_LIST_HEAD(&new->ls_perclnt);
INIT_LIST_HEAD(&new->ls_perfile);
+ INIT_LIST_HEAD(&new->ls_layouts);
new->ls_client = clp;
get_nfs4_file(fp); /* released on destroy_layout_state */
new->ls_file = fp;
@@ -281,6 +282,7 @@ static void update_layout_stateid_locked(struct nfs4_layout_state *ls, stateid_t
lp->lo_state = ls;
spin_lock(&layout_lock);
update_layout_stateid_locked(ls, stateid);
+ list_add_tail(&lp->lo_perstate, &ls->ls_layouts);
spin_unlock(&layout_lock);
dprintk("pNFS %s end\n", __func__);
}
diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
index 852c250..1cd7a87 100644
--- a/fs/nfsd/pnfsd.h
+++ b/fs/nfsd/pnfsd.h
@@ -48,6 +48,7 @@ struct nfs4_layout_state {
struct nfs4_client *ls_client;
struct list_head ls_perfile;
struct nfs4_file *ls_file;
+ struct list_head ls_layouts;
};
/* outstanding layout */
--
1.8.3.1
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsd.c | 15 +++++++++++----
fs/nfsd/nfs4state.c | 3 +++
fs/nfsd/pnfsd.h | 2 ++
fs/nfsd/state.h | 3 +++
4 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index 2d5ddf7..2c5d30d 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -120,7 +120,8 @@ struct sbid_tracker {
* Note: must be called under the state lock
*/
static struct nfs4_layout_state *
-alloc_init_layout_state(struct nfs4_client *clp, stateid_t *stateid)
+alloc_init_layout_state(struct nfs4_client *clp, struct nfs4_file *fp,
+ stateid_t *stateid)
{
struct nfs4_layout_state *new;
@@ -131,9 +132,13 @@ struct sbid_tracker {
kref_init(&new->ls_ref);
new->ls_stid.sc_type = NFS4_LAYOUT_STID;
INIT_LIST_HEAD(&new->ls_perclnt);
+ INIT_LIST_HEAD(&new->ls_perfile);
new->ls_client = clp;
+ get_nfs4_file(fp); /* released on destroy_layout_state */
+ new->ls_file = fp;
spin_lock(&layout_lock);
list_add(&new->ls_perclnt, &clp->cl_lo_states);
+ list_add(&new->ls_perfile, &fp->fi_lo_states);
spin_unlock(&layout_lock);
return new;
}
@@ -149,6 +154,7 @@ struct sbid_tracker {
{
ASSERT_LAYOUT_LOCKED();
list_del_init(&ls->ls_perclnt);
+ list_del_init(&ls->ls_perfile);
}
/*
@@ -166,6 +172,7 @@ struct sbid_tracker {
unhash_layout_state(ls);
spin_unlock(&layout_lock);
}
+ put_nfs4_file(ls->ls_file);
nfsd4_free_stid(layout_state_slab, &ls->ls_stid);
}
@@ -212,7 +219,7 @@ struct sbid_tracker {
/* Is this the first use of this layout ? */
if (stid->sc_type != NFS4_LAYOUT_STID) {
- ls = alloc_init_layout_state(clp, stateid);
+ ls = alloc_init_layout_state(clp, fp, stateid);
if (!ls) {
status = nfserr_jukebox;
goto out;
@@ -266,8 +273,8 @@ static void update_layout_stateid_locked(struct nfs4_layout_state *ls, stateid_t
struct nfsd4_layout_seg *seg,
stateid_t *stateid)
{
- dprintk("pNFS %s: lp %p ls %p\n", __func__,
- lp, ls);
+ dprintk("pNFS %s: lp %p ls %p ino %lu\n", __func__,
+ lp, ls, ls->ls_file->fi_inode->i_ino);
memcpy(&lp->lo_seg, seg, sizeof(lp->lo_seg));
get_layout_state(ls); /* put on destroy_layout */
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 0e2266f..e11d96f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2509,6 +2509,9 @@ static void nfsd4_init_file(struct nfs4_file *fp, struct inode *ino,
fp->fi_lease = NULL;
memset(fp->fi_fds, 0, sizeof(fp->fi_fds));
memset(fp->fi_access, 0, sizeof(fp->fi_access));
+#if defined(CONFIG_PNFSD)
+ INIT_LIST_HEAD(&fp->fi_lo_states);
+#endif /* CONFIG_PNFSD */
spin_lock(&recall_lock);
hlist_add_head(&fp->fi_hash, &file_hashtbl[hashval]);
spin_unlock(&recall_lock);
diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
index a0363a7..852c250 100644
--- a/fs/nfsd/pnfsd.h
+++ b/fs/nfsd/pnfsd.h
@@ -46,6 +46,8 @@ struct nfs4_layout_state {
struct kref ls_ref;
struct list_head ls_perclnt;
struct nfs4_client *ls_client;
+ struct list_head ls_perfile;
+ struct nfs4_file *ls_file;
};
/* outstanding layout */
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 187d169..1ef09ae 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -405,6 +405,9 @@ struct nfs4_file {
atomic_t fi_delegees;
struct inode *fi_inode;
bool fi_had_conflict;
+#if defined(CONFIG_PNFSD)
+ struct list_head fi_lo_states;
+#endif /* CONFIG_PNFSD */
};
/* XXX: for first cut may fall back on returning file that doesn't work
--
1.8.3.1
On Thu, Sep 26, 2013 at 02:36:16PM -0400, Benny Halevy wrote:
> The following patchset implements an extension to nfsd
> providing a complete minimal pnfs server exporting
> DLM-based clustered file systems such as GFS2 or OCFS2.
Does this actually buy us anything by now? Last time I saw numbers for
this implementation it was slower than an active/passive setup over
those filesystem due to the way their cluster locking works.
On Thu, Sep 26, 2013 at 02:42:59PM -0400, Benny Halevy wrote:
> From: Benny Halevy <[email protected]>
>
> Include headers in nfs_xdr.h required for
> struct rpc_task, nfs4_verifier, nfs4_stateid
Are these actually dereferenced, or would forward declarations do?
Also, this should go straight to Trond, outside of this series.
From: Benny Halevy <[email protected]>
get_cached_acl is defined as inline in posix_acl.h
requiring the full definition of struct inode as it
dereferences its struct inode * parameter.
Cc: Alexander Viro <[email protected]>
Cc: [email protected]
Cc: J. Bruce Fields <[email protected]>
Cc: Trond Myklebust <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
include/linux/posix_acl.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h
index 7931efe..a7d8b04 100644
--- a/include/linux/posix_acl.h
+++ b/include/linux/posix_acl.h
@@ -9,6 +9,7 @@
#define __LINUX_POSIX_ACL_H
#include <linux/bug.h>
+#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/rcupdate.h>
--
1.8.3.1
From: Andy Adamson <[email protected]>
Change nfsd filesystem name from pnfs_ds_list to pnfs_dlm_device
write the per block device dlm data server cache
Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Acked-by: Steven Whitehouse <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfsctl.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index b8bfa2e..7da8584 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -50,6 +50,9 @@ enum {
NFSD_Gracetime,
NFSD_RecoveryDir,
#endif
+#ifdef CONFIG_PNFSD
+ NFSD_pnfs_dlm_device,
+#endif
};
/*
@@ -68,6 +71,9 @@ enum {
static ssize_t write_gracetime(struct file *file, char *buf, size_t size);
static ssize_t write_recoverydir(struct file *file, char *buf, size_t size);
#endif
+#ifdef CONFIG_PNFSD
+static ssize_t write_pnfs_dlm_device(struct file *file, char *buf, size_t size);
+#endif
static ssize_t (*write_op[])(struct file *, char *, size_t) = {
[NFSD_Fh] = write_filehandle,
@@ -83,6 +89,9 @@ static ssize_t (*write_op[])(struct file *, char *, size_t) = {
[NFSD_Gracetime] = write_gracetime,
[NFSD_RecoveryDir] = write_recoverydir,
#endif
+#ifdef CONFIG_PNFSD
+ [NFSD_pnfs_dlm_device] = write_pnfs_dlm_device,
+#endif
};
static ssize_t nfsctl_transaction_write(struct file *file, const char __user *buf, size_t size, loff_t *pos)
@@ -1037,6 +1046,66 @@ static ssize_t write_recoverydir(struct file *file, char *buf, size_t size)
#endif
+#ifdef CONFIG_PNFSD
+
+static ssize_t __write_pnfs_dlm_device(struct file *file, char *buf,
+ size_t size)
+{
+ char *mesg = buf;
+ char *pnfs_dlm_device;
+ int max_size = NFSD_PNFS_DLM_DEVICE_MAX;
+ int len, ret = 0;
+
+ if (size > 0) {
+ ret = -EINVAL;
+ if (size > max_size || buf[size-1] != '\n')
+ return ret;
+ buf[size-1] = 0;
+
+ pnfs_dlm_device = mesg;
+ len = qword_get(&mesg, pnfs_dlm_device, size);
+ if (len <= 0)
+ return ret;
+
+ ret = nfsd4_set_pnfs_dlm_device(pnfs_dlm_device, len);
+ }
+ return ret <= 0 ? ret : strlen(buf);
+}
+
+/**
+ * write_pnfs_dlm_device - Set or report the current pNFS data server list
+ *
+ * Input:
+ * buf: ignored
+ * size: zero
+ *
+ * OR
+ *
+ * Input:
+ * buf: C string containing a block device name,
+ * a colon, and then a comma separated
+ * list of pNFS data server IPv4 addresses
+ * size: non-zero length of C string in @buf
+ * Output:
+ * On success: passed-in buffer filled with '\n'-terminated C
+ * string containing a block device name, a colon, and
+ * then a comma separated list of pNFS
+ * data server IPv4 addresses.
+ * return code is the size in bytes of the string
+ * On error: return code is a negative errno value
+ */
+static ssize_t write_pnfs_dlm_device(struct file *file, char *buf, size_t size)
+{
+ ssize_t rv;
+
+ mutex_lock(&nfsd_mutex);
+ rv = __write_pnfs_dlm_device(file, buf, size);
+ mutex_unlock(&nfsd_mutex);
+ return rv;
+}
+
+#endif /* CONFIG_PNFSD */
+
/*----------------------------------------------------------------------------*/
/*
* populating the filesystem.
@@ -1068,6 +1137,10 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
[NFSD_Gracetime] = {"nfsv4gracetime", &transaction_ops, S_IWUSR|S_IRUSR},
[NFSD_RecoveryDir] = {"nfsv4recoverydir", &transaction_ops, S_IWUSR|S_IRUSR},
#endif
+#ifdef CONFIG_PNFSD
+ [NFSD_pnfs_dlm_device] = {"pnfs_dlm_device", &transaction_ops,
+ S_IWUSR|S_IRUSR},
+#endif
/* last one */ {""}
};
struct net *net = data;
--
1.8.3.1
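The kernel-doc comment for write_pnfs_dlm_device() describes the accepted input as a block device name, a colon, and a comma-separated list of data server IPv4 addresses. The actual parsing lives in nfsd4_set_pnfs_dlm_device(), which is not part of this patch, so the following is only a user-space sketch of how such a string could be split. struct dlm_device_sketch, MAX_DS_SKETCH and parse_dlm_device_sketch() are hypothetical names, and the addresses are kept as strings without validation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DS_SKETCH 8	/* hypothetical cap on data servers per device */

/* Hypothetical result of parsing "device:addr1,addr2,...". */
struct dlm_device_sketch {
	char *name;			/* block device name */
	char *ds[MAX_DS_SKETCH];	/* data server address strings */
	int num_ds;
};

/*
 * Split buf in place. Returns 0 on success, -1 if the colon is missing,
 * the device name or an address is empty, or there are too many addresses.
 */
static int parse_dlm_device_sketch(char *buf, struct dlm_device_sketch *out)
{
	char *colon, *cur;

	assert(buf != NULL && out != NULL);
	colon = strchr(buf, ':');
	if (!colon || colon == buf)
		return -1;
	*colon = '\0';
	out->name = buf;
	out->num_ds = 0;

	cur = colon + 1;
	for (;;) {
		char *comma = strchr(cur, ',');

		if (comma)
			*comma = '\0';
		if (*cur == '\0' || out->num_ds == MAX_DS_SKETCH)
			return -1;
		out->ds[out->num_ds++] = cur;
		if (!comma)
			break;
		cur = comma + 1;
	}
	return 0;
}
```

With the input "sdb:10.0.0.1,10.0.0.2" this yields the name "sdb" and two address strings; a string without a colon or with a trailing comma is rejected.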
Handle layout return by the generic layer for RETURN_{FILE,FSID,ALL}.
Keep track of the layout state sequence and the remaining outstanding layouts.
lrs_present is set to false when the client returns all of its layouts for the file.
Signed-off-by: Benny Halevy <[email protected]>
---
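Note on the range test used by lo_seg_overlapping() in this patch: it compares inclusive last-byte offsets, so two segments that share a single byte count as overlapping while adjacent ones do not. The sketch below reproduces that test in user-space C. last_byte_sketch() is an assumption modelled on nfsd's last_byte_offset() (a length that runs past the 64-bit range is treated as "to end of file"); ranges_overlap_sketch() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Inclusive offset of the last byte in [start, start + len).
 * Assumption: as in nfsd's last_byte_offset(), a length that wraps the
 * 64-bit range (or is zero) is treated as extending to UINT64_MAX.
 */
static uint64_t last_byte_sketch(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;

	return end > start ? end - 1 : UINT64_MAX;
}

/* True if the two byte ranges share at least one byte. */
static bool ranges_overlap_sketch(uint64_t start1, uint64_t len1,
				  uint64_t start2, uint64_t len2)
{
	uint64_t last1 = last_byte_sketch(start1, len1);
	uint64_t last2 = last_byte_sketch(start2, len2);

	/* if last1 == start2 there is a single byte overlap */
	return last2 >= start1 && last1 >= start2;
}
```

For example, 0:10 and 9:5 overlap in byte 9, while 0:10 and 10:5 are merely adjacent and do not overlap.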
fs/nfsd/nfs4pnfsd.c | 284 ++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfs4proc.c | 58 ++++++++
fs/nfsd/nfs4state.c | 10 +-
fs/nfsd/nfs4xdr.c | 49 ++++++-
fs/nfsd/pnfsd.h | 1 +
fs/nfsd/state.h | 3 +
fs/nfsd/xdr4.h | 7 +
include/linux/nfsd/nfsd4_pnfs.h | 10 ++
8 files changed, 416 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index 1807455..2ba4a29 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -289,6 +289,54 @@ static void update_layout_stateid_locked(struct nfs4_layout_state *ls, stateid_t
dprintk("pNFS %s end\n", __func__);
}
+/*
+ * Note: must be called under the layout_lock.
+ */
+static void
+dequeue_layout_for_return(struct nfs4_layout *lo,
+ struct list_head *lo_destroy_list)
+{
+ ASSERT_LAYOUT_LOCKED();
+ list_del_init(&lo->lo_perstate);
+ list_add_tail(&lo->lo_perstate, lo_destroy_list);
+ if (list_empty(&lo->lo_state->ls_layouts)) {
+ unhash_layout_state(lo->lo_state);
+ nfsd4_unhash_stid(&lo->lo_state->ls_stid);
+ }
+}
+
+/*
+ * Note: must be called under the state lock
+ */
+static void
+destroy_layout(struct nfs4_layout *lp)
+{
+ struct nfs4_layout_state *ls;
+
+ ls = lp->lo_state;
+ dprintk("pNFS %s: lp %p ls %p ino %lu\n",
+ __func__, lp, ls, ls->ls_file->fi_inode->i_ino);
+
+ free_layout(lp);
+ /* release references taken by init_layout */
+ put_layout_state(ls);
+}
+
+/*
+ * Note: must be called under the state lock
+ */
+static void
+destroy_layout_list(struct list_head *lo_destroy_list)
+{
+ struct nfs4_layout *lp;
+
+ while (!list_empty(lo_destroy_list)) {
+ lp = list_first_entry(lo_destroy_list, struct nfs4_layout, lo_perstate);
+ list_del(&lp->lo_perstate);
+ destroy_layout(lp);
+ }
+}
+
static u64
alloc_init_sbid(struct super_block *sb)
{
@@ -367,6 +415,35 @@ struct super_block *
}
/*
+ * are two octet ranges overlapping?
+ * start1 last1
+ * |-----------------|
+ * start2 last2
+ * |----------------|
+ */
+static int
+lo_seg_overlapping(struct nfsd4_layout_seg *l1, struct nfsd4_layout_seg *l2)
+{
+ u64 start1 = l1->offset;
+ u64 last1 = last_byte_offset(start1, l1->length);
+ u64 start2 = l2->offset;
+ u64 last2 = last_byte_offset(start2, l2->length);
+ int ret;
+
+ /* if last1 == start2 there's a single byte overlap */
+ ret = (last2 >= start1) && (last1 >= start2);
+ dprintk("%s: l1 %llu:%lld l2 %llu:%lld ret=%d\n", __func__,
+ l1->offset, l1->length, l2->offset, l2->length, ret);
+ return ret;
+}
+
+static int
+same_fsid_major(struct nfs4_fsid *fsid, u64 major)
+{
+ return fsid->major == major;
+}
+
+/*
* are two octet ranges overlapping or adjacent?
*/
static bool
@@ -578,3 +655,210 @@ struct super_block *
free_layout(lp);
goto out_unlock;
}
+
+static void
+trim_layout(struct nfsd4_layout_seg *lo, struct nfsd4_layout_seg *lr)
+{
+ u64 lo_start = lo->offset;
+ u64 lo_end = end_offset(lo_start, lo->length);
+ u64 lr_start = lr->offset;
+ u64 lr_end = end_offset(lr_start, lr->length);
+
+ dprintk("%s:Begin lo %llu:%lld lr %llu:%lld\n", __func__,
+ lo->offset, lo->length, lr->offset, lr->length);
+
+ /* lr fully covers lo? */
+ if (lr_start <= lo_start && lo_end <= lr_end) {
+ lo->length = 0;
+ goto out;
+ }
+
+ /*
+ * split not supported yet. retain layout segment.
+ * remains must be returned by the client
+ * on the final layout return.
+ */
+ if (lo_start < lr_start && lr_end < lo_end) {
+ dprintk("%s: split not supported\n", __func__);
+ goto out;
+ }
+
+ if (lo_start < lr_start)
+ lo_end = lr_start - 1;
+ else /* lr_end < lo_end */
+ lo_start = lr_end + 1;
+
+ lo->offset = lo_start;
+ lo->length = (lo_end == NFS4_MAX_UINT64) ? lo_end : lo_end - lo_start;
+out:
+ dprintk("%s:End lo %llu:%lld\n", __func__, lo->offset, lo->length);
+}
+
+/*
+ * Note: should be called WITHOUT holding the layout_lock
+ */
+static int
+pnfs_return_file_layouts(struct nfsd4_pnfs_layoutreturn *lrp,
+ struct nfs4_layout_state *ls,
+ struct list_head *lo_destroy_list)
+{
+ int layouts_found = 0;
+ struct nfs4_layout *lp, *nextlp;
+
+ dprintk("%s: ls %p\n", __func__, ls);
+ lrp->lrs_present = 0;
+ spin_lock(&layout_lock);
+ list_for_each_entry_safe (lp, nextlp, &ls->ls_layouts, lo_perstate) {
+ dprintk("%s: lp %p ls %p inode %lu lo_type %x,%x iomode %d,%d\n",
+ __func__, lp, lp->lo_state,
+ lp->lo_state->ls_file->fi_inode->i_ino,
+ lp->lo_seg.layout_type, lrp->args.lr_seg.layout_type,
+ lp->lo_seg.iomode, lrp->args.lr_seg.iomode);
+ if ((lp->lo_seg.layout_type != lrp->args.lr_seg.layout_type &&
+ lrp->args.lr_seg.layout_type) ||
+ (lp->lo_seg.iomode != lrp->args.lr_seg.iomode &&
+ lrp->args.lr_seg.iomode != IOMODE_ANY) ||
+ !lo_seg_overlapping(&lp->lo_seg, &lrp->args.lr_seg)) {
+ lrp->lrs_present = 1;
+ continue;
+ }
+ layouts_found++;
+ trim_layout(&lp->lo_seg, &lrp->args.lr_seg);
+ if (!lp->lo_seg.length)
+ dequeue_layout_for_return(lp, lo_destroy_list);
+ else
+ lrp->lrs_present = 1;
+ }
+ if (ls && layouts_found && lrp->lrs_present)
+ update_layout_stateid_locked(ls, (stateid_t *)&lrp->args.lr_sid);
+ spin_unlock(&layout_lock);
+
+ return layouts_found;
+}
+
+/*
+ * Return layouts for RETURN_FSID or RETURN_ALL
+ *
+ * Note: must be called WITHOUT holding the layout lock
+ */
+static int
+pnfs_return_client_layouts(struct nfs4_client *clp,
+ struct nfsd4_pnfs_layoutreturn *lrp,
+ u64 ex_fsid,
+ struct list_head *lo_destroy_list)
+{
+ int layouts_found = 0;
+ bool state_found;
+ struct nfs4_layout_state *ls, *nextls;
+ struct nfs4_layout *lp, *nextlp;
+
+ spin_lock(&layout_lock);
+ list_for_each_entry_safe (ls, nextls, &clp->cl_lo_states, ls_perclnt) {
+ if (lrp->args.lr_return_type == RETURN_FSID &&
+ !same_fsid_major(&ls->ls_file->fi_fsid, ex_fsid))
+ continue;
+
+ /* first pass, test only */
+ state_found = false;
+ list_for_each_entry (lp, &ls->ls_layouts, lo_perstate) {
+ if (lrp->args.lr_seg.layout_type != lp->lo_seg.layout_type &&
+ lrp->args.lr_seg.layout_type)
+ break;
+
+ if (lrp->args.lr_seg.iomode != lp->lo_seg.iomode &&
+ lrp->args.lr_seg.iomode != IOMODE_ANY)
+ continue;
+
+ state_found = true;
+ break;
+ }
+
+ if (!state_found)
+ continue;
+
+ list_for_each_entry_safe (lp, nextlp, &ls->ls_layouts, lo_perstate) {
+ if (lrp->args.lr_seg.layout_type != lp->lo_seg.layout_type &&
+ lrp->args.lr_seg.layout_type)
+ break;
+
+ if (lrp->args.lr_seg.iomode != lp->lo_seg.iomode &&
+ lrp->args.lr_seg.iomode != IOMODE_ANY)
+ continue;
+
+ layouts_found++;
+ dequeue_layout_for_return(lp, lo_destroy_list);
+ }
+ }
+ spin_unlock(&layout_lock);
+ return layouts_found;
+}
+
+int nfs4_pnfs_return_layout(struct svc_rqst *rqstp,
+ struct super_block *sb,
+ struct svc_fh *current_fh,
+ struct nfsd4_pnfs_layoutreturn *lrp)
+{
+ int status = 0;
+ int layouts_found = 0;
+ struct inode *ino = current_fh->fh_dentry->d_inode;
+ struct nfs4_file *fp = NULL;
+ struct nfs4_layout_state *ls = NULL;
+ struct nfs4_client *clp;
+ u64 ex_fsid = current_fh->fh_export->ex_fsid;
+ LIST_HEAD(lo_destroy_list);
+
+ dprintk("NFSD: %s\n", __func__);
+
+ nfs4_lock_state();
+ clp = find_confirmed_client(&lrp->lr_clientid,
+ true, net_generic(SVC_NET(rqstp), nfsd_net_id));
+ if (!clp)
+ goto out_unlock;
+
+ if (lrp->args.lr_return_type == RETURN_FILE) {
+ /* use the function-scope lo_destroy_list; a local one would shadow it and leak */
+
+ fp = find_file(ino);
+ if (!fp) {
+ dprintk("%s: RETURN_FILE: no nfs4_file for ino %p:%lu\n",
+ __func__, ino, ino ? ino->i_ino : 0L);
+ /* If we had a layout on the file, the nfs4_file would
+ * be referenced and we would have found it. Since we
+ * did not, all layouts must have been return-on-close
+ * and were already returned when the file was closed.
+ */
+ goto out_unlock;
+ }
+
+ /* Check the stateid */
+ dprintk("%s PROCESS LO_STATEID inode %p\n", __func__, ino);
+ status = nfs4_process_layout_stateid(clp, fp,
+ (stateid_t *)&lrp->args.lr_sid,
+ NFS4_LAYOUT_STID, &ls);
+ if (status)
+ goto out_unlock;
+ layouts_found = pnfs_return_file_layouts(lrp, ls, &lo_destroy_list);
+ } else {
+ layouts_found = pnfs_return_client_layouts(clp, lrp, ex_fsid,
+ &lo_destroy_list);
+ }
+
+ dprintk("pNFS %s: clp %p fp %p layout_type 0x%x iomode %d "
+ "return_type %d fsid 0x%llx offset %llu length %llu: "
+ "layouts_found %d\n",
+ __func__, clp, fp, lrp->args.lr_seg.layout_type,
+ lrp->args.lr_seg.iomode, lrp->args.lr_return_type,
+ ex_fsid,
+ lrp->args.lr_seg.offset, lrp->args.lr_seg.length, layouts_found);
+
+ if (ls)
+ put_layout_state(ls);
+ destroy_layout_list(&lo_destroy_list);
+out_unlock:
+ nfs4_unlock_state();
+ if (fp)
+ put_nfs4_file(fp);
+
+ dprintk("pNFS %s: exit status %d\n", __func__, status);
+ return status;
+}
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 960d8ff..54926cb 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1274,6 +1274,60 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
out:
return status;
}
+
+static __be32
+nfsd4_layoutreturn(struct svc_rqst *rqstp,
+ struct nfsd4_compound_state *cstate,
+ struct nfsd4_pnfs_layoutreturn *lrp)
+{
+ int status;
+ struct super_block *sb;
+ struct svc_fh *current_fh = &cstate->current_fh;
+
+ status = fh_verify(rqstp, current_fh, 0, NFSD_MAY_NOP);
+ if (status)
+ goto out;
+
+ status = nfserr_inval;
+ sb = current_fh->fh_dentry->d_inode->i_sb;
+ if (!sb)
+ goto out;
+
+ /* Ensure underlying file system supports pNFS and,
+ * if so, the requested layout type
+ */
+ status = nfsd4_layout_verify(sb, current_fh->fh_export,
+ lrp->args.lr_seg.layout_type);
+ if (status)
+ goto out;
+
+ status = nfserr_inval;
+ if (lrp->args.lr_return_type != RETURN_FILE &&
+ lrp->args.lr_return_type != RETURN_FSID &&
+ lrp->args.lr_return_type != RETURN_ALL) {
+ dprintk("pNFS %s: invalid return_type %d\n", __func__,
+ lrp->args.lr_return_type);
+ goto out;
+ }
+
+ status = nfserr_inval;
+ if (lrp->args.lr_seg.iomode != IOMODE_READ &&
+ lrp->args.lr_seg.iomode != IOMODE_RW &&
+ lrp->args.lr_seg.iomode != IOMODE_ANY) {
+ dprintk("pNFS %s: invalid iomode %d\n", __func__,
+ lrp->args.lr_seg.iomode);
+ goto out;
+ }
+
+ /* Set clientid from sessionid */
+ copy_clientid(&lrp->lr_clientid, cstate->session);
+ lrp->lrs_present = 0;
+ status = nfs4_pnfs_return_layout(rqstp, sb, current_fh, lrp);
+out:
+ dprintk("pNFS %s: status %d return_type 0x%x lrs_present %d\n",
+ __func__, status, lrp->args.lr_return_type, lrp->lrs_present);
+ return status;
+}
#endif /* CONFIG_PNFSD */
/*
@@ -2021,6 +2075,10 @@ static inline u32 nfsd4_create_session_rsize(struct svc_rqst *rqstp, struct nfsd
.op_func = (nfsd4op_func)nfsd4_layoutget,
.op_name = "OP_LAYOUTGET",
},
+ [OP_LAYOUTRETURN] = {
+ .op_func = (nfsd4op_func)nfsd4_layoutreturn,
+ .op_name = "OP_LAYOUTRETURN",
+ },
#endif /* CONFIG_PNFSD */
};
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 5d5dead..a9bd82b 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -463,7 +463,7 @@ static void nfs4_put_deleg_lease(struct nfs4_file *fp)
}
}
-static void unhash_stid(struct nfs4_stid *s)
+void nfsd4_unhash_stid(struct nfs4_stid *s)
{
s->sc_type = 0;
}
@@ -660,7 +660,7 @@ static void release_lock_stateid(struct nfs4_ol_stateid *stp)
struct file *file;
unhash_generic_stateid(stp);
- unhash_stid(&stp->st_stid);
+ nfsd4_unhash_stid(&stp->st_stid);
file = find_any_file(stp->st_file);
if (file)
locks_remove_posix(file, (fl_owner_t)lockowner(stp->st_stateowner));
@@ -2539,6 +2539,8 @@ static void nfsd4_init_file(struct nfs4_file *fp, struct inode *ino,
memset(fp->fi_access, 0, sizeof(fp->fi_access));
#if defined(CONFIG_PNFSD)
INIT_LIST_HEAD(&fp->fi_lo_states);
+ fp->fi_fsid.major = current_fh->fh_export->ex_fsid;
+ fp->fi_fsid.minor = 0;
#endif /* CONFIG_PNFSD */
spin_lock(&recall_lock);
hlist_add_head(&fp->fi_hash, &file_hashtbl[hashval]);
@@ -2725,7 +2727,7 @@ static void init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp,
}
/* search file_hashtbl[] for file */
-static struct nfs4_file *
+struct nfs4_file *
find_file(struct inode *ino)
{
unsigned int hashval = file_hashval(ino);
@@ -3233,7 +3235,7 @@ static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status)
open->op_delegate_type = NFS4_OPEN_DELEGATE_READ;
return;
out_free:
- unhash_stid(&dp->dl_stid);
+ nfsd4_unhash_stid(&dp->dl_stid);
nfs4_put_delegation(dp);
out_no_deleg:
open->op_delegate_type = NFS4_OPEN_DELEGATE_NONE;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 1a50467..fc10dd7 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1541,6 +1541,33 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
DECODE_TAIL;
}
+
+static __be32
+nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp,
+ struct nfsd4_pnfs_layoutreturn *lrp)
+{
+ DECODE_HEAD;
+
+ READ_BUF(16);
+ READ32(lrp->args.lr_reclaim);
+ READ32(lrp->args.lr_seg.layout_type);
+ READ32(lrp->args.lr_seg.iomode);
+ READ32(lrp->args.lr_return_type);
+ if (lrp->args.lr_return_type == RETURN_FILE) {
+ READ_BUF(16);
+ READ64(lrp->args.lr_seg.offset);
+ READ64(lrp->args.lr_seg.length);
+ nfsd4_decode_stateid(argp, (stateid_t *)&lrp->args.lr_sid);
+ READ_BUF(4);
+ READ32(lrp->args.lrf_body_len);
+ if (lrp->args.lrf_body_len > 0) {
+ READ_BUF(lrp->args.lrf_body_len);
+ READMEM(lrp->args.lrf_body, lrp->args.lrf_body_len);
+ }
+ }
+
+ DECODE_TAIL;
+}
#endif /* CONFIG_PNFSD */
static __be32
@@ -1649,7 +1676,7 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
[OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_getdevlist,
[OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_layoutget,
- [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_layoutreturn,
#else /* CONFIG_PNFSD */
[OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp,
@@ -3903,6 +3930,24 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
resp->p = p_start;
return nfserr;
}
+
+static __be32
+nfsd4_encode_layoutreturn(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_pnfs_layoutreturn *lrp)
+{
+ __be32 *p;
+
+ if (nfserr)
+ goto out;
+
+ RESERVE_SPACE(4);
+ WRITE32(lrp->lrs_present != 0); /* got stateid? */
+ ADJUST_ARGS();
+ if (lrp->lrs_present)
+ nfsd4_encode_stateid(resp, (stateid_t *)&lrp->args.lr_sid);
+out:
+ return nfserr;
+}
#endif /* CONFIG_PNFSD */
static __be32
@@ -3970,7 +4015,7 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
[OP_GETDEVICELIST] = (nfsd4_enc)nfsd4_encode_getdevlist,
[OP_LAYOUTCOMMIT] = (nfsd4_enc)nfsd4_encode_noop,
[OP_LAYOUTGET] = (nfsd4_enc)nfsd4_encode_layoutget,
- [OP_LAYOUTRETURN] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_LAYOUTRETURN] = (nfsd4_enc)nfsd4_encode_layoutreturn,
#else /* CONFIG_PNFSD */
[OP_GETDEVICEINFO] = (nfsd4_enc)nfsd4_encode_noop,
[OP_GETDEVICELIST] = (nfsd4_enc)nfsd4_encode_noop,
diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
index 1cd7a87..7ced4f3 100644
--- a/fs/nfsd/pnfsd.h
+++ b/fs/nfsd/pnfsd.h
@@ -61,6 +61,7 @@ struct nfs4_layout {
u64 find_create_sbid(struct super_block *);
struct super_block *find_sbid_id(u64);
__be32 nfs4_pnfs_get_layout(struct svc_rqst *, struct nfsd4_pnfs_layoutget *, struct exp_xdr_stream *);
+int nfs4_pnfs_return_layout(struct svc_rqst *, struct super_block *, struct svc_fh *, struct nfsd4_pnfs_layoutreturn *);
static inline struct nfs4_layout_state *layoutstateid(struct nfs4_stid *s)
{
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 3be7507..d2c75c5 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -407,6 +407,7 @@ struct nfs4_file {
bool fi_had_conflict;
#if defined(CONFIG_PNFSD)
struct list_head fi_lo_states;
+ struct nfs4_fsid fi_fsid;
#endif /* CONFIG_PNFSD */
};
@@ -489,6 +490,7 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
extern void put_client_renew(struct nfs4_client *clp);
extern void nfsd4_free_slab(struct kmem_cache **);
+extern struct nfs4_file *find_file(struct inode *);
extern struct nfs4_file *find_alloc_file(struct inode *, struct svc_fh *);
extern void put_nfs4_file(struct nfs4_file *);
extern void put_nfs4_file_locked(struct nfs4_file *);
@@ -497,6 +499,7 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
extern struct nfs4_stid *nfsd4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *slab);
extern void nfsd4_free_stid(struct kmem_cache *slab, struct nfs4_stid *s);
extern void nfsd4_remove_stid(struct nfs4_stid *s);
+extern void nfsd4_unhash_stid(struct nfs4_stid *s);
extern struct nfs4_stid *nfsd4_find_stateid(struct nfs4_client *, stateid_t *);
extern __be32 nfsd4_lookup_stateid(stateid_t *, unsigned char typemask, struct nfs4_stid **, bool sessions, struct nfsd_net *);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 727288b..cfa0bcf 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -458,6 +458,12 @@ struct nfsd4_pnfs_layoutget {
u32 lg_roc; /* response */
};
+struct nfsd4_pnfs_layoutreturn {
+ struct nfsd4_pnfs_layoutreturn_arg args;
+ clientid_t lr_clientid; /* request */
+ u32 lrs_present; /* response */
+};
+
struct nfsd4_op {
int opnum;
__be32 status;
@@ -507,6 +513,7 @@ struct nfsd4_op {
struct nfsd4_pnfs_getdevlist pnfs_getdevlist;
struct nfsd4_pnfs_getdevinfo pnfs_getdevinfo;
struct nfsd4_pnfs_layoutget pnfs_layoutget;
+ struct nfsd4_pnfs_layoutreturn pnfs_layoutreturn;
#endif /* CONFIG_PNFSD */
} u;
struct nfs4_replay * replay;
diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
index a680085..e198979 100644
--- a/include/linux/nfsd/nfsd4_pnfs.h
+++ b/include/linux/nfsd/nfsd4_pnfs.h
@@ -36,6 +36,7 @@
#include <linux/exportfs.h>
#include <linux/exp_xdr.h>
+#include <linux/nfs_xdr.h>
struct nfsd4_pnfs_deviceid {
u64 sbid; /* per-superblock unique ID */
@@ -86,6 +87,15 @@ struct nfsd4_pnfs_layoutget_res {
u32 lg_return_on_close;
};
+struct nfsd4_pnfs_layoutreturn_arg {
+ u32 lr_return_type; /* request */
+ struct nfsd4_layout_seg lr_seg; /* request */
+ u32 lr_reclaim; /* request */
+ u32 lrf_body_len; /* request */
+ void *lrf_body; /* request */
+ nfs4_stateid lr_sid; /* request/response */
+};
+
/*
* pNFS export operations vector.
*
--
1.8.3.1
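
The inclusive-range overlap test that the RETURN_FILE path depends on can be tried outside the kernel. What follows is a minimal userspace sketch, not the patch's code: `struct seg`, `seg_last_byte()` and `seg_overlapping()` are illustrative names, `UINT64_MAX` stands in for NFS4_MAX_UINT64, and treating a zero length as "never overlaps" is an assumption made here so that the helper stays total.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for struct nfsd4_layout_seg (offset/length only). */
struct seg {
	uint64_t offset;
	uint64_t length;	/* UINT64_MAX means "up to end of file" */
};

/*
 * Inclusive offset of the last byte of [start, start + len).
 * If start + len wraps around, the range is taken to run to the
 * maximum offset.
 */
static uint64_t seg_last_byte(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;	/* unsigned wrap-around is well defined */

	return end > start ? end - 1 : UINT64_MAX;
}

/* True if the two byte ranges share at least one byte. */
static bool seg_overlapping(const struct seg *a, const struct seg *b)
{
	uint64_t last_a, last_b;

	/* Assumption of this sketch: an empty range overlaps nothing. */
	if (a->length == 0 || b->length == 0)
		return false;

	last_a = seg_last_byte(a->offset, a->length);
	last_b = seg_last_byte(b->offset, b->length);

	/* last_a == b->offset is a single-byte overlap */
	return last_b >= a->offset && last_a >= b->offset;
}
```

Working with the inclusive last byte rather than an exclusive end avoids overflow when a segment extends to the maximum offset, which is the common case for whole-file layouts.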
On 2013-09-26 18:01, J. Bruce Fields wrote:
> On Thu, Sep 26, 2013 at 02:40:07PM -0400, Benny Halevy wrote:
>> From: Dean Hildebrand <[email protected]>
>
> I don't understand why we need to do this.
The motivation was that the DS doesn't need the backchannel
as it will never issue any recalls or other callback ops.
>
> Also: based on the previous patch I believe we set the
> EXCHGID4_FLAG_USE_PNFS_MDS bit in the reply unconditionally, so
> regardless of what the client requests we're permitting it to use this
> client as a MDS (or plain non-pnfs) server, so I'm not sure it matters
> what the client requested.
Hmm, True.
>
> Could you just drop this patch? Unless you have some good argument for
> it.
Yup. I see no problem in dropping it.
Benny
>
> --b.
>
>>
>> [was pnfsd: Add use of pnfs exchange flags]
>> Signed-off-by: Dean Hildebrand <[email protected]>
>> [pnfsd: define a is_ds_only_session helper]
>> Signed-off-by: Benny Halevy <[email protected]>
>> Signed-off-by: Benny Halevy <[email protected]>
>> ---
>> fs/nfsd/nfs4state.c | 4 ++++
>> include/uapi/linux/nfs4.h | 7 +++++++
>> 2 files changed, 11 insertions(+)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 21c15fc..2c973e6 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -1953,6 +1953,10 @@ static __be32 nfsd4_check_cb_sec(struct nfsd4_cb_sec *cbs)
>> status = nfserr_seq_misordered;
>> goto out_free_conn;
>> }
>> +
>> + if (is_ds_only_session(unconf->cl_exchange_flags))
>> + cr_ses->flags &= ~SESSION4_BACK_CHAN;
>> +
>> old = find_confirmed_client_by_name(&unconf->cl_name, nn);
>> if (old) {
>> status = mark_client_expired(old);
>> diff --git a/include/uapi/linux/nfs4.h b/include/uapi/linux/nfs4.h
>> index 788128e..028f5fc 100644
>> --- a/include/uapi/linux/nfs4.h
>> +++ b/include/uapi/linux/nfs4.h
>> @@ -125,6 +125,13 @@
>> #define EXCHGID4_FLAG_USE_PNFS_DS 0x00040000
>> #define EXCHGID4_FLAG_MASK_PNFS 0x00070000
>>
>> +static inline bool
>> +is_ds_only_session(u32 exchange_flags)
>> +{
>> + u32 mask = EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_PNFS_MDS;
>> + return (exchange_flags & mask) == EXCHGID4_FLAG_USE_PNFS_DS;
>> +}
>> +
>> #define EXCHGID4_FLAG_UPD_CONFIRMED_REC_A 0x40000000
>> #define EXCHGID4_FLAG_CONFIRMED_R 0x80000000
>> /*
>> --
>> 1.8.3.1
>>
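
The flag test in the quoted helper can be checked in isolation. This is a userspace sketch under stated assumptions: the DS value is taken from the hunk above, while the MDS value (0x00020000) is filled in from the NFSv4.1 specification and does not appear in the quoted context.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXCHGID4_FLAG_USE_PNFS_MDS	0x00020000u	/* assumed, per NFSv4.1 spec */
#define EXCHGID4_FLAG_USE_PNFS_DS	0x00040000u	/* as in the hunk above */

/* True only when the DS role is set and the MDS role is not. */
static bool is_ds_only_session(uint32_t exchange_flags)
{
	uint32_t mask = EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_PNFS_MDS;

	return (exchange_flags & mask) == EXCHGID4_FLAG_USE_PNFS_DS;
}
```

If the MDS bit is always present in the stored exchange flags, as the review comment suggests, this predicate can never be true, which supports dropping the patch.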
On Thu, Sep 26, 2013 at 02:42:48PM -0400, Benny Halevy wrote:
> we don't want to hold the state_lock while the file system may block
Needs a much better changelog:
- why don't you want to hold it
- why you think the new version is safe and performs fine.
On 2013-09-29 15:35, Christoph Hellwig wrote:
> On Sun, Sep 29, 2013 at 05:21:30AM -0700, Christoph Hellwig wrote:
>>> Bruce - are you ok with moving the pnfs interface definitions to
>>> include/linux/exportfs.h along with struct export_operations?
>>>
>>> In fact we can actually extend struct export_operations rather
>>> than adding pnfs_export_operations...
>>
>> Yes, it probably should go into the export ops, although the actual
> >> method signatures might need to be made a little less nfs-specific for
>> that.
>
> I just took a brief look over the diff for the whole series in the git
> tree and the old tree that still had block and exofs servers and have
> revised my opinion a little bit:
>
>
> - there should be a layout_type field in struct export_operations,
> indicating that a filesystem supports some sort of pnfs-like export.
> - there should be a struct pnfs_operations, but it should be confined
> to fs/nfsd: each layout can be a separate loadable module and gets
> registered there. For the initial file layout that module is
> self-contained, but for e.g. block or objects it would have to
> call into the filesystem through export_ops, although way lower level
This makes sense for blocks, given its use of the generic block allocation and mapping
calls (and it needs a new call for committing uninitialized extents).
But for objects there are no such calls and the integration with exofs
is pretty intimate.
Benny
> than the NFS XDR level, e.g. for block there would be one op to get
> the extent map, and one to allocate an extent.
>
> This way we also avoid the dependency on nfsd in the filesystems that the
> current version introduces.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On Sun, Sep 29, 2013 at 03:20:33PM +0300, Benny Halevy wrote:
> Bruce - are you ok with moving the pnfs interface definitions to
> include/linux/exportfs.h along with struct export_operations?
>
> In fact we can actually extend struct export_operations rather
> than adding pnfs_export_operations...
Yes, it probably should go into the export ops, although the actual
method signatures might need to be made a little less nfs-specific for
that.
On Thu, Sep 26, 2013 at 02:36:16PM -0400, Benny Halevy wrote:
> The following patchset implements an extension to nfsd
> providing a complete minimal pnfs server exporting
> DLM-based clustered file systems such as GFS2 or OCFS2.
>
> The pNFS operations that are implemented are
> GETDEVICELIST and GETDEVICEINFO,
> LAYOUTGET and LAYOUTRETURN.
>
> The server does the bookkeeping of the outstanding layout
> state in response to layout get and return.
>
> Also, the implementation cleans up the client layout state
> upon client expiry and on CLOSE when the return_on_close
> flag is set on the LAYOUTGET response. The latter is the
> default behavior until layout recalls are implemented
> with which the server can reclaim its resources in case
> the client holds layout state post closing files.
>
> The patchset is based on v3.12-rc2 and it's available also online here:
> git://linux-nfs.org/~bhalevy/linux-pnfs.git pnfsd-dlm-3.12-rc2-2013-09-26
Is there any userland support code required?
What's the quickest way to get a test setup--is
http://wiki.linux-nfs.org/wiki/index.php/PNFS_Setup_Instructions
accurate?
--b.
>
> Benny
>
> General infrastructure:
> [PATCH RFC v0 01/49] pnfsd: Define CONFIG_PNFSD
> [PATCH RFC v0 02/49] pnfsd: define NFSDDBG_PNFS
> [PATCH RFC v0 03/49] pnfsd: return pnfs flags on exchange_id
> [PATCH RFC v0 04/49] pnfsd: don't set up back channel on create_session for ds
> [PATCH RFC v0 05/49] pnfsd: introduce pnfsd header files
> [PATCH RFC v0 06/49] pnfsd: define pnfs_export_operations
> [PATCH RFC v0 07/49] pnfsd: add pnfs export option
> [PATCH RFC v0 08/49] pnfsd: layout verify
> [PATCH RFC v0 09/49] pnfsd: initial stub
>
> Device ops:
> [PATCH RFC v0 10/49] pnfsd: use sbid hash table to map super_blocks to devid major identifiers
> [PATCH RFC v0 11/49] NFSD: introduce exp_xdr.h
> [PATCH RFC v0 12/49] pnfsd: get device list/info
> [PATCH RFC v0 13/49] pnfsd: filelayout: get device list/info
> [PATCH RFC v0 14/49] pnfsd: provide helper for xdr encoding of deviceid
> [PATCH RFC v0 15/49] pnfsd: add helper functions for identifying DS filehandles
> [PATCH RFC v0 16/49] pnfsd: accept all ds stateids
>
> layout get:
> [PATCH RFC v0 17/49] DEBUG: nfsd: more client_lock asserts
> [PATCH RFC v0 18/49] pnfsd: nfs4_assert_state_locked
> [PATCH RFC v0 19/49] pnfsd: layout get
> [PATCH RFC v0 20/49] pnfsd: filelayout: layout encoding
>
> layout state handling for layout get:
> [PATCH RFC v0 21/49] nfsd: no need to unhash_stid before free
> [PATCH RFC v0 22/49] nfsd: cleanup free_stid
> [PATCH RFC v0 23/49] pnfsd: layout state allocation
> [PATCH RFC v0 24/49] pnfsd: process the layout stateid
> [PATCH RFC v0 25/49] pnfsd: layout state per client tracking
> [PATCH RFC v0 26/49] pnfsd: layout state per file tracking
> [PATCH RFC v0 27/49] pnfsd: hash layouts on layout state
> [PATCH RFC v0 28/49] pnfsd: support layout segment merging
>
> pnfs attributes:
> [PATCH RFC v0 29/49] pnfsd: support layout_type attribute
> [PATCH RFC v0 30/49] pnfsd: make pnfs server return layout_blksize when the client asks for it
> [PATCH RFC v0 31/49] pnfsd: add support for per-file layout_types attribute
>
> pnfsd over dlm:
> [PATCH RFC v0 32/49] pnfsd: per block device dlm data server list cache
> [PATCH RFC v0 33/49] pnfsd: Add IP address validation to nfsd4_set_pnfs_dlm_device()
> [PATCH RFC v0 34/49] pnfsd: new nfsd filesystem file: pnfs_dlm_device
> [PATCH RFC v0 35/49] pnfsd: nfsd4_pnfs_dlm_getdeviter
> [PATCH RFC v0 36/49] pnfsd: nfsd4_pnfs_dlm_getdevinfo
> [PATCH RFC v0 37/49] pnfsd: make /proc/fs/nfsd/pnfs_dlm_device report dlm device list.
> [PATCH RFC v0 38/49] pnfsd: nfsd4_pnfs_dlm_layoutget
> [PATCH RFC v0 39/49] pnfsd: DLM file layout only support read iomode layouts
> [PATCH RFC v0 40/49] pnfsd: add dlm file layout layout-type
> [PATCH RFC v0 41/49] pnfsd: dlm pnfs_export_operations
> [PATCH RFC v0 42/49] pnfsd: gfs2: use generic file layout pnfs operations vector
>
> layout return / expire / return_on_close:
> [PATCH RFC v0 43/49] pnfsd: release state lock around iput in put_nfs4_file
> [PATCH RFC v0 44/49] posix_acl: resolve compile dependency in posix_acl.h
> [PATCH RFC v0 45/49] nfs: resolve compile dependency in nfs_xdr.h
> [PATCH RFC v0 46/49] pnfsd: layout return generic implementation
> [PATCH RFC v0 47/49] pnfsd: pnfs_expire_client
> [PATCH RFC v0 48/49] pnfsd: return on close
> [PATCH RFC v0 49/49] pnfsd: dlm set return_on_close to true
From: Benny Halevy <[email protected]>
Provide for getting the (read-only) layout_type attribute
[extracted from pnfsd: Initial pNFS server implementation.]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: Add super block to layout_type()]
Signed-off-by: Marc Eshel <[email protected]>
[pnfsd: convert generic code to use new pnfs api]
Signed-off-by: Benny Halevy <[email protected]>
[Remove the use of struct pnfs_export_operations.]
[pnfsd: support layout_type attribute all layout types]
[pnfsd: check ex_pnfs in nfsd4_verify_layout]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: handle s_pnfs_op==NULL]
[pnfsd: test layout_type method in nfsd4_encode_fattr]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4xdr.c | 24 ++++++++++++++++++++++++
fs/nfsd/nfsd.h | 5 +++++
2 files changed, 29 insertions(+)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index b9c4417..83f7147 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2560,6 +2560,30 @@ static int get_parent_attributes(struct svc_export *exp, struct kstat *stat)
get_parent_attributes(exp, &stat);
WRITE64(stat.ino);
}
+#if defined(CONFIG_PNFSD)
+ if (bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) {
+ struct super_block *sb = dentry->d_inode->i_sb;
+ int type = 0;
+
+ /* Query the filesystem for supported pNFS layout types.
+ * Currently, we only support one layout type per file system.
+ * The s_pnfs_op->layout_type() method returns the pnfs_layouttype4.
+ */
+ buflen -= 4;
+ if (buflen < 0) /* length */
+ goto out_resource;
+
+ if (sb && sb->s_pnfs_op && sb->s_pnfs_op->layout_type)
+ type = sb->s_pnfs_op->layout_type(sb);
+ if (type) {
+ if ((buflen -= 4) < 0) /* type */
+ goto out_resource;
+ WRITE32(1); /* length */
+ WRITE32(type); /* type */
+ } else
+ WRITE32(0); /* length */
+ }
+#endif /* CONFIG_PNFSD */
if (bmval2 & FATTR4_WORD2_SECURITY_LABEL) {
status = nfsd4_encode_security_label(rqstp, context,
contextlen, &p, &buflen);
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 30f34ab..f49fb0b 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -321,8 +321,13 @@ static inline void nfs4_reset_lease(time_t leasetime) { }
#define NFSD4_1_SUPPORTED_ATTRS_WORD0 \
NFSD4_SUPPORTED_ATTRS_WORD0
+#if defined(CONFIG_PNFSD)
+#define NFSD4_1_SUPPORTED_ATTRS_WORD1 \
+ (NFSD4_SUPPORTED_ATTRS_WORD1 | FATTR4_WORD1_FS_LAYOUT_TYPES)
+#else /* CONFIG_PNFSD */
#define NFSD4_1_SUPPORTED_ATTRS_WORD1 \
NFSD4_SUPPORTED_ATTRS_WORD1
+#endif /* CONFIG_PNFSD */
#define NFSD4_1_SUPPORTED_ATTRS_WORD2 \
(NFSD4_SUPPORTED_ATTRS_WORD2 | FATTR4_WORD2_SUPPATTR_EXCLCREAT)
--
1.8.3.1
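
The wire form produced by the fs_layout_types hunk is an XDR array: a 32-bit count followed by that many 32-bit layout types, all big-endian. The following userspace sketch mirrors that logic under stated assumptions: `put_be32()` and `encode_fs_layout_types()` are hypothetical names, a plain byte buffer replaces the nfsd encode macros, and the space check is done once up front rather than in two steps.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in big-endian (XDR) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Encode the layout-type list for a file system that supports at most
 * one layout type. A type of 0 means "no pNFS support" and yields an
 * empty array. Returns the number of bytes written, or -1 if the
 * buffer is too small.
 */
static int encode_fs_layout_types(uint8_t *buf, size_t buflen, uint32_t type)
{
	size_t need = type ? 8 : 4;	/* count, plus one element if any */

	if (buflen < need)
		return -1;

	if (type) {
		put_be32(buf, 1);		/* array length */
		put_be32(buf + 4, type);	/* the single layout type */
	} else {
		put_be32(buf, 0);		/* empty array */
	}
	return (int)need;
}
```

Checking the total space before writing anything keeps the buffer unmodified on failure, which is one possible way to simplify the two-step `buflen` accounting in the hunk.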
On 2013-09-27 17:39, J. Bruce Fields wrote:
> On Thu, Sep 26, 2013 at 02:40:15PM -0400, Benny Halevy wrote:
>> From: Benny Halevy <[email protected]>
>>
>> struct pnfs_export_operations defines the VFS level API for pNFS,
>> not including callbacks. A pnfs-exportable filesystem sets
>> a pointer to its pnfs export vector in its struct super_block.s_pnfs_op.
>>
>> The file system provides the per-superblock layout_type method that
>> determines if it supports pnfs for the filesystem identified by
>> the superblock, and if so, with which layout type (only one per-sb is
>> supported).
>>
>> Device ops:
>> get_device_iter is used to fill-in the device list for GETDEVICELIST
>> and get_device_info is used to encode the device info for GETDEVICEINFO.
>>
>> Layout ops:
>> layout_get, layout_commit, and layout_return implement the file system- and
>> layout type- specific parts of their respective protocol operations: LAYOUTGET,
>> LAYOUTCOMMIT, and LAYOUTRETURN.
>>
>> The following methods are mandatory to be implemented:
>> layout_type, get_device_info, and layout_get.
>>
>> Note: define pnfs export operations in a stub form in this patch.
>> Actual operations are defined along with their usage.
>
> Patches touching the superblock or the new pnfs export operations should
> probably all be cc'd to linux-fsdevel.
Absolutely.
I'll add linux-fsdevel in the next iteration.
Benny
>
> --b.
>
>>
>> [pnfsd: provide default no-op operations]
>> Signed-off-by: Benny Halevy <[email protected]>
>> [pnfsd: compile fixes for pnfsd branch]
>> Signed-off-by: Fred Isaman <[email protected]>
>> [gfs2: set pnfs_dlm_export_ops only for CONFIG_PNFSD]
>> [pnfsd: handle s_pnfs_op==NULL]
>> Signed-off-by: Benny Halevy <[email protected]>
>> Signed-off-by: Benny Halevy <[email protected]>
>> ---
>> fs/nfsd/export.c | 2 +-
>> include/linux/fs.h | 2 ++
>> include/linux/nfsd/nfsd4_pnfs.h | 14 ++++++++++++++
>> 3 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
>> index 5f38ea3..f26b0b9 100644
>> --- a/fs/nfsd/export.c
>> +++ b/fs/nfsd/export.c
>> @@ -16,7 +16,7 @@
>> #include <linux/module.h>
>> #include <linux/exportfs.h>
>> #include <linux/sunrpc/svc_xprt.h>
>> -
>> +#include <linux/nfsd/nfsd4_pnfs.h>
>> #include <net/ipv6.h>
>>
>> #include "nfsd.h"
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 3f40547..d9186a4 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -34,6 +34,7 @@
>> #include <uapi/linux/fs.h>
>>
>> struct export_operations;
>> +struct pnfs_export_operations;
>> struct hd_geometry;
>> struct iovec;
>> struct nameidata;
>> @@ -1251,6 +1252,7 @@ struct super_block {
>> const struct dquot_operations *dq_op;
>> const struct quotactl_ops *s_qcop;
>> const struct export_operations *s_export_op;
>> + const struct pnfs_export_operations *s_pnfs_op;
>> unsigned long s_flags;
>> unsigned long s_magic;
>> struct dentry *s_root;
>> diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
>> index 9e7d95e..ff6613e 100644
>> --- a/include/linux/nfsd/nfsd4_pnfs.h
>> +++ b/include/linux/nfsd/nfsd4_pnfs.h
>> @@ -34,4 +34,18 @@
>> #ifndef _LINUX_NFSD_NFSD4_PNFS_H
>> #define _LINUX_NFSD_NFSD4_PNFS_H
>>
>> +/*
>> + * pNFS export operations vector.
>> + *
>> + * The filesystem must implement the following methods:
>> + * layout_type
>> + * get_device_info
>> + * layout_get
>> + *
>> + * All other methods are optional and can be set to NULL if not implemented.
>> + */
>> +struct pnfs_export_operations {
>> + /* stub */
>> +};
>> +
>> #endif /* _LINUX_NFSD_NFSD4_PNFS_H */
>> --
>> 1.8.3.1
>>
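
The "mandatory versus optional" rule stated in the changelog above could be enforced with a small predicate when an export is set up. This is only a sketch: `struct pnfs_ops_sketch`, its simplified `int (*)(void)` signatures, `pnfs_ops_complete()` and the stand-in methods are all hypothetical, since the patch itself defines the operations vector as a stub and the real signatures arrive in later patches.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified method table. The real
 * struct pnfs_export_operations will have different signatures.
 */
struct pnfs_ops_sketch {
	int (*layout_type)(void);	/* mandatory */
	int (*get_device_info)(void);	/* mandatory */
	int (*layout_get)(void);	/* mandatory */
	int (*layout_commit)(void);	/* optional, may be NULL */
	int (*layout_return)(void);	/* optional, may be NULL */
};

/* True if the vector exists and every mandatory method is set. */
static bool pnfs_ops_complete(const struct pnfs_ops_sketch *ops)
{
	return ops != NULL &&
	       ops->layout_type != NULL &&
	       ops->get_device_info != NULL &&
	       ops->layout_get != NULL;
}

/* Trivial stand-ins so that a table can be populated for testing. */
static int sketch_layout_type(void)
{
	return 1;	/* LAYOUT4_NFSV4_1_FILES */
}

static int sketch_noop(void)
{
	return 0;
}
```

A check like this would also cover the `s_pnfs_op == NULL` case that the changelog mentions handling.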
On 2013-09-29 15:15, Christoph Hellwig wrote:
> On Thu, Sep 26, 2013 at 02:40:35PM -0400, Benny Halevy wrote:
>> From: Benny Halevy <[email protected]>
>>
>> Containing xdr encoding helpers to be used by the layout type library functions
>> or by the file system to encode/decode layout-type specific device and layout
>> information.
>
> Any reason why pnfs would use this, but not the rest of nfsd? Maybe
> this should go in as part of a separate series cleaning up the xdr
> encoding and decoding?
I discussed this with Trond and this patch is dropped from the next version
of this patchset. nfs4.h is going to be included instead.
Benny
>
Empty headers are pretty useless. Also why would you want a header
outside fs/nfsd/ ?
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsd.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfs4state.c | 20 +++++++++----------
fs/nfsd/pnfsd.h | 11 +++++++++++
fs/nfsd/state.h | 4 ++++
4 files changed, 80 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index b8ddd82..82b6a7d 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -39,6 +39,7 @@
* Layout state - NFSv4.1 pNFS
*/
static struct kmem_cache *pnfs_layout_slab;
+static struct kmem_cache *layout_state_slab;
/* hash table for nfsd4_pnfs_deviceid.sbid */
#define SBID_HASH_BITS 8
@@ -82,6 +83,7 @@ struct sbid_tracker {
struct sbid_tracker *sbid;
nfsd4_free_slab(&pnfs_layout_slab);
+ nfsd4_free_slab(&layout_state_slab);
for (i = 0; i < SBID_HASH_SIZE; i++) {
while (!list_empty(&sbid_hashtbl[i])) {
@@ -103,12 +105,65 @@ struct sbid_tracker {
if (pnfs_layout_slab == NULL)
return -ENOMEM;
+ layout_state_slab = kmem_cache_create("pnfs_layout_states",
+ sizeof(struct nfs4_layout_state), 0, 0, NULL);
+ if (layout_state_slab == NULL)
+ return -ENOMEM;
+
for (i = 0; i < SBID_HASH_SIZE; i++)
INIT_LIST_HEAD(&sbid_hashtbl[i]);
return 0;
}
+/*
+ * Note: must be called under the state lock
+ */
+static struct nfs4_layout_state *
+alloc_init_layout_state(struct nfs4_client *clp, stateid_t *stateid)
+{
+ struct nfs4_layout_state *new;
+
+ nfs4_assert_state_locked();
+ new = layoutstateid(nfsd4_alloc_stid(clp, layout_state_slab));
+ if (!new)
+ return new;
+ kref_init(&new->ls_ref);
+ new->ls_stid.sc_type = NFS4_LAYOUT_STID;
+ return new;
+}
+
+static void
+get_layout_state(struct nfs4_layout_state *ls)
+{
+ kref_get(&ls->ls_ref);
+}
+
+/*
+ * Note: must be called under the state lock
+ */
+static void
+destroy_layout_state(struct kref *kref)
+{
+ struct nfs4_layout_state *ls =
+ container_of(kref, struct nfs4_layout_state, ls_ref);
+
+ nfsd4_remove_stid(&ls->ls_stid);
+ nfsd4_free_stid(layout_state_slab, &ls->ls_stid);
+}
+
+/*
+ * Note: must be called under the state lock
+ */
+static void
+put_layout_state(struct nfs4_layout_state *ls)
+{
+ dprintk("pNFS %s: ls %p ls_ref %d\n", __func__, ls,
+ atomic_read(&ls->ls_ref.refcount));
+ nfs4_assert_state_locked();
+ kref_put(&ls->ls_ref, destroy_layout_state);
+}
+
static struct nfs4_layout *
alloc_layout(void)
{
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 099976e..6e251fb 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -330,7 +330,7 @@ static void nfs4_file_put_access(struct nfs4_file *fp, int oflag)
__nfs4_file_put_access(fp, oflag);
}
-static struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct
+struct nfs4_stid *nfsd4_alloc_stid(struct nfs4_client *cl, struct
kmem_cache *slab)
{
struct idr *stateids = &cl->cl_stateids;
@@ -369,7 +369,7 @@ static struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct
static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
{
- return openlockstateid(nfs4_alloc_stid(clp, stateid_slab));
+ return openlockstateid(nfsd4_alloc_stid(clp, stateid_slab));
}
static struct nfs4_delegation *
@@ -380,7 +380,7 @@ static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
dprintk("NFSD alloc_init_deleg\n");
if (num_delegations > max_delegations)
return NULL;
- dp = delegstateid(nfs4_alloc_stid(clp, deleg_slab));
+ dp = delegstateid(nfsd4_alloc_stid(clp, deleg_slab));
if (dp == NULL)
return dp;
dp->dl_stid.sc_type = NFS4_DELEG_STID;
@@ -403,7 +403,7 @@ static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
return dp;
}
-static void remove_stid(struct nfs4_stid *s)
+void nfsd4_remove_stid(struct nfs4_stid *s)
{
struct idr *stateids = &s->sc_client->cl_stateids;
@@ -411,7 +411,7 @@ static void remove_stid(struct nfs4_stid *s)
idr_remove(stateids, s->sc_stateid.si_opaque.so_id);
}
-static void free_stid(struct kmem_cache *slab, struct nfs4_stid *s)
+void nfsd4_free_stid(struct kmem_cache *slab, struct nfs4_stid *s)
{
kmem_cache_free(slab, s);
}
@@ -420,7 +420,7 @@ static void free_stid(struct kmem_cache *slab, struct nfs4_stid *s)
nfs4_put_delegation(struct nfs4_delegation *dp)
{
if (atomic_dec_and_test(&dp->dl_count)) {
- free_stid(deleg_slab, &dp->dl_stid);
+ nfsd4_free_stid(deleg_slab, &dp->dl_stid);
num_delegations--;
}
}
@@ -459,14 +459,14 @@ static void unhash_stid(struct nfs4_stid *s)
static void destroy_revoked_delegation(struct nfs4_delegation *dp)
{
list_del_init(&dp->dl_recall_lru);
- remove_stid(&dp->dl_stid);
+ nfsd4_remove_stid(&dp->dl_stid);
nfs4_put_delegation(dp);
}
static void destroy_delegation(struct nfs4_delegation *dp)
{
unhash_delegation(dp);
- remove_stid(&dp->dl_stid);
+ nfsd4_remove_stid(&dp->dl_stid);
nfs4_put_delegation(dp);
}
@@ -623,8 +623,8 @@ static void close_generic_stateid(struct nfs4_ol_stateid *stp)
static void free_generic_stateid(struct nfs4_ol_stateid *stp)
{
- remove_stid(&stp->st_stid);
- free_stid(stateid_slab, &stp->st_stid);
+ nfsd4_remove_stid(&stp->st_stid);
+ nfsd4_free_stid(stateid_slab, &stp->st_stid);
}
static void release_lock_stateid(struct nfs4_ol_stateid *stp)
diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
index 6920e43..c2360e4 100644
--- a/fs/nfsd/pnfsd.h
+++ b/fs/nfsd/pnfsd.h
@@ -40,6 +40,12 @@
#include "state.h"
#include "xdr4.h"
+/* outstanding layout stateid */
+struct nfs4_layout_state {
+ struct nfs4_stid ls_stid; /* must be first field */
+ struct kref ls_ref;
+};
+
/* outstanding layout */
struct nfs4_layout {
struct nfsd4_layout_seg lo_seg;
@@ -49,4 +55,9 @@ struct nfs4_layout {
struct super_block *find_sbid_id(u64);
__be32 nfs4_pnfs_get_layout(struct svc_rqst *, struct nfsd4_pnfs_layoutget *, struct exp_xdr_stream *);
+static inline struct nfs4_layout_state *layoutstateid(struct nfs4_stid *s)
+{
+ return container_of(s, struct nfs4_layout_state, ls_stid);
+}
+
#endif /* LINUX_NFSD_PNFSD_H */
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index b85ad60..18a64c4 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -81,6 +81,7 @@ struct nfs4_stid {
#define NFS4_CLOSED_STID 8
/* For a deleg stateid kept around only to process free_stateid's: */
#define NFS4_REVOKED_DELEG_STID 16
+#define NFS4_LAYOUT_STID 32
unsigned char sc_type;
stateid_t sc_stateid;
struct nfs4_client *sc_client;
@@ -486,6 +487,9 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
extern void put_nfs4_file(struct nfs4_file *);
extern void get_nfs4_file(struct nfs4_file *);
extern struct nfs4_client *find_confirmed_client(clientid_t *, bool sessions, struct nfsd_net *);
+extern struct nfs4_stid *nfsd4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *slab);
+extern void nfsd4_free_stid(struct kmem_cache *slab, struct nfs4_stid *s);
+extern void nfsd4_remove_stid(struct nfs4_stid *s);
#if defined(CONFIG_PNFSD)
extern int nfsd4_init_pnfs_slabs(void);
--
1.8.3.1
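The allocation pattern in this patch (a generic stid embedded as the first field of the layout state, plus a reference count that frees the object on the last put) can be tried out in userspace. The following is only a minimal sketch under stated assumptions: struct stid, struct layout_state, live_states and the plain integer counter are hypothetical stand-ins for struct nfs4_stid, struct nfs4_layout_state and struct kref, and there is no locking or idr bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel types; every name here is illustrative. */
struct stid {
	unsigned char sc_type;
};

struct layout_state {
	struct stid ls_stid;	/* must be first field, as in the patch */
	int ls_ref;		/* plain counter standing in for struct kref */
};

#define LAYOUT_STID 32

/* Same idea as the kernel's container_of(): recover the enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int live_states;	/* number of states allocated and not yet freed */

static struct layout_state *layoutstateid(struct stid *s)
{
	return container_of(s, struct layout_state, ls_stid);
}

/* Allocate through the generic stid, then initialize the layout part. */
static struct layout_state *alloc_init_layout_state(void)
{
	struct stid *s = calloc(1, sizeof(struct layout_state));
	struct layout_state *ls;

	if (!s)
		return NULL;
	ls = layoutstateid(s);
	ls->ls_ref = 1;
	ls->ls_stid.sc_type = LAYOUT_STID;
	live_states++;
	return ls;
}

static void get_layout_state(struct layout_state *ls)
{
	ls->ls_ref++;
}

/* Drop one reference; returns 1 if this was the last one and ls was freed. */
static int put_layout_state(struct layout_state *ls)
{
	if (--ls->ls_ref > 0)
		return 0;
	live_states--;
	free(ls);
	return 1;
}
```

Note that the patch converts the result of nfsd4_alloc_stid() with layoutstateid() before testing it for NULL, which works only because ls_stid sits at offset zero; the sketch checks the raw allocation first so that it does not rely on that.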
On Thu, Sep 26, 2013 at 02:40:15PM -0400, Benny Halevy wrote:
> From: Benny Halevy <[email protected]>
>
> struct pnfs_export_operations defines the VFS level API for pNFS,
> not including callbacks. A pnfs-exportable filesystem sets
> a pointer to its pnfs export vector in its struct super_block.s_pnfs_op.
>
> The file system provides the per-superblock layout_type method that
> determines if it supports pnfs for the filesystem identified by
> the superblock, and if so, with which layout type (only one per-sb is
> supported).
>
> Device ops:
> get_device_iter is used to fill-in the device list for GETDEVICELIST
> and get_device_info is used to encode the device info for GETDEVICEINFO.
>
> Layout ops:
> layout_get, layout_commit, and layout_return implement the file system- and
> layout type- specific parts of their respective protocol operations: LAYOUTGET,
> LAYOUTCOMMIT, and LAYOUTRETURN.
>
> The following methods are mandatory to be implemented:
> layout_type, get_device_info, and layout_get.
>
> Note: define pnfs export operations in a stub form in this patch.
> Actual operations are defined along with their usage.
Patches touching the superblock or the new pnfs export operations should
probably all be cc'd to linux-fsdevel.
--b.
>
> [pnfsd: provide default no-op operations]
> Signed-off-by: Benny Halevy <[email protected]>
> [pnfsd: compile fixes for pnfsd branch]
> Signed-off-by: Fred Isaman <[email protected]>
> [gfs2: set pnfs_dlm_export_ops only for CONFIG_PNFSD]
> [pnfsd: handle s_pnfs_op==NULL]
> Signed-off-by: Benny Halevy <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> ---
> fs/nfsd/export.c | 2 +-
> include/linux/fs.h | 2 ++
> include/linux/nfsd/nfsd4_pnfs.h | 14 ++++++++++++++
> 3 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index 5f38ea3..f26b0b9 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -16,7 +16,7 @@
> #include <linux/module.h>
> #include <linux/exportfs.h>
> #include <linux/sunrpc/svc_xprt.h>
> -
> +#include <linux/nfsd/nfsd4_pnfs.h>
> #include <net/ipv6.h>
>
> #include "nfsd.h"
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3f40547..d9186a4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -34,6 +34,7 @@
> #include <uapi/linux/fs.h>
>
> struct export_operations;
> +struct pnfs_export_operations;
> struct hd_geometry;
> struct iovec;
> struct nameidata;
> @@ -1251,6 +1252,7 @@ struct super_block {
> const struct dquot_operations *dq_op;
> const struct quotactl_ops *s_qcop;
> const struct export_operations *s_export_op;
> + const struct pnfs_export_operations *s_pnfs_op;
> unsigned long s_flags;
> unsigned long s_magic;
> struct dentry *s_root;
> diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
> index 9e7d95e..ff6613e 100644
> --- a/include/linux/nfsd/nfsd4_pnfs.h
> +++ b/include/linux/nfsd/nfsd4_pnfs.h
> @@ -34,4 +34,18 @@
> #ifndef _LINUX_NFSD_NFSD4_PNFS_H
> #define _LINUX_NFSD_NFSD4_PNFS_H
>
> +/*
> + * pNFS export operations vector.
> + *
> + * The filesystem must implement the following methods:
> + * layout_type
> + * get_device_info
> + * layout_get
> + *
> + * All other methods are optional and can be set to NULL if not implemented.
> + */
> +struct pnfs_export_operations {
> + /* stub */
> +};
> +
> #endif /* _LINUX_NFSD_NFSD4_PNFS_H */
> --
> 1.8.3.1
>
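To make the "mandatory methods" rule from the changelog above concrete, here is a small userspace sketch of how a caller might validate an operations vector before using it. This is an assumption-laden illustration, not the patchset's code: struct pnfs_ops_sketch, pnfs_ops_usable() and the int (*)(void) signatures are hypothetical placeholders, since the real struct is still a stub at this point in the series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical mirror of the export vector described in the changelog.
 * The real method signatures are defined by later patches in the series.
 */
struct pnfs_ops_sketch {
	int (*layout_type)(void);	/* mandatory */
	int (*get_device_info)(void);	/* mandatory */
	int (*get_device_iter)(void);	/* optional */
	int (*layout_get)(void);	/* mandatory */
	int (*layout_commit)(void);	/* optional */
	int (*layout_return)(void);	/* optional */
};

static int dummy_op(void)
{
	return 0;
}

/* A vector that provides only the three mandatory methods. */
static const struct pnfs_ops_sketch complete_ops = {
	.layout_type = dummy_op,
	.get_device_info = dummy_op,
	.layout_get = dummy_op,
};

/* A vector that forgot layout_get and therefore must be rejected. */
static const struct pnfs_ops_sketch missing_layout_get = {
	.layout_type = dummy_op,
	.get_device_info = dummy_op,
};

/*
 * Returns 1 if the filesystem can be exported over pNFS, 0 otherwise.
 * A NULL vector corresponds to the s_pnfs_op == NULL case.
 */
static int pnfs_ops_usable(const struct pnfs_ops_sketch *ops)
{
	if (ops == NULL)
		return 0;
	return ops->layout_type && ops->get_device_info && ops->layout_get;
}
```

Optional methods left as NULL would be checked individually at each call site rather than here.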
From: Andy Adamson <[email protected]>
Export nfsd4_pnfs_dlm_getdevinfo for dlm cluster file system use.
Choose between tcp and tcp6 for the netid field of GETDEVICEINFO replies
by checking for the presence of a colon in the address.
[was pnfsd: hardwire DLM cluster file layout get device info]
[pnfsd: move and rename nfsd4_pnfs_fl_getdevinfo]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: get rid of devinfo encoding function vector]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: fix pnfs_dlm_device string parsing]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: more fixes for pnfs_dlm_device string parsing]
[pnfsd: filelayout: get rid of getdevinfo notify_types]
Signed-off-by: Benny Halevy <[email protected]>
Acked-by: Steven Whitehouse <[email protected]>
[pnfsd: rename deviceid_t struct pnfs_deviceid]
[pnfsd: clean up getdeviceinfo export op API]
[pnfsd: getdeviceinfo deviceid needs to be const.]
[pnfsd: dlm: fixup LAYOUT_NFSV4_1_FILES]
[pnfsd-files: prevent NULL deref in nfsd4_pnfs_dlm_getdevinfo]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: Correctly set netid to tcp or tcp6 for non-local exports]
Signed-off-by: Michael Groshans <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
[fix bug in DS tcp/tcp6 address string]
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsdlm.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 110 insertions(+)
diff --git a/fs/nfsd/nfs4pnfsdlm.c b/fs/nfsd/nfs4pnfsdlm.c
index 0a14f06..19002c1 100644
--- a/fs/nfsd/nfs4pnfsdlm.c
+++ b/fs/nfsd/nfs4pnfsdlm.c
@@ -204,8 +204,118 @@ static int nfsd4_pnfs_dlm_getdeviter(struct super_block *sb,
return 0;
}
+static int nfsd4_pnfs_dlm_getdevinfo(struct super_block *sb,
+ struct exp_xdr_stream *xdr,
+ u32 layout_type,
+ const struct nfsd4_pnfs_deviceid *devid)
+{
+ int err, len, i = 0;
+ struct pnfs_filelayout_device fdev;
+ struct pnfs_filelayout_devaddr *daddr;
+ struct dlm_device_entry *dlm_pdev;
+ char *bufp;
+
+ err = -ENOTSUPP;
+ if (layout_type != LAYOUT_NFSV4_1_FILES) {
+ dprintk("%s: ERROR: layout type isn't 'file' "
+ "(type: %x)\n", __func__, layout_type);
+ return err;
+ }
+
+ /* We only hand out a deviceid of 1 in LAYOUTGET, so a GETDEVICEINFO
+ * with a gdia_device_id != 1 is invalid.
+ */
+ err = -EINVAL;
+ if (devid->devid != 1) {
+ dprintk("%s: WARNING: didn't receive a deviceid of "
+ "1 (got: 0x%llx)\n", __func__, devid->devid);
+ return err;
+ }
+
+ /*
+ * If the DS list has not been established, return -EINVAL
+ */
+ dlm_pdev = nfsd4_find_pnfs_dlm_device(sb->s_bdev->bd_disk->disk_name);
+ if (!dlm_pdev) {
+ dprintk("%s: DEBUG: disk %s Not Found\n", __func__,
+ sb->s_bdev->bd_disk->disk_name);
+ return err;
+ }
+
+ dprintk("%s: Found disk %s with DS list |%s|\n",
+ __func__, dlm_pdev->disk_name, dlm_pdev->ds_list);
+
+ memset(&fdev, '\0', sizeof(fdev));
+ fdev.fl_device_length = dlm_pdev->num_ds;
+
+ err = -ENOMEM;
+ len = sizeof(*fdev.fl_device_list) * fdev.fl_device_length;
+ fdev.fl_device_list = kzalloc(len, GFP_KERNEL);
+ if (!fdev.fl_device_list) {
+ printk(KERN_ERR "%s: ERROR: unable to kmalloc a device list "
+ "buffer for %u DSes.\n", __func__, fdev.fl_device_length);
+ fdev.fl_device_length = 0;
+ goto out;
+ }
+
+ /* Set up a simple one-to-one stripe index list */
+ fdev.fl_stripeindices_length = fdev.fl_device_length;
+ fdev.fl_stripeindices_list = kzalloc(sizeof(u32) *
+ fdev.fl_stripeindices_length, GFP_KERNEL);
+
+ if (!fdev.fl_stripeindices_list) {
+ printk(KERN_ERR "%s: ERROR: unable to kmalloc a stripeindices "
+ "list buffer for %u DSes.\n", __func__, fdev.fl_device_length);
+ goto out;
+ }
+ for (i = 0; i < fdev.fl_stripeindices_length; i++)
+ fdev.fl_stripeindices_list[i] = i;
+
+ /* Transfer the data server list with a single multipath entry */
+ bufp = dlm_pdev->ds_list;
+ for (i = 0; i < fdev.fl_device_length; i++) {
+ daddr = kmalloc(sizeof(*daddr), GFP_KERNEL);
+ if (!daddr) {
+ printk(KERN_ERR "%s: ERROR: unable to kmalloc a device "
+ "addr buffer.\n", __func__);
+ goto out;
+ }
+
+ len = strcspn(bufp, ",");
+ daddr->r_addr.data = kmalloc(len + 4, GFP_KERNEL);
+ memcpy(daddr->r_addr.data, bufp, len);
+ /*
+ * append the port number. interpreted as two more bytes
+ * beyond the quad: ".8.1" -> 0x08.0x01 -> 0x0801 = port 2049.
+ */
+ memcpy(daddr->r_addr.data + len, ".8.1", 4);
+ daddr->r_addr.len = len + 4;
+
+ daddr->r_netid.data = "tcp6";
+ daddr->r_netid.len = strnchr(daddr->r_addr.data, len, ':') ? 4 : 3;
+
+ fdev.fl_device_list[i].fl_multipath_length = 1;
+ fdev.fl_device_list[i].fl_multipath_list = daddr;
+
+ dprintk("%s: encoding DS |%s|\n", __func__, bufp);
+
+ bufp += len + 1;
+ }
+
+ /* have nfsd encode the device info */
+ err = filelayout_encode_devinfo(xdr, &fdev);
+out:
+ for (i = 0; i < fdev.fl_device_length; i++)
+ kfree(fdev.fl_device_list[i].fl_multipath_list);
+ kfree(fdev.fl_device_list);
+ kfree(fdev.fl_stripeindices_list);
+ dprintk("<-- %s returns %d\n", __func__, err);
+ return err;
+}
+
/* For use by DLM cluster file systems exported by pNFSD */
const struct pnfs_export_operations pnfs_dlm_export_ops = {
+ .get_device_info = nfsd4_pnfs_dlm_getdevinfo,
.get_device_iter = nfsd4_pnfs_dlm_getdeviter,
};
EXPORT_SYMBOL(pnfs_dlm_export_ops);
--
1.8.3.1
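The address handling in the patch above (append ".8.1" for port 2049, pick the netid by looking for a colon) can be sketched in userspace as follows. This is a minimal illustration under assumptions: struct ds_addr_sketch and make_ds_addr() are hypothetical names, and the port is passed in as a parameter rather than hardwired as it is in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one data server address. */
struct ds_addr_sketch {
	char uaddr[64];		/* universal address: "<ip>.<port hi>.<port lo>" */
	const char *netid;	/* "tcp" or "tcp6" */
};

/*
 * Build the universal address and netid for one data server.
 * The port is appended as two decimal octets, so 2049 (0x0801)
 * becomes ".8.1". Returns 0 on success, -1 if the result does not fit.
 */
static int make_ds_addr(const char *ip, unsigned int port,
			struct ds_addr_sketch *out)
{
	int n = snprintf(out->uaddr, sizeof(out->uaddr), "%s.%u.%u",
			 ip, (port >> 8) & 0xff, port & 0xff);

	if (n < 0 || (size_t)n >= sizeof(out->uaddr))
		return -1;
	/* Only IPv6 literals contain a colon. */
	out->netid = strchr(ip, ':') ? "tcp6" : "tcp";
	return 0;
}
```

The patch gets the same netid result by pointing r_netid.data at the string "tcp6" and setting r_netid.len to 3 or 4, which avoids a second string constant.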
On Fri, Sep 27, 2013 at 9:31 AM, Boaz Harrosh <[email protected]> wrote:
> On 09/26/2013 02:36 PM, Benny Halevy wrote:
>> The following patchset implements an extension to nfsd
>> providing a complete minimal pnfs server exporting
>> DLM-based clustered file systems such as GFS2 or OCFS2.
>>
>> The pNFS operations that are implemented are
>> GETDEVICELIST and GETDEVICEINFO,
>> LAYOUTGET and LAYOUTRETURN.
>>
>> The server does the bookkeeping of the outstanding layout
>> state in response to layout get and return.
>>
>> Also, the implementation cleans up the client layout state
>> upon client expiry and on CLOSE when the return_on_close
>> flag is set on the LAYOUTGET response. The latter is the
>> default behavior until layout recalls are implemented
>> with which the server can reclaim its resources in case
>> the client holds layout state post closing files.
>>
>> The patchset is based on v3.12-rc2 and it's available also online here:
>> git://linux-nfs.org/~bhalevy/linux-pnfs.git pnfsd-dlm-3.12-rc2-2013-09-26
>>
>
> I thought that we said that exofs server is going in first. What happened?
exofs requires much more functionality.
To help review the code we need to go through this milestone in any case.
Benny
>
> Cheers
> Boaz
>
>> Benny
>>
>> General infrastructure:
>> [PATCH RFC v0 01/49] pnfsd: Define CONFIG_PNFSD
>> [PATCH RFC v0 02/49] pnfsd: define NFSDDBG_PNFS
>> [PATCH RFC v0 03/49] pnfsd: return pnfs flags on exchange_id
>> [PATCH RFC v0 04/49] pnfsd: don't set up back channel on create_session for ds
>> [PATCH RFC v0 05/49] pnfsd: introduce pnfsd header files
>> [PATCH RFC v0 06/49] pnfsd: define pnfs_export_operations
>> [PATCH RFC v0 07/49] pnfsd: add pnfs export option
>> [PATCH RFC v0 08/49] pnfsd: layout verify
>> [PATCH RFC v0 09/49] pnfsd: initial stub
>>
>> Device ops:
>> [PATCH RFC v0 10/49] pnfsd: use sbid hash table to map super_blocks to devid major identifiers
>> [PATCH RFC v0 11/49] NFSD: introduce exp_xdr.h
>> [PATCH RFC v0 12/49] pnfsd: get device list/info
>> [PATCH RFC v0 13/49] pnfsd: filelayout: get device list/info
>> [PATCH RFC v0 14/49] pnfsd: provide helper for xdr encoding of deviceid
>> [PATCH RFC v0 15/49] pnfsd: add helper functions for identifying DS filehandles
>> [PATCH RFC v0 16/49] pnfsd: accept all ds stateids
>>
>> layout get:
>> [PATCH RFC v0 17/49] DEBUG: nfsd: more client_lock asserts
>> [PATCH RFC v0 18/49] pnfsd: nfs4_assert_state_locked
>> [PATCH RFC v0 19/49] pnfsd: layout get
>> [PATCH RFC v0 20/49] pnfsd: filelayout: layout encoding
>>
>> layout state handling for layout get:
>> [PATCH RFC v0 21/49] nfsd: no need to unhash_stid before free
>> [PATCH RFC v0 22/49] nfsd: cleanup free_stid
>> [PATCH RFC v0 23/49] pnfsd: layout state allocation
>> [PATCH RFC v0 24/49] pnfsd: process the layout stateid
>> [PATCH RFC v0 25/49] pnfsd: layout state per client tracking
>> [PATCH RFC v0 26/49] pnfsd: layout state per file tracking
>> [PATCH RFC v0 27/49] pnfsd: hash layouts on layout state
>> [PATCH RFC v0 28/49] pnfsd: support layout segment merging
>>
>> pnfs attributes:
>> [PATCH RFC v0 29/49] pnfsd: support layout_type attribute
>> [PATCH RFC v0 30/49] pnfsd: make pnfs server return layout_blksize when the client asks for it
>> [PATCH RFC v0 31/49] pnfsd: add support for per-file layout_types attribute
>>
>> pnfsd over dlm:
>> [PATCH RFC v0 32/49] pnfsd: per block device dlm data server list cache
>> [PATCH RFC v0 33/49] pnfsd: Add IP address validation to nfsd4_set_pnfs_dlm_device()
>> [PATCH RFC v0 34/49] pnfsd: new nfsd filesystem file: pnfs_dlm_device
>> [PATCH RFC v0 35/49] pnfsd: nfsd4_pnfs_dlm_getdeviter
>> [PATCH RFC v0 36/49] pnfsd: nfsd4_pnfs_dlm_getdevinfo
>> [PATCH RFC v0 37/49] pnfsd: make /proc/fs/nfsd/pnfs_dlm_device report dlm device list.
>> [PATCH RFC v0 38/49] pnfsd: nfsd4_pnfs_dlm_layoutget
>> [PATCH RFC v0 39/49] pnfsd: DLM file layout only support read iomode layouts
>> [PATCH RFC v0 40/49] pnfsd: add dlm file layout layout-type
>> [PATCH RFC v0 41/49] pnfsd: dlm pnfs_export_operations
>> [PATCH RFC v0 42/49] pnfsd: gfs2: use generic file layout pnfs operations vector
>>
>> layout return / expire / return_on_close:
>> [PATCH RFC v0 43/49] pnfsd: release state lock around iput in put_nfs4_file
>> [PATCH RFC v0 44/49] posix_acl: resolve compile dependency in posix_acl.h
>> [PATCH RFC v0 45/49] nfs: resolve compile dependency in nfs_xdr.h
>> [PATCH RFC v0 46/49] pnfsd: layout return generic implementation
>> [PATCH RFC v0 47/49] pnfsd: pnfs_expire_client
>> [PATCH RFC v0 48/49] pnfsd: return on close
>> [PATCH RFC v0 49/49] pnfsd: dlm set return_on_close to true
From: Benny Halevy <[email protected]>
Calculate the size of the opaque device_addr4 da_addr_body. Use this size to
compare to the client's gdia_maxcount, and if it's not too small, to reserve
the xdr space once.
Require the file system get_device_info call to return the XDR size of the
device_addr4 da_addr_body in pnfs_xdr_info.bytes_written on NFS4ERR_TOOSMALL
for the gdir_mincount calculation.
Declare a callback into the file system for encoding a multipage stripe
index list.
[extracted from pnfsd: Initial pNFS server implementation.]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: update pNFS server ops to draft 13]
Signed-off-by: Marc Eshel <[email protected]>
[pnfsd: Fix server GETDEVICELIST to comply with NFSv4.1 Draft 13]
Signed-off-by: Ricardo Labiaga <[email protected]>
[pnfsd: Simplify device export ops.]
[pnfsd: Remove device enc/free export ops]
[pnfsd: Use 128 bit deviceid on server]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: filelayout: use nfsd4_compoundres pointer in pnfs_xdr_info]
[pnfsd: filelayout: fix NFS4ERR_TOOSMALL for getdeviceinfo]
[pnfsd: fix filelayout getdeviceinfo devaddr4 length encoding]
[pnfsd: file layout multipage getdeviceinfo encode callback]
[Used gfs2_get_device_info from
pnfs-gfs2: initial GETDEVICE* work for pNFS/GFS2 integration]
Signed-off-by: David M. Richter <[email protected]>
Signed-off-by: Frank Filz <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
[pnfs-gfs2: return correct error value in GETDEVICEINFO]
Signed-off-by: David M. Richter <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: filelayout: get rid of xdr encoding macros for file layout xdr]
[pnfsd: filelayout: move xdr declarations to nfs4layoutxdr.h]
[pnfsd: get rid of devinfo encoding function vector]
[pnfsd: filelayout: strictly define filelayout_encode_devinfo]
[pnfsd: mv nfs4filelayoutxdr to fs/exportfs]
[pnfsd: filelayout: convert to using exp_xdr]
[exportfs: filelayout: disable dprintk]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: exportfs: fix build warning]
Signed-off-by: Boaz Harrosh <[email protected]>
[pnfsd: rename deviceid_t struct pnfs_deviceid]
[pnfsd: fix cosmetic checkpatch warnings]
[conditionally build nfs4filelayoutxdr using config option]
[pnfsd: rename device fsid member to sbid]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: EXPORTFS_FILE_LAYOUT should be prompt-less]
Signed-off-by: Boaz Harrosh <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/Kconfig | 7 ++
fs/exportfs/Makefile | 3 +-
fs/exportfs/nfs4filelayoutxdr.c | 133 +++++++++++++++++++++++++++++++++++++
fs/nfsd/Kconfig | 1 +
fs/nfsd/nfs4proc.c | 1 +
fs/nfsd/nfs4xdr.c | 1 +
include/linux/exportfs.h | 9 +++
include/linux/nfsd/nfs4layoutxdr.h | 58 ++++++++++++++++
include/uapi/linux/nfsd/debug.h | 1 +
9 files changed, 213 insertions(+), 1 deletion(-)
create mode 100644 fs/exportfs/nfs4filelayoutxdr.c
create mode 100644 include/linux/nfsd/nfs4layoutxdr.h
diff --git a/fs/Kconfig b/fs/Kconfig
index c229f82..7c4af37 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -53,6 +53,13 @@ config FS_POSIX_ACL
config EXPORTFS
tristate
+config EXPORTFS_FILE_LAYOUT
+ bool
+ depends on PNFSD && EXPORTFS
+ help
+ Exportfs support for the NFSv4.1 files layout type.
+ Must be automatically selected by supporting filesystems.
+
config FILE_LOCKING
bool "Enable POSIX file locking API" if EXPERT
default y
diff --git a/fs/exportfs/Makefile b/fs/exportfs/Makefile
index d7c5d4d..658207d 100644
--- a/fs/exportfs/Makefile
+++ b/fs/exportfs/Makefile
@@ -3,4 +3,5 @@
obj-$(CONFIG_EXPORTFS) += exportfs.o
-exportfs-objs := expfs.o
+exportfs-y := expfs.o
+exportfs-$(CONFIG_EXPORTFS_FILE_LAYOUT) += nfs4filelayoutxdr.o
diff --git a/fs/exportfs/nfs4filelayoutxdr.c b/fs/exportfs/nfs4filelayoutxdr.c
new file mode 100644
index 0000000..4801bfe
--- /dev/null
+++ b/fs/exportfs/nfs4filelayoutxdr.c
@@ -0,0 +1,133 @@
+/*
+ * Copyright (c) 2006 The Regents of the University of Michigan.
+ * All rights reserved.
+ *
+ * Andy Adamson <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of the University nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <linux/exp_xdr.h>
+#include <linux/module.h>
+#include <linux/nfsd/nfs4layoutxdr.h>
+
+/* Define our own no-op dprintk so filesystems do not depend on sunrpc */
+#ifdef dprintk
+#undef dprintk
+#endif
+#define dprintk(fmt, ...) do { } while (0)
+
+/* Calculate the XDR length of the GETDEVICEINFO4resok structure
+ * excluding the gdir_notification and the gdir_device_addr da_layout_type.
+ */
+static int fl_devinfo_xdr_words(const struct pnfs_filelayout_device *fdev)
+{
+ struct pnfs_filelayout_devaddr *fl_addr;
+ struct pnfs_filelayout_multipath *mp;
+ int i, j, nwords;
+
+ /* da_addr_body length, index count, indices,
+ * multipath_list4 length */
+ nwords = 1 + 1 + fdev->fl_stripeindices_length + 1;
+ for (i = 0; i < fdev->fl_device_length; i++) {
+ mp = &fdev->fl_device_list[i];
+ nwords++; /* multipath list length */
+ for (j = 0; j < mp->fl_multipath_length; j++) {
+ fl_addr = &mp->fl_multipath_list[j];
+ nwords += 1 + exp_xdr_qwords(fl_addr->r_netid.len);
+ nwords += 1 + exp_xdr_qwords(fl_addr->r_addr.len);
+ }
+ }
+ dprintk("<-- %s nwords %d\n", __func__, nwords);
+ return nwords;
+}
+
+/* Encodes the nfsv4_1_file_layout_ds_addr4 structure from draft 13
+ * on the response stream.
+ * Use linux error codes (not nfs) since these values are being
+ * returned to the file system.
+ */
+int
+filelayout_encode_devinfo(struct exp_xdr_stream *xdr,
+ const struct pnfs_filelayout_device *fdev)
+{
+ unsigned int i, j, len = 0, opaque_words;
+ u32 *p_in;
+ u32 index_count = fdev->fl_stripeindices_length;
+ u32 dev_count = fdev->fl_device_length;
+ int error = 0;
+ __be32 *p;
+
+ opaque_words = fl_devinfo_xdr_words(fdev);
+ dprintk("%s: Begin indx_cnt: %u dev_cnt: %u total size %u\n",
+ __func__,
+ index_count,
+ dev_count,
+ opaque_words*4);
+
+ /* check space for opaque length */
+ p = p_in = exp_xdr_reserve_qwords(xdr, opaque_words);
+ if (!p) {
+ error = -ETOOSMALL;
+ goto out;
+ }
+
+ /* Fill in length later */
+ p++;
+
+ /* encode device list indices */
+ p = exp_xdr_encode_u32(p, index_count);
+ for (i = 0; i < index_count; i++)
+ p = exp_xdr_encode_u32(p, fdev->fl_stripeindices_list[i]);
+
+ /* encode device list */
+ p = exp_xdr_encode_u32(p, dev_count);
+ for (i = 0; i < dev_count; i++) {
+ struct pnfs_filelayout_multipath *mp = &fdev->fl_device_list[i];
+
+ p = exp_xdr_encode_u32(p, mp->fl_multipath_length);
+ for (j = 0; j < mp->fl_multipath_length; j++) {
+ struct pnfs_filelayout_devaddr *da =
+ &mp->fl_multipath_list[j];
+
+ /* Encode device info */
+ p = exp_xdr_encode_opaque(p, da->r_netid.data,
+ da->r_netid.len);
+ p = exp_xdr_encode_opaque(p, da->r_addr.data,
+ da->r_addr.len);
+ }
+ }
+
+ /* Backfill the length; subtract 4 for the da_addr_body length word */
+ len = (char *)p - (char *)p_in;
+ exp_xdr_encode_u32(p_in, len - 4);
+
+ error = 0;
+out:
+ dprintk("%s: End err %d xdrlen %d\n",
+ __func__, error, len);
+ return error;
+}
+EXPORT_SYMBOL(filelayout_encode_devinfo);
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index 4d68a8c..1cea26c 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -110,6 +110,7 @@ config NFSD_FAULT_INJECTION
config PNFSD
bool "NFSv4.1 server support for Parallel NFS (pNFS) (EXPERIMENTAL)"
depends on NFSD_V4
+ select EXPORTFS_FILE_LAYOUT
help
This option enables support for the parallel NFS features of the
minor version 1 of the NFSv4 protocol (RFC5661)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index feea3a9..81d41a4 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -34,6 +34,7 @@
*/
#include <linux/file.h>
#include <linux/slab.h>
+#include <linux/nfsd/nfs4layoutxdr.h>
#include "idmap.h"
#include "cache.h"
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index a761514..ed86a2d 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -47,6 +47,7 @@
#include <linux/pagemap.h>
#include <linux/sunrpc/svcauth_gss.h>
#include <linux/exportfs.h>
+#include <linux/nfsd/nfs4layoutxdr.h>
#include "idmap.h"
#include "acl.h"
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 41b223a..ade74e1 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -2,6 +2,7 @@
#define LINUX_EXPORTFS_H 1
#include <linux/types.h>
+#include <linux/exp_xdr.h>
struct dentry;
struct inode;
@@ -211,4 +212,12 @@ extern struct dentry *generic_fh_to_parent(struct super_block *sb,
struct fid *fid, int fh_len, int fh_type,
struct inode *(*get_inode) (struct super_block *sb, u64 ino, u32 gen));
+#if defined(CONFIG_EXPORTFS_FILE_LAYOUT)
+struct pnfs_filelayout_device;
+struct pnfs_filelayout_layout;
+
+extern int filelayout_encode_devinfo(struct exp_xdr_stream *xdr,
+ const struct pnfs_filelayout_device *fdev);
+
+#endif /* defined(CONFIG_EXPORTFS_FILE_LAYOUT) */
#endif /* LINUX_EXPORTFS_H */
diff --git a/include/linux/nfsd/nfs4layoutxdr.h b/include/linux/nfsd/nfs4layoutxdr.h
new file mode 100644
index 0000000..752055f
--- /dev/null
+++ b/include/linux/nfsd/nfs4layoutxdr.h
@@ -0,0 +1,58 @@
+/*
+ * Copyright (c) 2006 The Regents of the University of Michigan.
+ * All rights reserved.
+ *
+ * Andy Adamson <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of the University nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef NFSD_NFS4LAYOUTXDR_H
+#define NFSD_NFS4LAYOUTXDR_H
+
+#include <linux/sunrpc/xdr.h>
+
+/* the nfsd4_pnfs_devlist dev_addr for the file layout type */
+struct pnfs_filelayout_devaddr {
+ struct xdr_netobj r_netid;
+ struct xdr_netobj r_addr;
+};
+
+/* list of multipath servers */
+struct pnfs_filelayout_multipath {
+ u32 fl_multipath_length;
+ struct pnfs_filelayout_devaddr *fl_multipath_list;
+};
+
+struct pnfs_filelayout_device {
+ u32 fl_stripeindices_length;
+ u32 *fl_stripeindices_list;
+ u32 fl_device_length;
+ struct pnfs_filelayout_multipath *fl_device_list;
+};
+
+#endif /* NFSD_NFS4LAYOUTXDR_H */
diff --git a/include/uapi/linux/nfsd/debug.h b/include/uapi/linux/nfsd/debug.h
index 168f3a3..7444e9d 100644
--- a/include/uapi/linux/nfsd/debug.h
+++ b/include/uapi/linux/nfsd/debug.h
@@ -33,6 +33,7 @@
#define NFSDDBG_XDR 0x0100
#define NFSDDBG_LOCKD 0x0200
#define NFSDDBG_PNFS 0x0400
+#define NFSDDBG_FILELAYOUT 0x0800
#define NFSDDBG_ALL 0x7FFF
#define NFSDDBG_NOCHANGE 0xFFFF
--
1.8.3.1
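For readers unfamiliar with the file layout device description, here is a minimal user-space sketch of how the three structures in nfs4layoutxdr.h relate to each other: the stripe index list selects an entry of the device list, and each device list entry is a list of equivalent (multipath) addresses. All `*_sketch` names, the sample addresses and the helper functions are hypothetical, and the lookup ignores the first stripe index and the dense/sparse packing rules, so treat it as an illustration rather than the server's logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;

/* Simplified stand-in for struct xdr_netobj (length + opaque bytes). */
struct netobj_sketch {
	unsigned int len;
	const char *data;
};
#define NETOBJ(s) { sizeof(s) - 1, s }

/* Mirrors pnfs_filelayout_devaddr: network id + universal address. */
struct devaddr_sketch {
	struct netobj_sketch r_netid;
	struct netobj_sketch r_addr;
};

/* Mirrors pnfs_filelayout_multipath: equivalent paths to one data server. */
struct multipath_sketch {
	u32 length;
	const struct devaddr_sketch *list;
};

/* Mirrors pnfs_filelayout_device: stripe indices into the device list. */
struct device_sketch {
	u32 stripeindices_length;
	const u32 *stripeindices_list;
	u32 device_length;
	const struct multipath_sketch *device_list;
};

/*
 * Return the first address of the data server serving @stripe_unit,
 * or NULL if the device description is empty or inconsistent.
 */
static const struct devaddr_sketch *
stripe_to_devaddr(const struct device_sketch *dev, uint64_t stripe_unit)
{
	u32 slot, ds;

	if (dev->stripeindices_length == 0)
		return NULL;
	slot = (u32)(stripe_unit % dev->stripeindices_length);
	ds = dev->stripeindices_list[slot];
	if (ds >= dev->device_length || dev->device_list[ds].length == 0)
		return NULL;
	return &dev->device_list[ds].list[0];
}

/* Made-up sample: two data servers, one path each, port 2049 ("8.1"). */
static const struct devaddr_sketch ds0_paths[] = {
	{ NETOBJ("tcp"), NETOBJ("192.0.2.1.8.1") },
};
static const struct devaddr_sketch ds1_paths[] = {
	{ NETOBJ("tcp"), NETOBJ("192.0.2.2.8.1") },
};
static const struct multipath_sketch example_paths[] = {
	{ 1, ds0_paths },
	{ 1, ds1_paths },
};
static const u32 example_indices[] = { 0, 1 };
static const struct device_sketch example_dev = {
	2, example_indices, 2, example_paths,
};

/* Convenience wrapper: universal address string for a stripe unit. */
static const char *example_stripe_addr(uint64_t stripe_unit)
{
	const struct devaddr_sketch *da =
		stripe_to_devaddr(&example_dev, stripe_unit);

	return da ? da->r_addr.data : NULL;
}
```

With the sample data, stripe units alternate between the two data servers; a real client would also honour the multipath list instead of always taking the first entry.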
The client may hold a layout after CLOSE essentially forever,
requiring the server to remember it.
However, since we keep track of the layout state by hanging it on the
respective file and client structures, we lose track of the layout when
the file goes away. Therefore we need to set return_on_close to true.
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsdlm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/nfsd/nfs4pnfsdlm.c b/fs/nfsd/nfs4pnfsdlm.c
index de1af22..093d836 100644
--- a/fs/nfsd/nfs4pnfsdlm.c
+++ b/fs/nfsd/nfs4pnfsdlm.c
@@ -406,6 +406,8 @@ static enum nfsstat4 nfsd4_pnfs_dlm_layoutget(struct inode *inode,
res->lg_seg.length = NFS4_MAX_UINT64;
/* Always give out READ ONLY layouts */
res->lg_seg.iomode = IOMODE_READ;
+ /* Set return_on_close to true until we track layout state post CLOSE */
+ res->lg_return_on_close = 1;
layout = kzalloc(sizeof(*layout), GFP_KERNEL);
if (layout == NULL) {
--
1.8.3.1
From: Andy Adamson <[email protected]>
pnfsd data structures used internally and over the export API.
[extracted from pnfsd: Initial pNFS server implementation.]
[pnfsd: remove CONFIG_PNFSD from nfsd4_pnfs.h]
Signed-off-by: Andy Adamson <[email protected]>
[moved {include/linux,fs}/nfsd/pnfsd.h]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/pnfsd.h | 37 +++++++++++++++++++++++++++++++++++++
include/linux/nfsd/nfsd4_pnfs.h | 37 +++++++++++++++++++++++++++++++++++++
2 files changed, 74 insertions(+)
create mode 100644 fs/nfsd/pnfsd.h
create mode 100644 include/linux/nfsd/nfsd4_pnfs.h
diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
new file mode 100644
index 0000000..65fb57e
--- /dev/null
+++ b/fs/nfsd/pnfsd.h
@@ -0,0 +1,37 @@
+/*
+ * Copyright (c) 2005 The Regents of the University of Michigan.
+ * All rights reserved.
+ *
+ * Andy Adamson <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of the University nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef LINUX_NFSD_PNFSD_H
+#define LINUX_NFSD_PNFSD_H
+
+#endif /* LINUX_NFSD_PNFSD_H */
diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
new file mode 100644
index 0000000..9e7d95e
--- /dev/null
+++ b/include/linux/nfsd/nfsd4_pnfs.h
@@ -0,0 +1,37 @@
+/*
+ * Copyright (c) 2006 The Regents of the University of Michigan.
+ * All rights reserved.
+ *
+ * Andy Adamson <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of the University nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef _LINUX_NFSD_NFSD4_PNFS_H
+#define _LINUX_NFSD_NFSD4_PNFS_H
+
+#endif /* _LINUX_NFSD_NFSD4_PNFS_H */
--
1.8.3.1
From: Andy Adamson <[email protected]>
Export nfsd4_pnfs_dlm_getdeviter for dlm cluster file system use.
[was pnfsd: hardwire DLM cluster file layout get device iterator]
Signed-off-by: David M. Richter <[email protected]>
Signed-off-by: Frank Filz <[email protected]>
[pnfs-gfs2: return correct error value in GETDEVICEINFO]
Signed-off-by: David M. Richter <[email protected]>
[Use the GFS2 iterator as the default file layout iterator.]
Signed-off-by: Andy Adamson <[email protected]>
[Add the pnfsd default file layout getdevice info]
Signed-off-by: David M. Richter <[email protected]>
Signed-off-by: Frank Filz <[email protected]>
[pnfs-gfs2: return correct error value in GETDEVICEINFO]
Signed-off-by: David M. Richter <[email protected]>
[pnfsd: move and rename nfsd4_pnfs_fl_getdeviter]
Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Acked-by: Steven Whitehouse <[email protected]>
[pnfsd: dev_iter: clean up export API]
[pnfsd: dlm: fixup LAYOUT_NFSV4_1_FILES]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsdlm.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/fs/nfsd/nfs4pnfsdlm.c b/fs/nfsd/nfs4pnfsdlm.c
index ddc2188..0a14f06 100644
--- a/fs/nfsd/nfs4pnfsdlm.c
+++ b/fs/nfsd/nfs4pnfsdlm.c
@@ -21,8 +21,11 @@
*
******************************************************************************/
+#include <linux/nfs4.h>
+#include <linux/export.h>
#include <linux/nfsd/debug.h>
#include <linux/nfsd/nfs4pnfsdlm.h>
+#include <linux/nfsd/nfs4layoutxdr.h>
#include <linux/sunrpc/addr.h>
#define NFSDDBG_FACILITY NFSDDBG_FILELAYOUT
@@ -180,3 +183,29 @@ void nfsd4_pnfs_dlm_shutdown(void)
}
spin_unlock(&dlm_device_list_lock);
}
+
+static int nfsd4_pnfs_dlm_getdeviter(struct super_block *sb,
+ u32 layout_type,
+ struct nfsd4_pnfs_dev_iter_res *res)
+{
+ if (layout_type != LAYOUT_NFSV4_1_FILES) {
+ printk(KERN_ERR "%s: ERROR: layout type isn't 'file' "
+ "(type: %x)\n", __func__, layout_type);
+ return -ENOTSUPP;
+ }
+
+ res->gd_eof = 1;
+ if (res->gd_cookie)
+ return -ENOENT;
+
+ res->gd_cookie = 1;
+ res->gd_verf = 1;
+ res->gd_devid = 1;
+ return 0;
+}
+
+/* For use by DLM cluster file systems exported by pNFSD */
+const struct pnfs_export_operations pnfs_dlm_export_ops = {
+ .get_device_iter = nfsd4_pnfs_dlm_getdeviter,
+};
+EXPORT_SYMBOL(pnfs_dlm_export_ops);
--
1.8.3.1
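To make the cookie/eof contract of the iterator above easier to follow, here is a small user-space sketch of how a caller might drive it. The `dev_iter_res_sketch` struct only mirrors the four fields touched by the patch; the field widths and the `collect_devids()` caller are assumptions made for illustration, not the nfsd GETDEVICELIST code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the nfsd4_pnfs_dev_iter_res fields used above. */
struct dev_iter_res_sketch {
	uint64_t gd_cookie;	/* in: resume point, out: next cookie */
	uint64_t gd_verf;	/* out: cookie verifier */
	uint64_t gd_devid;	/* out: device ID found by this call */
	uint32_t gd_eof;	/* out: nonzero when there is nothing more */
};

/* Same logic as nfsd4_pnfs_dlm_getdeviter(), minus the layout type check. */
static int single_device_iter(struct dev_iter_res_sketch *res)
{
	res->gd_eof = 1;
	if (res->gd_cookie)
		return -ENOENT;

	res->gd_cookie = 1;
	res->gd_verf = 1;
	res->gd_devid = 1;
	return 0;
}

/*
 * Assumed caller: gather device IDs until the iterator fails or
 * reports eof. Returns the number of IDs stored in @ids.
 */
static int collect_devids(uint64_t *ids, int max)
{
	struct dev_iter_res_sketch res = { 0, 0, 0, 0 };
	int n = 0;

	while (n < max) {
		if (single_device_iter(&res) != 0)
			break;
		ids[n++] = res.gd_devid;
		if (res.gd_eof)
			break;
	}
	return n;
}
```

Because the DLM iterator exposes exactly one device, the first call returns device ID 1 with eof set, and any call with a non-zero cookie fails with -ENOENT.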
From: Benny Halevy <[email protected]>
Set EXCHGID4_FLAG_USE_NON_PNFS in cl_exchange_flags if we set neither
the pnfs (MDS) nor the ds flag (the plain old nfs41 case).
Note that we always set both MDS and DS exchangeid capability flags
when CONFIG_PNFSD is enabled.
The client needs to remember what the session is used for
if it cares to distinguish between DSs and MDSs.
EXCHGID4_FLAG_USE_NON_PNFS should be set when the server does not support
operations (e.g. LAYOUTGET) or attributes that pertain to pNFS.
[extracted from pnfsd: Initial pNFS server implementation.]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: Fixup nfsd4_set_ex_flags.]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: set EXCHGID4_FLAG_USE_NON_PNFS when !CONFIG_PNFSD]
[pnfsd: fix compiler warning in nfsd4_set_ex_flags when CONFIG_PNFSD is not defined]
[pnfsd: always set both MDS and DS exchangeid capability flags]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 57a0340..21c15fc 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1614,8 +1614,12 @@ static bool clp_used_exchangeid(struct nfs4_client *clp)
static void
nfsd4_set_ex_flags(struct nfs4_client *new, struct nfsd4_exchange_id *clid)
{
- /* pNFS is not supported */
+#if defined(CONFIG_PNFSD)
+ new->cl_exchange_flags |= EXCHGID4_FLAG_USE_PNFS_MDS |
+ EXCHGID4_FLAG_USE_PNFS_DS;
+#else /* CONFIG_PNFSD */
new->cl_exchange_flags |= EXCHGID4_FLAG_USE_NON_PNFS;
+#endif /* CONFIG_PNFSD */
/* Referrals are supported, Migration is not. */
new->cl_exchange_flags |= EXCHGID4_FLAG_SUPP_MOVED_REFER;
--
1.8.3.1
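As an illustration of the flag selection described in this patch, the following stand-alone sketch reproduces the two branches with a run-time switch instead of CONFIG_PNFSD. The flag values are the ones defined for EXCHANGE_ID in RFC 5661; the `server_pnfs_flags()` and `client_may_*()` helpers are hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EXCHANGE_ID role flags as defined by RFC 5661. */
#define EXCHGID4_FLAG_USE_NON_PNFS 0x00010000
#define EXCHGID4_FLAG_USE_PNFS_MDS 0x00020000
#define EXCHGID4_FLAG_USE_PNFS_DS  0x00040000
#define EXCHGID4_FLAG_MASK_PNFS    0x00070000

/* Run-time stand-in for the #if defined(CONFIG_PNFSD) branch. */
static uint32_t server_pnfs_flags(bool pnfsd_enabled)
{
	if (pnfsd_enabled)
		return EXCHGID4_FLAG_USE_PNFS_MDS |
		       EXCHGID4_FLAG_USE_PNFS_DS;
	return EXCHGID4_FLAG_USE_NON_PNFS;
}

/* Hypothetical client-side checks on the returned role flags. */
static bool client_may_request_layouts(uint32_t flags)
{
	return (flags & EXCHGID4_FLAG_USE_PNFS_MDS) != 0;
}

static bool client_may_use_as_ds(uint32_t flags)
{
	return (flags & EXCHGID4_FLAG_USE_PNFS_DS) != 0;
}
```

Since the server advertises both roles at once, it is up to the client to remember whether a given session was set up towards an MDS or a DS.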
Make sure the state lock is taken.
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 15 +++++++++++++++
fs/nfsd/state.h | 1 +
2 files changed, 16 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 68b6f7a..f6022a6 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -135,6 +135,12 @@ static bool is_client_expired(struct nfs4_client *clp)
return clp->cl_time == 0;
}
+void
+nfs4_assert_state_locked(void)
+{
+ BUG_ON(!mutex_is_locked(&client_mutex));
+}
+
static __be32 mark_client_expired_locked(struct nfs4_client *clp)
{
if (atomic_read(&clp->cl_refcount))
@@ -331,6 +337,7 @@ static struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct
struct nfs4_stid *stid;
int new_id;
+ nfs4_assert_state_locked();
stid = kmem_cache_alloc(slab, GFP_KERNEL);
if (!stid)
return NULL;
@@ -400,6 +407,7 @@ static void remove_stid(struct nfs4_stid *s)
{
struct idr *stateids = &s->sc_client->cl_stateids;
+ nfs4_assert_state_locked();
idr_remove(stateids, s->sc_stateid.si_opaque.so_id);
}
@@ -1306,6 +1314,7 @@ static struct nfs4_stid *find_stateid(struct nfs4_client *cl, stateid_t *t)
{
struct nfs4_stid *ret;
+ nfs4_assert_state_locked();
ret = idr_find(&cl->cl_stateids, t->si_opaque.so_id);
if (!ret || !ret->sc_type)
return NULL;
@@ -1394,6 +1403,7 @@ static struct nfs4_client *create_client(struct xdr_netobj name,
struct rb_node *node = root->rb_node;
struct nfs4_client *clp;
+ nfs4_assert_state_locked();
while (node) {
clp = rb_entry(node, struct nfs4_client, cl_namenode);
cmp = compare_blob(&clp->cl_name, name);
@@ -1440,6 +1450,7 @@ static struct nfs4_client *create_client(struct xdr_netobj name,
struct nfs4_client *clp;
unsigned int idhashval = clientid_hashval(clid->cl_id);
+ nfs4_assert_state_locked();
list_for_each_entry(clp, &tbl[idhashval], cl_idhash) {
if (same_clid(&clp->cl_clientid, clid)) {
if ((bool)clp->cl_minorversion != sessions)
@@ -2589,6 +2600,7 @@ static void hash_openowner(struct nfs4_openowner *oo, struct nfs4_client *clp, u
{
struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+ nfs4_assert_state_locked();
list_add(&oo->oo_owner.so_strhash, &nn->ownerstr_hashtbl[strhashval]);
list_add(&oo->oo_perclient, &clp->cl_openowners);
}
@@ -2655,6 +2667,7 @@ static void init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp,
struct nfs4_openowner *oo;
struct nfs4_client *clp;
+ nfs4_assert_state_locked();
list_for_each_entry(so, &nn->ownerstr_hashtbl[hashval], so_strhash) {
if (!so->so_is_open_owner)
continue;
@@ -4166,6 +4179,7 @@ static bool same_lockowner_ino(struct nfs4_lockowner *lo, struct inode *inode, c
unsigned int hashval = lockowner_ino_hashval(inode, clid->cl_id, owner);
struct nfs4_lockowner *lo;
+ nfs4_assert_state_locked();
list_for_each_entry(lo, &nn->lockowner_ino_hashtbl[hashval], lo_owner_ino_hash) {
if (same_lockowner_ino(lo, inode, clid, owner))
return lo;
@@ -4180,6 +4194,7 @@ static void hash_lockowner(struct nfs4_lockowner *lo, unsigned int strhashval, s
clp->cl_clientid.cl_id, &lo->lo_owner.so_owner);
struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
+ nfs4_assert_state_locked();
list_add(&lo->lo_owner.so_strhash, &nn->ownerstr_hashtbl[strhashval]);
list_add(&lo->lo_owner_ino_hash, &nn->lockowner_ino_hashtbl[inohash]);
list_add(&lo->lo_perstateid, &open_stp->st_lockowners);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 424d8f5..2e601a2 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -459,6 +459,7 @@ extern __be32 nfs4_preprocess_stateid_op(struct net *net,
stateid_t *stateid, int flags, struct file **filp);
extern void nfs4_lock_state(void);
extern void nfs4_unlock_state(void);
+extern void nfs4_assert_state_locked(void);
void nfs4_remove_reclaim_record(struct nfs4_client_reclaim *, struct nfsd_net *);
extern void nfs4_release_reclaim(struct nfsd_net *);
extern struct nfs4_client_reclaim *nfsd4_find_reclaim_client(const char *recdir,
--
1.8.3.1
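One property of this kind of assertion is worth spelling out: mutex_is_locked() only tells us that somebody holds the mutex, not that the current task does. The toy user-space model below shows the difference between the two checks; `sketch_mutex` and the integer task IDs are hypothetical and have nothing to do with the kernel's mutex implementation. In the kernel, lockdep_assert_held() is the ownership-aware check when lockdep is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock model: a flag plus the ID of the (hypothetical) owning task. */
struct sketch_mutex {
	bool locked;
	int owner;	/* -1 when unlocked */
};

static void sketch_lock(struct sketch_mutex *m, int task)
{
	m->locked = true;
	m->owner = task;
}

static void sketch_unlock(struct sketch_mutex *m)
{
	m->locked = false;
	m->owner = -1;
}

/* Analogous to mutex_is_locked(): true if anyone holds the lock. */
static bool sketch_is_locked(const struct sketch_mutex *m)
{
	return m->locked;
}

/* Ownership-aware check: true only if @task itself holds the lock. */
static bool sketch_held_by(const struct sketch_mutex *m, int task)
{
	return m->locked && m->owner == task;
}
```

An "is locked" assertion therefore still passes when a different task holds the lock, which is the case an ownership-aware check would catch.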
Simulate layout_return and remove all layouts held by the client closing the file.
pnfs_return_file_layouts is used as if the client returned its layout for the
file with RETURN_FILE and <IOMODE_ANY, offset=0, length=NFS4_MAX_UINT64>.
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsd.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfs4state.c | 2 ++
fs/nfsd/pnfsd.h | 1 +
fs/nfsd/state.h | 2 ++
4 files changed, 52 insertions(+)
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index d18e2a1..57ca89f 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -137,6 +137,7 @@ struct sbid_tracker {
new->ls_client = clp;
get_nfs4_file(fp); /* released on destroy_layout_state */
new->ls_file = fp;
+ new->ls_roc = false;
spin_lock(&layout_lock);
list_add(&new->ls_perclnt, &clp->cl_lo_states);
list_add(&new->ls_perfile, &fp->fi_lo_states);
@@ -269,6 +270,15 @@ static void update_layout_stateid_locked(struct nfs4_layout_state *ls, stateid_t
__func__, sid->si_generation, ls);
}
+static void update_layout_roc(struct nfs4_layout_state *ls, bool roc)
+{
+ if (roc) {
+ ls->ls_roc = true;
+ dprintk("%s: Marked return_on_close on layoutstate %p\n",
+ __func__, ls);
+ }
+}
+
static void
init_layout(struct nfs4_layout *lp,
struct nfs4_layout_state *ls,
@@ -631,6 +641,7 @@ struct super_block *
lgp->lg_seg = res.lg_seg;
lgp->lg_roc = res.lg_return_on_close;
+ update_layout_roc(ls, res.lg_return_on_close);
/* SUCCESS!
* Can the new layout be merged into an existing one?
@@ -884,3 +895,39 @@ void pnfs_expire_client(struct nfs4_client *clp)
destroy_layout_list(&lo_destroy_list);
}
+
+/* Return On Close:
+ * Look for all layouts of @fp that belong to @clp, remove
+ * the layout and simulate a layout_return. Surely the client has forgotten
+ * these layouts or it would return them before the close.
+ *
+ * Note: must be called under the state lock
+ */
+void pnfsd_roc(struct nfs4_client *clp, struct nfs4_file *fp)
+{
+ struct nfsd4_pnfs_layoutreturn lr = {
+ .args.lr_return_type = RETURN_FILE,
+ .args.lr_seg = {
+ .iomode = IOMODE_ANY,
+ .offset = 0,
+ .length = NFS4_MAX_UINT64,
+ },
+ };
+ LIST_HEAD(lo_destroy_list);
+ struct nfs4_layout_state *ls;
+
+ nfs4_assert_state_locked();
+
+ spin_lock(&layout_lock);
+ list_for_each_entry (ls, &fp->fi_lo_states, ls_perfile) {
+ if (ls->ls_client != clp)
+ continue;
+ spin_unlock(&layout_lock);
+ pnfs_return_file_layouts(&lr, ls, &lo_destroy_list);
+ goto out;
+ }
+ spin_unlock(&layout_lock);
+
+out:
+ destroy_layout_list(&lo_destroy_list);
+}
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 600edbc..5568f3d 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4081,6 +4081,8 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
update_stateid(&stp->st_stid.sc_stateid);
memcpy(&close->cl_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
+ pnfsd_roc(stp->st_stateowner->so_client, stp->st_file);
+
nfsd4_close_open_stateid(stp);
if (cstate->minorversion)
diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
index 7ced4f3..af6842e 100644
--- a/fs/nfsd/pnfsd.h
+++ b/fs/nfsd/pnfsd.h
@@ -49,6 +49,7 @@ struct nfs4_layout_state {
struct list_head ls_perfile;
struct nfs4_file *ls_file;
struct list_head ls_layouts;
+ bool ls_roc;
};
/* outstanding layout */
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 90ec8b9..7d9c724 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -507,10 +507,12 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
extern int nfsd4_init_pnfs_slabs(void);
extern void nfsd4_free_pnfs_slabs(void);
extern void pnfs_expire_client(struct nfs4_client *);
+extern void pnfsd_roc(struct nfs4_client *clp, struct nfs4_file *fp);
#else /* CONFIG_PNFSD */
static inline void nfsd4_free_pnfs_slabs(void) {}
static inline int nfsd4_init_pnfs_slabs(void) { return 0; }
static inline void pnfs_expire_client(struct nfs4_client *clp) {}
+static inline void pnfsd_roc(struct nfs4_client *clp, struct nfs4_file *fp) {}
#endif /* CONFIG_PNFSD */
static inline u64
--
1.8.3.1
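To show why a simulated return of <IOMODE_ANY, offset=0, length=NFS4_MAX_UINT64> covers every layout the client may hold on the file, here is a self-contained sketch of a segment match test. The iomode values are those of RFC 5661; `seg_sketch`, `seg_end()` and `return_matches()` are hypothetical helpers and not the matching code used by pnfs_return_file_layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFS4_MAX_UINT64 (~(uint64_t)0)

/* layoutiomode4 values from RFC 5661. */
enum { IOMODE_READ = 1, IOMODE_RW = 2, IOMODE_ANY = 3 };

struct seg_sketch {
	uint32_t iomode;
	uint64_t offset;
	uint64_t length;	/* NFS4_MAX_UINT64 means "to end of file" */
};

/* Exclusive end offset of @s, saturating at NFS4_MAX_UINT64. */
static uint64_t seg_end(const struct seg_sketch *s)
{
	if (s->length == NFS4_MAX_UINT64 ||
	    s->length > NFS4_MAX_UINT64 - s->offset)
		return NFS4_MAX_UINT64;
	return s->offset + s->length;
}

/*
 * True if the returned range @ret applies to the held layout @held:
 * the iomodes must be compatible and the byte ranges must overlap.
 */
static bool return_matches(const struct seg_sketch *ret,
			   const struct seg_sketch *held)
{
	if (ret->iomode != IOMODE_ANY && ret->iomode != held->iomode)
		return false;
	return ret->offset < seg_end(held) && held->offset < seg_end(ret);
}
```

Under these assumptions the whole-file IOMODE_ANY range matches any non-empty held segment, while narrower or iomode-specific returns match only some of them.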
On 2013-09-29 15:13, Christoph Hellwig wrote:
> On Sun, Sep 29, 2013 at 03:12:41PM +0300, Benny Halevy wrote:
>>> Also why would you want a header
>>> outside fs/nfsd/ ?
>>
>> This header contains the file system interface.
>
> Any interface for the filesystem should be part of exportfs.h, not
> something nfs-specific.
Makes sense. Thanks.
Bruce - are you ok with moving the pnfs interface definitions to
include/linux/exportfs.h along with struct export_operations?
In fact we can actually extend struct export_operations rather
than adding pnfs_export_operations...
Benny
I can't see any reason why you'd want to split this from the normal
export_operations.
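For the sake of discussion, one possible shape of the merged operations vector is sketched below: the pNFS methods become optional members of the single export operations structure, and callers test for NULL before using them. Every name here (`export_ops_sketch`, `sb_sketch`, `sb_layout_type()`) is hypothetical, and returning 0 for "no layout type" is an assumption of this sketch; this is not the existing struct export_operations.

```c
#include <assert.h>
#include <stddef.h>

#define LAYOUT_NFSV4_1_FILES 1	/* layouttype4 value from RFC 5661 */

struct sb_sketch;

/* One ops vector: regular export methods plus optional pNFS methods. */
struct export_ops_sketch {
	int (*get_name)(struct sb_sketch *sb);		/* stand-in for existing methods */
	int (*layout_type)(struct sb_sketch *sb);	/* optional, NULL if no pNFS */
};

struct sb_sketch {
	const struct export_ops_sketch *s_export_op;
};

/* Returns the layout type, or 0 when the file system has no pNFS support. */
static int sb_layout_type(struct sb_sketch *sb)
{
	const struct export_ops_sketch *ops = sb->s_export_op;

	if (!ops || !ops->layout_type)
		return 0;
	return ops->layout_type(sb);
}

/* A cluster file system would supply something like this. */
static int files_layout_type(struct sb_sketch *sb)
{
	(void)sb;
	return LAYOUT_NFSV4_1_FILES;
}

static const struct export_ops_sketch plain_ops = {
	.get_name = NULL,
	.layout_type = NULL,
};

static const struct export_ops_sketch pnfs_ops = {
	.get_name = NULL,
	.layout_type = files_layout_type,
};
```

With this layout a file system without pNFS support simply leaves the extra members unset, so no separate s_pnfs_op pointer would be needed in the super block.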
On 2013-09-26 15:44, J. Bruce Fields wrote:
> On Thu, Sep 26, 2013 at 02:36:16PM -0400, Benny Halevy wrote:
>> The following patchset implements an extension to nfsd
>> providing a complete minimal pnfs server exporting
>> DLM-based clustered file systems such as GFS2 or OCFS2.
>>
>> The pNFS operations that are implemented are
>> GETDEVICELIST and GETDEVICEINFO,
>> LAYOUTGET and LAYOUTRETURN.
>>
>> The server does the bookkeeping of the outstanding layout
>> state in response to layout get and return.
>>
>> Also, the implementation cleans up the client layout state
>> upon client expiry and on CLOSE when the return_on_close
>> flag is set on the LAYOUTGET response. The latter is the
>> default behavior until layout recalls are implemented
>> with which the server can reclaim its resources in case
>> the client holds layout state post closing files.
>>
>> The patchset is based on v3.12-rc2 and it's available also online here:
>> git://linux-nfs.org/~bhalevy/linux-pnfs.git pnfsd-dlm-3.12-rc2-2013-09-26
>
> Is there any userland support code required?
There is a small change to exportfs and mountd to provide for the pnfs export option:
http://git.linux-nfs.org/?p=bhalevy/pnfs-nfs-utils.git;a=commitdiff;h=b3632132ed8f3682ee5e8cc979cd7129b8ab4d4a
>
> What's the quickest way to get a test setup--is
>
> http://wiki.linux-nfs.org/wiki/index.php/PNFS_Setup_Instructions
>
> accurate?
It should be, though I admit I haven't tested these instructions in a long while.
Benny
>
> --b.
>
>>
>> Benny
>>
>> General infrastructure:
>> [PATCH RFC v0 01/49] pnfsd: Define CONFIG_PNFSD
>> [PATCH RFC v0 02/49] pnfsd: define NFSDDBG_PNFS
>> [PATCH RFC v0 03/49] pnfsd: return pnfs flags on exchange_id
>> [PATCH RFC v0 04/49] pnfsd: don't set up back channel on create_session for ds
>> [PATCH RFC v0 05/49] pnfsd: introduce pnfsd header files
>> [PATCH RFC v0 06/49] pnfsd: define pnfs_export_operations
>> [PATCH RFC v0 07/49] pnfsd: add pnfs export option
>> [PATCH RFC v0 08/49] pnfsd: layout verify
>> [PATCH RFC v0 09/49] pnfsd: initial stub
>>
>> Device ops:
>> [PATCH RFC v0 10/49] pnfsd: use sbid hash table to map super_blocks to devid major identifiers
>> [PATCH RFC v0 11/49] NFSD: introduce exp_xdr.h
>> [PATCH RFC v0 12/49] pnfsd: get device list/info
>> [PATCH RFC v0 13/49] pnfsd: filelayout: get device list/info
>> [PATCH RFC v0 14/49] pnfsd: provide helper for xdr encoding of deviceid
>> [PATCH RFC v0 15/49] pnfsd: add helper functions for identifying DS filehandles
>> [PATCH RFC v0 16/49] pnfsd: accept all ds stateids
>>
>> layout get:
>> [PATCH RFC v0 17/49] DEBUG: nfsd: more client_lock asserts
>> [PATCH RFC v0 18/49] pnfsd: nfs4_assert_state_locked
>> [PATCH RFC v0 19/49] pnfsd: layout get
>> [PATCH RFC v0 20/49] pnfsd: filelayout: layout encoding
>>
>> layout state handling for layout get:
>> [PATCH RFC v0 21/49] nfsd: no need to unhash_stid before free
>> [PATCH RFC v0 22/49] nfsd: cleanup free_stid
>> [PATCH RFC v0 23/49] pnfsd: layout state allocation
>> [PATCH RFC v0 24/49] pnfsd: process the layout stateid
>> [PATCH RFC v0 25/49] pnfsd: layout state per client tracking
>> [PATCH RFC v0 26/49] pnfsd: layout state per file tracking
>> [PATCH RFC v0 27/49] pnfsd: hash layouts on layout state
>> [PATCH RFC v0 28/49] pnfsd: support layout segment merging
>>
>> pnfs attributes:
>> [PATCH RFC v0 29/49] pnfsd: support layout_type attribute
>> [PATCH RFC v0 30/49] pnfsd: make pnfs server return layout_blksize when the client asks for it
>> [PATCH RFC v0 31/49] pnfsd: add support for per-file layout_types attribute
>>
>> pnfsd over dlm:
>> [PATCH RFC v0 32/49] pnfsd: per block device dlm data server list cache
>> [PATCH RFC v0 33/49] pnfsd: Add IP address validation to nfsd4_set_pnfs_dlm_device()
>> [PATCH RFC v0 34/49] pnfsd: new nfsd filesystem file: pnfs_dlm_device
>> [PATCH RFC v0 35/49] pnfsd: nfsd4_pnfs_dlm_getdeviter
>> [PATCH RFC v0 36/49] pnfsd: nfsd4_pnfs_dlm_getdevinfo
>> [PATCH RFC v0 37/49] pnfsd: make /proc/fs/nfsd/pnfs_dlm_device report dlm device list.
>> [PATCH RFC v0 38/49] pnfsd: nfsd4_pnfs_dlm_layoutget
>> [PATCH RFC v0 39/49] pnfsd: DLM file layout only support read iomode layouts
>> [PATCH RFC v0 40/49] pnfsd: add dlm file layout layout-type
>> [PATCH RFC v0 41/49] pnfsd: dlm pnfs_export_operations
>> [PATCH RFC v0 42/49] pnfsd: gfs2: use generic file layout pnfs operations vector
>>
>> layout return / expire / return_on_close:
>> [PATCH RFC v0 43/49] pnfsd: release state lock around iput in put_nfs4_file
>> [PATCH RFC v0 44/49] posix_acl: resolve compile dependency in posix_acl.h
>> [PATCH RFC v0 45/49] nfs: resolve compile dependency in nfs_xdr.h
>> [PATCH RFC v0 46/49] pnfsd: layout return generic implementation
>> [PATCH RFC v0 47/49] pnfsd: pnfs_expire_client
>> [PATCH RFC v0 48/49] pnfsd: return on close
>> [PATCH RFC v0 49/49] pnfsd: dlm set return_on_close to true
Is this really necessary? What would we lose if we just used pnfs
automatically when the filesystem supports it and all other necessary
configuration is in place?
On Thu, Sep 26, 2013 at 02:40:19PM -0400, Benny Halevy wrote:
> From: Andy Adamson <[email protected]>
>
> This is a boolean for now. When more layouttypes are supported, this can
> change to "pnfs=", similar to "sec=".
>
> The ctl interface is not enhanced.
>
> Note: Export option strings are not guaranteed to be present in every call to
> svc_export_parse. For example, nfs-utils-1.1.2 exportfs validates the export
> with a test call that does not include the 'pnfs' export option even though
> it is set in /etc/exports.
>
> nfsd4_layout_verify() checks if ex_pnfs is set so the ex_pnfs check in
> check_export is not needed.
>
> Furthermore, the pnfs_export_operations super block pointer should not be
> changed because a) it is a const and b) the exports options can be changed
> while the file system is mounted.
>
> Remove the ex_pnfs check from check_export to prevent the pnfs_export_operations
> superblock pointer from being set to NULL.
This patch doesn't touch check_export. Is this describing a change from
a previous version of the patch? If so, either drop this comment or
write it in a way that will make sense to someone who hasn't seen the
previous version.
--b.
>
> Signed-off-by: Andy Adamson <[email protected]>
> [pnfsd: fix cosmetic checkpatch warnings]
> [pnfsd: test pnfs export option in check_export]
> Signed-off-by: Benny Halevy <[email protected]>
> [pnfsd: fix ex_pnfs check_export bug]
> Signed-off-by: Andy Adamson <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> ---
> fs/nfsd/export.c | 6 ++++++
> include/linux/nfsd/export.h | 1 +
> 2 files changed, 7 insertions(+)
>
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index f26b0b9..7730dfd 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -567,6 +567,8 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
> if (exp.ex_uuid == NULL)
> err = -ENOMEM;
> }
> + } else if (strcmp(buf, "pnfs") == 0) {
> + exp.ex_pnfs = 1;
> } else if (strcmp(buf, "secinfo") == 0)
> err = secinfo_parse(&mesg, buf, &exp);
> else
> @@ -639,6 +641,8 @@ static int svc_export_show(struct seq_file *m,
> seq_printf(m, "%02x", exp->ex_uuid[i]);
> }
> }
> + if (exp->ex_pnfs)
> + seq_puts(m, ",pnfs");
> show_secinfo(m, exp);
> }
> seq_puts(m, ")\n");
> @@ -666,6 +670,7 @@ static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
> new->ex_fslocs.locations_count = 0;
> new->ex_fslocs.migrated = 0;
> new->ex_uuid = NULL;
> + new->ex_pnfs = 0;
> new->cd = item->cd;
> }
>
> @@ -679,6 +684,7 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)
> new->ex_anon_uid = item->ex_anon_uid;
> new->ex_anon_gid = item->ex_anon_gid;
> new->ex_fsid = item->ex_fsid;
> + new->ex_pnfs = item->ex_pnfs;
> new->ex_uuid = item->ex_uuid;
> item->ex_uuid = NULL;
> new->ex_fslocs.locations = item->ex_fslocs.locations;
> diff --git a/include/linux/nfsd/export.h b/include/linux/nfsd/export.h
> index 7898c99..b03ceee 100644
> --- a/include/linux/nfsd/export.h
> +++ b/include/linux/nfsd/export.h
> @@ -52,6 +52,7 @@ struct svc_export {
> kuid_t ex_anon_uid;
> kgid_t ex_anon_gid;
> int ex_fsid;
> + int ex_pnfs;
> unsigned char * ex_uuid; /* 16 byte fsid */
> struct nfsd4_fs_locations ex_fslocs;
> int ex_nflavors;
> --
> 1.8.3.1
>
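The note in the patch description about option strings not being present in every svc_export_parse call is easier to see with a small example. The sketch below parses a space-separated option string and leaves the pnfs flag at its default of "off" when the keyword is absent. It uses strtok() as a stand-in for the kernel's tokenizer, and `export_opts_sketch` and `parse_opts()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct export_opts_sketch {
	bool pnfs;	/* set only if the "pnfs" keyword was seen */
	int other;	/* count of options this sketch does not interpret */
};

/*
 * Parse a space-separated option string. @buf is modified in place.
 * The flag starts out false, like ex_pnfs in svc_export_init().
 */
static void parse_opts(char *buf, struct export_opts_sketch *opts)
{
	char *tok;

	opts->pnfs = false;
	opts->other = 0;
	for (tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
		if (strcmp(tok, "pnfs") == 0)
			opts->pnfs = true;
		else
			opts->other++;
	}
}
```

A test call that omits the keyword therefore yields an export with pNFS disabled, which is why later checks should look at the per-export flag rather than change the file system's operations pointer.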
From: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/Makefile | 1 +
fs/nfsd/nfs4pnfsd.c | 27 +++++++++++++++++++++++++++
2 files changed, 28 insertions(+)
create mode 100644 fs/nfsd/nfs4pnfsd.c
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index af32ef0..5ebe5df 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -12,3 +12,4 @@ nfsd-$(CONFIG_NFSD_V3) += nfs3proc.o nfs3xdr.o
nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o
nfsd-$(CONFIG_NFSD_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \
nfs4acl.o nfs4callback.o nfs4recover.o
+nfsd-$(CONFIG_PNFSD) += nfs4pnfsd.o
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
new file mode 100644
index 0000000..cb28207
--- /dev/null
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -0,0 +1,27 @@
+/******************************************************************************
+ *
+ * (c) 2007 Network Appliance, Inc. All Rights Reserved.
+ * (c) 2009 NetApp. All Rights Reserved.
+ *
+ * NetApp provides this source code under the GPL v2 License.
+ * The GPL v2 license is available at
+ * http://opensource.org/licenses/gpl-license.php.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ *****************************************************************************/
+
+#include "pnfsd.h"
+
+#define NFSDDBG_FACILITY NFSDDBG_PNFS
+
--
1.8.3.1
On 2013-09-26 17:55, J. Bruce Fields wrote:
> On Thu, Sep 26, 2013 at 02:40:02PM -0400, Benny Halevy wrote:
>> From: Benny Halevy <[email protected]>
>>
>> Set the cl_exchange_flags to be non_pnfs if we do not set
>> either pnfs or ds (in the plain old nfs41 case).
>>
>> Note that we always set both MDS and DS exchangeid capability flags
>> when CONFIG_PNFSD is enabled.
>> The client needs to remember what the session is used for
>> if it cares to distinguish between DSs and MDSs.
>>
>> EXCHGID4_FLAG_USE_NON_PNFS should be set when the server does not support
>> operations (e.g. LAYOUTGET) or attributes that pertain to pNFS.
>
> Minor nit: since we don't actually support those operations yet, this
> patch should probably come later in the series.
Right. It was originally placed first to allow testing of early patches.
I'll move it later when the implementation supports getdeviceinfo and layoutget.
Benny
>
> --b.
>
>>
>> [extracted from pnfsd: Initial pNFS server implementation.]
>> Signed-off-by: Benny Halevy <[email protected]>
>> [pnfsd: Fixup nfsd4_set_ex_flags.]
>> Signed-off-by: Dean Hildebrand <[email protected]>
>> [pnfsd: set EXCHGID4_FLAG_USE_NON_PNFS when !CONFIG_PNFSD]
>> [pnfsd: fix compiler warning in nfsd4_set_ex_flags when CONFIG_PNFSD is not defined]
>> [pnfsd: always set both MDS and DS exchangeid capability flags]
>> Signed-off-by: Benny Halevy <[email protected]>
>> Signed-off-by: Benny Halevy <[email protected]>
>> ---
>> fs/nfsd/nfs4state.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 57a0340..21c15fc 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -1614,8 +1614,12 @@ static bool clp_used_exchangeid(struct nfs4_client *clp)
>> static void
>> nfsd4_set_ex_flags(struct nfs4_client *new, struct nfsd4_exchange_id *clid)
>> {
>> - /* pNFS is not supported */
>> +#if defined(CONFIG_PNFSD)
>> + new->cl_exchange_flags |= EXCHGID4_FLAG_USE_PNFS_MDS |
>> + EXCHGID4_FLAG_USE_PNFS_DS;
>> +#else /* CONFIG_PNFSD */
>> new->cl_exchange_flags |= EXCHGID4_FLAG_USE_NON_PNFS;
>> +#endif /* CONFIG_PNFSD */
>>
>> /* Referrals are supported, Migration is not. */
>> new->cl_exchange_flags |= EXCHGID4_FLAG_SUPP_MOVED_REFER;
>> --
>> 1.8.3.1
>>
From: Benny Halevy <[email protected]>
Verify whether the server and file system support the given layout type.
[was pnfsd: Streamline error code checking for non-pnfs filesystems]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: Add super block to layout_type()]
Signed-off-by: Marc Eshel <[email protected]>
[pnfsd: Fix order of ops in nfsd4_layout_verify]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: convert generic code to use new pnfs api]
[pnfsd: define pnfs_export_operations]
[pnfsd: obliterate old vfs api]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: layout verify all layout types]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: tone nfsd4_layout_verify printk down to dprintk]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: check ex_pnfs in nfsd4_verify_layout]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: handle s_pnfs_op==NULL]
[pnfsd: verify export option only if svc_export is present]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/export.c | 6 ++++++
fs/nfsd/nfs4proc.c | 39 +++++++++++++++++++++++++++++++++++++++
fs/nfsd/pnfsd.h | 2 ++
include/linux/nfsd/nfsd4_pnfs.h | 5 ++++-
4 files changed, 51 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index 7730dfd..d803414 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -376,6 +376,12 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid)
return -EINVAL;
}
+ if (inode->i_sb->s_pnfs_op &&
+ !inode->i_sb->s_pnfs_op->layout_type) {
+ dprintk("exp_export: export of invalid fs pnfs export ops.\n");
+ return -EINVAL;
+ }
+
return 0;
}
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 419572f..576b635 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -41,6 +41,7 @@
#include "vfs.h"
#include "current_stateid.h"
#include "netns.h"
+#include "pnfsd.h"
#ifdef CONFIG_NFSD_V4_SECURITY_LABEL
#include <linux/security.h>
@@ -1109,6 +1110,44 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
return status == nfserr_same ? nfs_ok : status;
}
+#if defined(CONFIG_PNFSD)
+static __be32
+nfsd4_layout_verify(struct super_block *sb, struct svc_export *exp,
+ unsigned int layout_type)
+{
+ int status, type;
+
+ /* check to see if pNFS is supported. */
+ status = nfserr_layoutunavailable;
+ if (exp && exp->ex_pnfs == 0) {
+ dprintk("%s: Underlying file system "
+ "is not exported over pNFS\n", __func__);
+ goto out;
+ }
+ if (!sb->s_pnfs_op || !sb->s_pnfs_op->layout_type) {
+ dprintk("%s: Underlying file system "
+ "does not support pNFS\n", __func__);
+ goto out;
+ }
+
+ type = sb->s_pnfs_op->layout_type(sb);
+
+ /* check to see if requested layout type is supported. */
+ status = nfserr_unknown_layouttype;
+ if (!type)
+ dprintk("BUG: %s: layout_type 0 is reserved and must not be "
+ "used by filesystem\n", __func__);
+ else if (type != layout_type)
+ dprintk("%s: requested layout type %d "
+ "does not match supported type %d\n",
+ __func__, layout_type, type);
+ else
+ status = nfs_ok;
+out:
+ return status;
+}
+#endif /* CONFIG_PNFSD */
+
/*
* NULL call.
*/
diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
index 65fb57e..7c46791 100644
--- a/fs/nfsd/pnfsd.h
+++ b/fs/nfsd/pnfsd.h
@@ -34,4 +34,6 @@
#ifndef LINUX_NFSD_PNFSD_H
#define LINUX_NFSD_PNFSD_H
+#include <linux/nfsd/nfsd4_pnfs.h>
+
#endif /* LINUX_NFSD_PNFSD_H */
diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
index ff6613e..d44669e 100644
--- a/include/linux/nfsd/nfsd4_pnfs.h
+++ b/include/linux/nfsd/nfsd4_pnfs.h
@@ -34,6 +34,8 @@
#ifndef _LINUX_NFSD_NFSD4_PNFS_H
#define _LINUX_NFSD_NFSD4_PNFS_H
+#include <linux/exportfs.h>
+
/*
* pNFS export operations vector.
*
@@ -45,7 +47,8 @@
* All other methods are optional and can be set to NULL if not implemented.
*/
struct pnfs_export_operations {
- /* stub */
+ /* Returns the supported pnfs_layouttype4. */
+ int (*layout_type) (struct super_block *);
};
#endif /* _LINUX_NFSD_NFSD4_PNFS_H */
--
1.8.3.1
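For readers following the checks in nfsd4_layout_verify(), here is a minimal user-space sketch of the same decision order. It is not the kernel code: the mock_* structures, the verify_status enum and files_layout_type() are hypothetical stand-ins invented for illustration, and the kernel returns __be32 nfserr_* values rather than this enum.

```c
#include <stddef.h>

/* Stand-in status codes; the kernel uses __be32 nfserr_* values. */
enum verify_status {
	VERIFY_OK = 0,
	VERIFY_LAYOUTUNAVAILABLE,
	VERIFY_UNKNOWN_LAYOUTTYPE,
};

/* Hypothetical miniatures of the super block, export ops and export. */
struct mock_sb;
struct mock_pnfs_ops {
	int (*layout_type)(struct mock_sb *sb);
};
struct mock_sb {
	const struct mock_pnfs_ops *s_pnfs_op;
};
struct mock_export {
	int ex_pnfs;	/* non-zero when exported with the pnfs option */
};

static enum verify_status
layout_verify(struct mock_sb *sb, const struct mock_export *exp,
	      unsigned int requested)
{
	int type;

	/* The export option is only checked when an export is present. */
	if (exp && exp->ex_pnfs == 0)
		return VERIFY_LAYOUTUNAVAILABLE;
	/* The file system must provide the layout_type method. */
	if (!sb->s_pnfs_op || !sb->s_pnfs_op->layout_type)
		return VERIFY_LAYOUTUNAVAILABLE;

	type = sb->s_pnfs_op->layout_type(sb);
	/* Type 0 is reserved; anything else must match the request. */
	if (type == 0 || (unsigned int)type != requested)
		return VERIFY_UNKNOWN_LAYOUTTYPE;
	return VERIFY_OK;
}

/* Example method reporting the files layout (LAYOUT4_NFSV4_1_FILES == 1). */
static int files_layout_type(struct mock_sb *sb)
{
	(void)sb;
	return 1;
}
```

Passing a NULL export skips the export-option check, which is how the GETDEVICEINFO path calls the verifier when no file handle is available.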
On 09/27/2013 09:34 AM, Benny Halevy wrote:
>> I thought that we said that exofs server is going in first. What happened?
>
> exofs requires much more functionality.
> To help review the code we need to go through this milestone in any case.
>
That is not true. Look at the way I staged the pnfsd-exofs patches. After
the LO_GET, LO_COMMIT and LO_RETURN patches you have a fully functioning,
git-cloning exofs. (BTW exofs does not need DEVICELIST.)
So OK, your patches do not have LO_COMMIT, but that code path is trivial.
And what is that contraption of returning "no-layout" for writes and
then not having LO_COMMIT support? This is plain hacky and not
in accord with the pNFS philosophy of things.
And we can easily split my original set further: do read-only without LO_COMMIT
first, then add a simple LO_COMMIT stage that enables write LAYOUTs.
That gives you what you have here with much less code.
The recall comes in a different patch that can be staged later and is
effectively not needed for normal operations.
Actually all the code, including the first stage of the exofs patches, is smaller and
simpler than the DLM contraption. And it only touches exofs code, which
does not involve other sensitive subsystems.
I have a deja vu about this. Why won't you talk to me before working on such
DLM crap that is not at all pnfs, but a hack that demonstrates nothing?
Please do the right thing, since you are already putting in all this effort. And I can
help as well with the pnfsd-exofs patches part.
BTW: Thank you for doing this. It is about time someone gave the pNFS server some
mainline love.
Thanks
Boaz
From: Eric Anderle <[email protected]>
The ability to read the current device list is useful for debugging.
Signed-off-by: Eric Anderle <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
[pnfsd: fix dlm device naming]
Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsdlm.c | 44 +++++++++++++++++++++++++++++++++++++---
fs/nfsd/nfsctl.c | 4 +++-
include/linux/nfsd/nfs4pnfsdlm.h | 2 ++
3 files changed, 46 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4pnfsdlm.c b/fs/nfsd/nfs4pnfsdlm.c
index 19002c1..4c2ab87 100644
--- a/fs/nfsd/nfs4pnfsdlm.c
+++ b/fs/nfsd/nfs4pnfsdlm.c
@@ -28,6 +28,8 @@
#include <linux/nfsd/nfs4layoutxdr.h>
#include <linux/sunrpc/addr.h>
+#include "nfsd.h"
+
#define NFSDDBG_FACILITY NFSDDBG_FILELAYOUT
/* Just use a linked list. Do not expect more than 32 dlm_device_entries
@@ -45,12 +47,15 @@ struct dlm_device_entry {
};
static struct dlm_device_entry *
-nfsd4_find_pnfs_dlm_device(char *disk_name)
+_nfsd4_find_pnfs_dlm_device(char *disk_name)
{
struct dlm_device_entry *dlm_pdev;
+ dprintk("--> %s disk name %s\n", __func__, disk_name);
spin_lock(&dlm_device_list_lock);
list_for_each_entry(dlm_pdev, &dlm_device_list, dlm_dev_list) {
+ dprintk("%s Look for dlm_pdev %s\n", __func__,
+ dlm_pdev->disk_name);
if (!memcmp(dlm_pdev->disk_name, disk_name, strlen(disk_name))) {
spin_unlock(&dlm_device_list_lock);
return dlm_pdev;
@@ -60,6 +65,39 @@ struct dlm_device_entry {
return NULL;
}
+static struct dlm_device_entry *
+nfsd4_find_pnfs_dlm_device(struct super_block *sb) {
+ char dname[BDEVNAME_SIZE];
+
+ bdevname(sb->s_bdev, dname);
+ return _nfsd4_find_pnfs_dlm_device(dname);
+}
+
+ssize_t
+nfsd4_get_pnfs_dlm_device_list(char *buf, ssize_t buflen)
+{
+ char *pos = buf;
+ ssize_t size = 0;
+ struct dlm_device_entry *dlm_pdev;
+ int ret = -EINVAL;
+
+ spin_lock(&dlm_device_list_lock);
+ list_for_each_entry(dlm_pdev, &dlm_device_list, dlm_dev_list)
+ {
+ int advanced;
+ advanced = snprintf(pos, buflen - size, "%s:%s\n", dlm_pdev->disk_name, dlm_pdev->ds_list);
+ if (advanced >= buflen - size)
+ goto out;
+ size += advanced;
+ pos += advanced;
+ }
+ ret = size;
+
+out:
+ spin_unlock(&dlm_device_list_lock);
+ return ret;
+}
+
bool nfsd4_validate_pnfs_dlm_device(char *ds_list, int *num_ds)
{
char *start = ds_list;
@@ -140,7 +178,7 @@ bool nfsd4_validate_pnfs_dlm_device(char *ds_list, int *num_ds)
dprintk("%s disk_name %s num_ds %d ds_list %s\n", __func__,
new->disk_name, new->num_ds, new->ds_list);
- found = nfsd4_find_pnfs_dlm_device(new->disk_name);
+ found = _nfsd4_find_pnfs_dlm_device(new->disk_name);
if (found) {
/* FIXME: should compare found->ds_list with new->ds_list
* and if it is different, kick off a CB_NOTIFY change
@@ -235,7 +273,7 @@ static int nfsd4_pnfs_dlm_getdevinfo(struct super_block *sb,
/*
* If the DS list has not been established, return -EINVAL
*/
- dlm_pdev = nfsd4_find_pnfs_dlm_device(sb->s_bdev->bd_disk->disk_name);
+ dlm_pdev = nfsd4_find_pnfs_dlm_device(sb);
if (!dlm_pdev) {
dprintk("%s: DEBUG: disk %s Not Found\n", __func__,
sb->s_bdev->bd_disk->disk_name);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 7da8584..4fafa2a 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1068,7 +1068,9 @@ static ssize_t __write_pnfs_dlm_device(struct file *file, char *buf,
return ret;
ret = nfsd4_set_pnfs_dlm_device(pnfs_dlm_device, len);
- }
+ } else
+ return nfsd4_get_pnfs_dlm_device_list(buf, SIMPLE_TRANSACTION_LIMIT);
+
return ret <= 0 ? ret : strlen(buf);
}
diff --git a/include/linux/nfsd/nfs4pnfsdlm.h b/include/linux/nfsd/nfs4pnfsdlm.h
index 63248aa..a4f3477 100644
--- a/include/linux/nfsd/nfs4pnfsdlm.h
+++ b/include/linux/nfsd/nfs4pnfsdlm.h
@@ -39,6 +39,8 @@
void nfsd4_pnfs_dlm_shutdown(void);
+ssize_t nfsd4_get_pnfs_dlm_device_list(char *buf, ssize_t buflen);
+
#else /* CONFIG_PNFSD */
static inline void nfsd4_pnfs_dlm_shutdown(void)
--
1.8.3.1
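The bounded formatting loop in nfsd4_get_pnfs_dlm_device_list() can be tried outside the kernel. The sketch below is a user-space approximation only: struct ds_entry and format_device_list() are hypothetical names, an array stands in for the locked linked list, and -1 stands in for -EINVAL.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in for one cached device entry. */
struct ds_entry {
	const char *disk_name;
	const char *ds_list;
};

/*
 * Write one "disk:ds_list" line per entry into buf.
 * Returns the number of bytes written (excluding the NUL),
 * or -1 if the buffer cannot hold every line.
 */
static ssize_t
format_device_list(char *buf, size_t buflen,
		   const struct ds_entry *entries, size_t n)
{
	size_t used = 0;
	size_t i;

	if (buflen == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < n; i++) {
		int adv = snprintf(buf + used, buflen - used, "%s:%s\n",
				   entries[i].disk_name, entries[i].ds_list);

		/* snprintf reports the length it wanted; >= means truncated. */
		if (adv < 0 || (size_t)adv >= buflen - used)
			return -1;
		used += (size_t)adv;
	}
	return (ssize_t)used;
}
```

As in the patch, a truncated line is treated as an error for the whole read rather than returning a partial list.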
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsd.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index 386afa3..8d16b85 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -364,6 +364,66 @@ struct super_block *
return id;
}
+/*
+ * are two octet ranges overlapping or adjacent?
+ */
+static bool
+lo_seg_mergeable(struct nfsd4_layout_seg *l1, struct nfsd4_layout_seg *l2)
+{
+ u64 start1 = l1->offset;
+ u64 end1 = end_offset(start1, l1->length);
+ u64 start2 = l2->offset;
+ u64 end2 = end_offset(start2, l2->length);
+
+ /* if end1 == start2 the ranges are adjacent */
+ return (end2 >= start1) && (end1 >= start2);
+}
+
+static void
+extend_layout(struct nfsd4_layout_seg *lo, struct nfsd4_layout_seg *lg)
+{
+ u64 lo_start = lo->offset;
+ u64 lo_end = end_offset(lo_start, lo->length);
+ u64 lg_start = lg->offset;
+ u64 lg_end = end_offset(lg_start, lg->length);
+
+ /* lo already covers lg? */
+ if (lo_start <= lg_start && lg_end <= lo_end)
+ return;
+
+ /* extend start offset */
+ if (lo_start > lg_start)
+ lo_start = lg_start;
+
+ /* extend end offset */
+ if (lo_end < lg_end)
+ lo_end = lg_end;
+
+ lo->offset = lo_start;
+ lo->length = (lo_end == NFS4_MAX_UINT64) ?
+ lo_end : lo_end - lo_start;
+}
+
+static bool
+merge_layout(struct nfs4_layout_state *ls, struct nfsd4_layout_seg *seg)
+{
+ bool ret = false;
+ struct nfs4_layout *lp;
+
+ spin_lock(&layout_lock);
+ list_for_each_entry (lp, &ls->ls_layouts, lo_perstate)
+ if (lp->lo_seg.layout_type == seg->layout_type &&
+ lp->lo_seg.clientid == seg->clientid &&
+ lp->lo_seg.iomode == seg->iomode &&
+ (ret = lo_seg_mergeable(&lp->lo_seg, seg))) {
+ extend_layout(&lp->lo_seg, seg);
+ break;
+ }
+ spin_unlock(&layout_lock);
+
+ return ret;
+}
+
__be32
nfs4_pnfs_get_layout(struct svc_rqst *rqstp,
struct nfsd4_pnfs_layoutget *lgp,
@@ -373,6 +433,7 @@ struct super_block *
__be32 nfserr;
struct inode *ino = lgp->lg_fhp->fh_dentry->d_inode;
struct super_block *sb = ino->i_sb;
+ int can_merge;
struct nfs4_file *fp;
struct nfs4_client *clp;
struct nfs4_layout *lp = NULL;
@@ -412,6 +473,9 @@ struct super_block *
goto out;
}
+ can_merge = sb->s_pnfs_op->can_merge_layouts != NULL &&
+ sb->s_pnfs_op->can_merge_layouts(lgp->lg_seg.layout_type);
+
nfs4_lock_state();
fp = find_alloc_file(ino, lgp->lg_fhp);
clp = find_confirmed_client((clientid_t *)&lgp->lg_seg.clientid, true,
@@ -430,6 +494,9 @@ struct super_block *
if (nfserr)
goto out_unlock;
+ /* pre-alloc layout in case we can't merge after we call
+ * the file system
+ */
lp = alloc_layout();
if (!lp) {
nfserr = nfserr_layouttrylater;
@@ -486,6 +553,14 @@ struct super_block *
lgp->lg_seg = res.lg_seg;
lgp->lg_roc = res.lg_return_on_close;
+ /* SUCCESS!
+ * Can the new layout be merged into an existing one?
+ * If so, free unused layout struct
+ */
+ if (can_merge && merge_layout(ls, &res.lg_seg))
+ goto out_freelayout;
+
+ /* Can't merge, so let's initialize this new layout */
init_layout(lp, ls, lgp->lg_fhp, &res.lg_seg, &lgp->lg_sid);
out_unlock:
if (ls)
--
1.8.3.1
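The range arithmetic behind lo_seg_mergeable() and extend_layout() is easy to check in isolation. The following is a user-space sketch under two assumptions: struct seg keeps only the offset and length fields, and end_offset() saturates at the maximum value on overflow (its definition is not part of this patch, so that behaviour is assumed here). MAX_U64 plays the role of NFS4_MAX_UINT64.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_U64 UINT64_MAX	/* stand-in for NFS4_MAX_UINT64 */

/* Reduced layout segment: only the byte range matters here. */
struct seg {
	uint64_t offset;
	uint64_t length;
};

/* Assumed behaviour of end_offset(): saturate when start + len wraps. */
static uint64_t end_offset(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;

	return end >= start ? end : MAX_U64;
}

/* True when the two ranges overlap or touch (end of one == start of other). */
static bool seg_mergeable(const struct seg *a, const struct seg *b)
{
	uint64_t end_a = end_offset(a->offset, a->length);
	uint64_t end_b = end_offset(b->offset, b->length);

	return end_b >= a->offset && end_a >= b->offset;
}

/* Grow lo so that it covers both its old range and lg. */
static void seg_extend(struct seg *lo, const struct seg *lg)
{
	uint64_t start = lo->offset;
	uint64_t end = end_offset(lo->offset, lo->length);
	uint64_t lg_end = end_offset(lg->offset, lg->length);

	if (lg->offset < start)
		start = lg->offset;
	if (lg_end > end)
		end = lg_end;

	lo->offset = start;
	/* A range running to the maximum offset keeps the "to EOF" length. */
	lo->length = (end == MAX_U64) ? MAX_U64 : end - start;
}
```

For example, [0, 100) and [100, 150) are adjacent and merge into a single segment of length 150, while [0, 100) and [200, 250) stay separate.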
From: Eric Anderle <[email protected]>
We should catch errors in the format at the time the list is given to
the kernel, rather than just returning garbage to the client and letting
the client fail.
Signed-off-by: J. Bruce Fields <[email protected]>
[removed unused 'len' parameter]
[fixup rpc_pton parameters for 3.4]
[fixup rpc_pton include file]
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsdlm.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4pnfsdlm.c b/fs/nfsd/nfs4pnfsdlm.c
index 906c370..ddc2188 100644
--- a/fs/nfsd/nfs4pnfsdlm.c
+++ b/fs/nfsd/nfs4pnfsdlm.c
@@ -23,6 +23,7 @@
#include <linux/nfsd/debug.h>
#include <linux/nfsd/nfs4pnfsdlm.h>
+#include <linux/sunrpc/addr.h>
#define NFSDDBG_FACILITY NFSDDBG_FILELAYOUT
@@ -56,6 +57,25 @@ struct dlm_device_entry {
return NULL;
}
+bool nfsd4_validate_pnfs_dlm_device(char *ds_list, int *num_ds)
+{
+ char *start = ds_list;
+
+ *num_ds = 0;
+
+ while (*start) {
+ struct sockaddr_storage tempAddr;
+ int ipLen = strcspn(start, ",");
+
+ if (!rpc_pton(&init_net, start, ipLen,
+ (struct sockaddr *)&tempAddr, sizeof(tempAddr)))
+ return false;
+ (*num_ds)++;
+ start += ipLen + 1;
+ }
+ return true;
+}
+
/*
* pnfs_dlm_device string format:
* block-device-path:<ds1 ipv4 address>,<ds2 ipv4 address>
@@ -109,12 +129,10 @@ struct dlm_device_entry {
goto out_free;
memcpy(new->ds_list, bufp, len);
- /* count the number of comma-delimited DS IPs */
- new->num_ds = 1;
- while ((bufp = strchr(bufp, ',')) != NULL) {
- new->num_ds++;
- bufp++;
- }
+
+ /* validate the ips */
+ if (!nfsd4_validate_pnfs_dlm_device(new->ds_list, &(new->num_ds)))
+ goto out_free;
dprintk("%s disk_name %s num_ds %d ds_list %s\n", __func__,
new->disk_name, new->num_ds, new->ds_list);
--
1.8.3.1
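To experiment with the list validation outside the kernel, a user-space sketch along the same lines follows. It swaps rpc_pton() for the standard inet_pton(), so it is an approximation rather than the patch's method; count_valid_ds_addrs() is a hypothetical name. It also advances past a comma only when one is actually present, so the scan never steps beyond the terminating NUL after the last address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Count the addresses in a comma-separated list of IPv4/IPv6 literals.
 * Returns the number of addresses, or -1 if any entry fails to parse.
 */
static int count_valid_ds_addrs(const char *list)
{
	const char *p = list;
	int count = 0;

	while (*p) {
		size_t len = strcspn(p, ",");
		char tok[INET6_ADDRSTRLEN];
		unsigned char addr[sizeof(struct in6_addr)];

		/* inet_pton needs a NUL-terminated token, so copy it out. */
		if (len == 0 || len >= sizeof(tok))
			return -1;
		memcpy(tok, p, len);
		tok[len] = '\0';

		if (inet_pton(AF_INET, tok, addr) != 1 &&
		    inet_pton(AF_INET6, tok, addr) != 1)
			return -1;
		count++;

		p += len;
		if (*p == ',')	/* skip the separator, never the terminator */
			p++;
	}
	return count;
}
```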
From: Dean Hildebrand <[email protected]>
[was pnfsd: Add use of pnfs exchange flags]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: define a is_ds_only_session helper]
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 4 ++++
include/uapi/linux/nfs4.h | 7 +++++++
2 files changed, 11 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 21c15fc..2c973e6 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1953,6 +1953,10 @@ static __be32 nfsd4_check_cb_sec(struct nfsd4_cb_sec *cbs)
status = nfserr_seq_misordered;
goto out_free_conn;
}
+
+ if (is_ds_only_session(unconf->cl_exchange_flags))
+ cr_ses->flags &= ~SESSION4_BACK_CHAN;
+
old = find_confirmed_client_by_name(&unconf->cl_name, nn);
if (old) {
status = mark_client_expired(old);
diff --git a/include/uapi/linux/nfs4.h b/include/uapi/linux/nfs4.h
index 788128e..028f5fc 100644
--- a/include/uapi/linux/nfs4.h
+++ b/include/uapi/linux/nfs4.h
@@ -125,6 +125,13 @@
#define EXCHGID4_FLAG_USE_PNFS_DS 0x00040000
#define EXCHGID4_FLAG_MASK_PNFS 0x00070000
+static inline bool
+is_ds_only_session(u32 exchange_flags)
+{
+ u32 mask = EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_PNFS_MDS;
+ return (exchange_flags & mask) == EXCHGID4_FLAG_USE_PNFS_DS;
+}
+
#define EXCHGID4_FLAG_UPD_CONFIRMED_REC_A 0x40000000
#define EXCHGID4_FLAG_CONFIRMED_R 0x80000000
/*
--
1.8.3.1
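The mask test in is_ds_only_session() can be exercised with a few flag combinations. This is a stand-alone sketch: the DS value is the one shown in the hunk, while the NON_PNFS and MDS values are taken from the NFSv4.1 specification rather than from this patch, and is_ds_only() is just a local copy of the helper's logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXCHANGE_ID pNFS role flags (values per the NFSv4.1 specification). */
#define EXCHGID4_FLAG_USE_NON_PNFS	0x00010000u
#define EXCHGID4_FLAG_USE_PNFS_MDS	0x00020000u
#define EXCHGID4_FLAG_USE_PNFS_DS	0x00040000u

/* True only when the DS role is set and the MDS role is not. */
static bool is_ds_only(uint32_t exchange_flags)
{
	uint32_t mask = EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_PNFS_MDS;

	return (exchange_flags & mask) == EXCHGID4_FLAG_USE_PNFS_DS;
}
```

A client that negotiated both MDS and DS roles is therefore not treated as DS-only, and its session keeps the back channel.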
From: Andy Adamson <[email protected]>
Simple linked list cache of per block device dlm pnfs data servers.
[pnfsd: define dlm export ops for the !CONFIG_PNFSD case]
[pnfsd: fix pnfs_dlm_device string parsing]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: more fixes for pnfs_dlm_device string parsing]
Signed-off-by: Benny Halevy <[email protected]>
[restricted use of CONFIG_PNFSD]
[use NFSD_DLM_DS_LIST_MAX defined in include/linux/nfsd/nfs4pnfsdlm.h]
Acked-by: Steven Whitehouse <[email protected]>
[pnfsd: fix test in nfsd4_find_pnfs_dlm_device]
Signed-off-by: Eric Anderle <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
[nfsd4_pnfs_dlm_shutdown should use list_for_each_entry_safe]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: nfs4pnfsd.c should dprint under NFSDDBG_PNFS]
Signed-off-by: Boaz Harrosh <[email protected]>
[pnfsd: Prevent ipv6 address truncation in /proc/fs/nfsd/pnfs_dlm_device]
Signed-off-by: Michael Groshans <[email protected]>
[pnfsd: Fix num_ds bug in nfsd4_set_pnfs_dlm_device()]
Signed-off-by: Eric Anderle <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/Makefile | 2 +-
fs/nfsd/nfs4pnfsdlm.c | 164 +++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfsctl.c | 2 +
include/linux/nfsd/nfs4pnfsdlm.h | 49 ++++++++++++
4 files changed, 216 insertions(+), 1 deletion(-)
create mode 100644 fs/nfsd/nfs4pnfsdlm.c
create mode 100644 include/linux/nfsd/nfs4pnfsdlm.h
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index 5ebe5df..84ae177 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -12,4 +12,4 @@ nfsd-$(CONFIG_NFSD_V3) += nfs3proc.o nfs3xdr.o
nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o
nfsd-$(CONFIG_NFSD_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \
nfs4acl.o nfs4callback.o nfs4recover.o
-nfsd-$(CONFIG_PNFSD) += nfs4pnfsd.o
+nfsd-$(CONFIG_PNFSD) += nfs4pnfsd.o nfs4pnfsdlm.o
diff --git a/fs/nfsd/nfs4pnfsdlm.c b/fs/nfsd/nfs4pnfsdlm.c
new file mode 100644
index 0000000..906c370
--- /dev/null
+++ b/fs/nfsd/nfs4pnfsdlm.c
@@ -0,0 +1,164 @@
+/******************************************************************************
+ *
+ * (c) 2007 Network Appliance, Inc. All Rights Reserved.
+ * (c) 2009 NetApp. All Rights Reserved.
+ *
+ * NetApp provides this source code under the GPL v2 License.
+ * The GPL v2 license is available at
+ * http://opensource.org/licenses/gpl-license.php.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ ******************************************************************************/
+
+#include <linux/nfsd/debug.h>
+#include <linux/nfsd/nfs4pnfsdlm.h>
+
+#define NFSDDBG_FACILITY NFSDDBG_FILELAYOUT
+
+/* Just use a linked list. Do not expect more than 32 dlm_device_entries
+ * the first implementation will just use one device per cluster file system
+ */
+
+static LIST_HEAD(dlm_device_list);
+static DEFINE_SPINLOCK(dlm_device_list_lock);
+
+struct dlm_device_entry {
+ struct list_head dlm_dev_list;
+ char disk_name[DISK_NAME_LEN];
+ int num_ds;
+ char ds_list[NFSD_DLM_DS_LIST_MAX];
+};
+
+static struct dlm_device_entry *
+nfsd4_find_pnfs_dlm_device(char *disk_name)
+{
+ struct dlm_device_entry *dlm_pdev;
+
+ spin_lock(&dlm_device_list_lock);
+ list_for_each_entry(dlm_pdev, &dlm_device_list, dlm_dev_list) {
+ if (!memcmp(dlm_pdev->disk_name, disk_name, strlen(disk_name))) {
+ spin_unlock(&dlm_device_list_lock);
+ return dlm_pdev;
+ }
+ }
+ spin_unlock(&dlm_device_list_lock);
+ return NULL;
+}
+
+/*
+ * pnfs_dlm_device string format:
+ * block-device-path:<ds1 ipv4 address>,<ds2 ipv4 address>
+ *
+ * Examples
+ * '/dev/sda:192.168.1.96,192.168.1.97' creates a data server list with
+ * two data servers for the dlm cluster file system mounted on /dev/sda.
+ *
+ * '/dev/sda:192.168.1.96,192.168.1.100'
+ * replaces the data server list for /dev/sda
+ *
+ * Only the deviceid == 1 is supported. Can add device id to
+ * pnfs_dlm_device string when needed.
+ *
+ * Only a round-robin stripe index that uses each data server once is supported.
+ */
+int
+nfsd4_set_pnfs_dlm_device(char *pnfs_dlm_device, int len)
+
+{
+ struct dlm_device_entry *new, *found;
+ char *bufp = pnfs_dlm_device;
+ char *endp = bufp + strlen(bufp);
+ int err = -ENOMEM;
+
+ dprintk("--> %s len %d\n", __func__, len);
+
+ new = kzalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return err;
+
+ err = -EINVAL;
+ /* disk_name */
+ /* FIXME: need to check for valid disk_name. search superblocks?
+ * check for slash dev slash ?
+ */
+ len = strcspn(bufp, ":");
+ if (len >= DISK_NAME_LEN)
+ goto out_free;
+ memcpy(new->disk_name, bufp, len);
+
+ err = -EINVAL;
+ bufp += len + 1;
+ if (bufp >= endp)
+ goto out_free;
+
+ /* data server list */
+ /* FIXME: need to check for comma separated valid ip format */
+ len = strlen(bufp);
+ if (len >= NFSD_DLM_DS_LIST_MAX)
+ goto out_free;
+ memcpy(new->ds_list, bufp, len);
+
+ /* count the number of comma-delimited DS IPs */
+ new->num_ds = 1;
+ while ((bufp = strchr(bufp, ',')) != NULL) {
+ new->num_ds++;
+ bufp++;
+ }
+
+ dprintk("%s disk_name %s num_ds %d ds_list %s\n", __func__,
+ new->disk_name, new->num_ds, new->ds_list);
+
+ found = nfsd4_find_pnfs_dlm_device(new->disk_name);
+ if (found) {
+ /* FIXME: should compare found->ds_list with new->ds_list
+ * and if it is different, kick off a CB_NOTIFY change
+ * deviceid.
+ */
+ dprintk("%s pnfs_dlm_device %s:%s already in cache "
+ " replace ds_list with new ds_list %s\n", __func__,
+ found->disk_name, found->ds_list, new->ds_list);
+ memset(found->ds_list, 0, sizeof(found->ds_list));
+ memcpy(found->ds_list, new->ds_list, strlen(new->ds_list));
+ found->num_ds = new->num_ds;
+ kfree(new);
+ } else {
+ dprintk("%s Adding pnfs_dlm_device %s:%s\n", __func__,
+ new->disk_name, new->ds_list);
+ spin_lock(&dlm_device_list_lock);
+ list_add(&new->dlm_dev_list, &dlm_device_list);
+ spin_unlock(&dlm_device_list_lock);
+ }
+ dprintk("<-- %s Success\n", __func__);
+ return 0;
+
+out_free:
+ kfree(new);
+ dprintk("<-- %s returns %d\n", __func__, err);
+ return err;
+}
+
+void nfsd4_pnfs_dlm_shutdown(void)
+{
+ struct dlm_device_entry *dlm_pdev, *next;
+
+ dprintk("--> %s\n", __func__);
+
+ spin_lock(&dlm_device_list_lock);
+ list_for_each_entry_safe (dlm_pdev, next, &dlm_device_list,
+ dlm_dev_list) {
+ list_del(&dlm_pdev->dlm_dev_list);
+ kfree(dlm_pdev);
+ }
+ spin_unlock(&dlm_device_list_lock);
+}
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 7f55517..b8bfa2e 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -15,6 +15,7 @@
#include <linux/sunrpc/gss_krb5_enctypes.h>
#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/module.h>
+#include <linux/nfsd/nfs4pnfsdlm.h>
#include "idmap.h"
#include "nfsd.h"
@@ -1210,6 +1211,7 @@ static int __init init_nfsd(void)
static void __exit exit_nfsd(void)
{
+ nfsd4_pnfs_dlm_shutdown();
nfsd_reply_cache_shutdown();
remove_proc_entry("fs/nfs/exports", NULL);
remove_proc_entry("fs/nfs", NULL);
diff --git a/include/linux/nfsd/nfs4pnfsdlm.h b/include/linux/nfsd/nfs4pnfsdlm.h
new file mode 100644
index 0000000..63248aa
--- /dev/null
+++ b/include/linux/nfsd/nfs4pnfsdlm.h
@@ -0,0 +1,49 @@
+/******************************************************************************
+ *
+ * (c) 2007 Network Appliance, Inc. All Rights Reserved.
+ * (c) 2009 NetApp. All Rights Reserved.
+ *
+ * NetApp provides this source code under the GPL v2 License.
+ * The GPL v2 license is available at
+ * http://opensource.org/licenses/gpl-license.php.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ ******************************************************************************/
+#include <linux/genhd.h>
+
+/*
+ * Length of comma separated pnfs data server IPv4 addresses. Enough room for
+ * 32 addresses.
+ */
+#define NFSD_DLM_DS_LIST_MAX 512
+/*
+ * Length of colon separated pnfs dlm device of the form
+ * disk_name:comma separated data server IPv4 address
+ */
+#define NFSD_PNFS_DLM_DEVICE_MAX (NFSD_DLM_DS_LIST_MAX + DISK_NAME_LEN + 1)
+
+#ifdef CONFIG_PNFSD
+
+int nfsd4_set_pnfs_dlm_device(char *pnfs_dlm_device, int len);
+
+void nfsd4_pnfs_dlm_shutdown(void);
+
+#else /* CONFIG_PNFSD */
+
+static inline void nfsd4_pnfs_dlm_shutdown(void)
+{
+ return;
+}
+
+#endif /* CONFIG_PNFSD */
--
1.8.3.1
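The "disk_name:ds_list" parsing done by nfsd4_set_pnfs_dlm_device() can be sketched in user space as follows. This is an illustration, not the patch's code: struct dlm_dev and parse_dlm_device() are hypothetical, DISK_NAME_MAX stands in for DISK_NAME_LEN, and the length checks use >= so that both strings always keep a terminating NUL.

```c
#include <string.h>

#define DISK_NAME_MAX	32	/* stand-in for DISK_NAME_LEN */
#define DS_LIST_MAX	512	/* mirrors NFSD_DLM_DS_LIST_MAX */

/* Hypothetical reduced form of struct dlm_device_entry. */
struct dlm_dev {
	char disk_name[DISK_NAME_MAX];
	char ds_list[DS_LIST_MAX];
	int num_ds;
};

/* Split "disk:addr1,addr2,..." into out. Returns 0 on success, -1 on error. */
static int parse_dlm_device(const char *s, struct dlm_dev *out)
{
	size_t name_len = strcspn(s, ":");
	size_t list_len;
	const char *list;
	const char *p;

	/* Require a non-empty name that fits with its NUL, then a colon. */
	if (name_len == 0 || name_len >= sizeof(out->disk_name))
		return -1;
	if (s[name_len] != ':')
		return -1;

	list = s + name_len + 1;
	list_len = strlen(list);
	if (list_len == 0 || list_len >= sizeof(out->ds_list))
		return -1;

	memset(out, 0, sizeof(*out));
	memcpy(out->disk_name, s, name_len);
	memcpy(out->ds_list, list, list_len);

	/* One data server plus one more per comma. */
	out->num_ds = 1;
	for (p = strchr(list, ','); p; p = strchr(p + 1, ','))
		out->num_ds++;
	return 0;
}
```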
From: Benny Halevy <[email protected]>
Implement the generic handling of GETDEVICELIST and GETDEVICEINFO.
After verifying that the requested layout type is supported,
getdevlist uses the get_device_iter pnfs export method
to encode the list of deviceids and get the cookie, verifier,
and eof flag to be used by the client to iterate through
the whole device list.
Getdevinfo uses the get_device_info pnfs export method
to encode the device info for the given deviceid.
The filesystem can choose to return a valid cookie and cookieverf
on eof, pointing at the end of the device list so that subsequent
calls to GETDEVICELIST will return an empty list.
Note that with the file layout, lots of devices are sent under a
single device id, so the client will need to send a relatively
large value of maxcount.
If maxcount is 0 then just update notifications.
The NFSv4.1 spec forbids returning NFS4ERR_TOOSMALL in this case.
It is up to the implementor of the get_device_info method
to verify the deviceid in this case and return no
info for it.
If no notifications are given, represent gdir_notification as an empty
bitmap array rather than one consisting of a single zeroed entry.
Thanks to Dean Hildebrand for suggesting this optimization
and to Peter Staubach for making the case that it's worth it.
nfsd returns the sbid when getting the device list so that
nfsd4_getdevinfo can later use it to locate the super block.
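To make the cookie, verifier and eof handling described above concrete, here is a small user-space model of a device iterator. Everything in it is hypothetical (struct dev_iter, next_device(), the use of a generation number as verifier); the real get_device_iter signature is defined by the patch, not by this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state passed between client and file system. */
struct dev_iter {
	uint64_t cookie;	/* in: resume position; out: next position */
	uint64_t verf;		/* in: verifier from last reply; out: current */
	uint64_t devid;		/* out: device id, valid when !eof */
	bool eof;		/* out: no more devices */
};

/*
 * Return the next device id, or set eof at the end of the list.
 * Returns 0 on success, -1 when a non-zero cookie carries a stale verifier.
 */
static int
next_device(const uint64_t *devs, size_t ndevs, uint64_t generation,
	    struct dev_iter *it)
{
	if (it->cookie != 0 && it->verf != generation)
		return -1;
	it->verf = generation;

	if (it->cookie >= ndevs) {
		/* Cookie stays at the end, so a repeat call is empty again. */
		it->eof = true;
		return 0;
	}
	it->devid = devs[it->cookie++];
	it->eof = false;
	return 0;
}
```

Leaving the cookie pointing at the end of the list on eof is what lets a later call with that cookie return an empty list instead of an error.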
[extracted from pnfsd: Initial pNFS server implementation.]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: update pNFS server ops to draft 13]
Signed-off-by: Marc Eshel <[email protected]>
[pnfsd: Fix server getdevicelist update to draft 13]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: update pNFS server ops to draft 13]
Signed-off-by: Marc Eshel <[email protected]>
[pnfsd: Fix server GETDEVICELIST to comply with NFSv4.1 Draft 13]
Signed-off-by: Ricardo Labiaga <[email protected]>
[pnfsd: Streamline error code checking for non-pnfs filesystems]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: Simplify device export ops.]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfs: fix compile problems if CONFIG_PNFS turned off - exportfs.h]
Signed-off-by: Fred Isaman <[email protected]>
[pnfsd: Implement getdevlist maxcount checking.]
[pnfsd: use nfs error codes]
[pnfsd: Use 128 bit deviceid on server]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: fix warning in nfsd4_encode_devlist_iterator()]
Signed-off-by: Mike Sager <[email protected]>
[pnfsd: Update getdeviceinfo for draft-19]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: encode empty getdeviceinfo notify bitmap rather than zeroed]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: do not depend on the current file handle in getdeviceinfo]
[pnfsd: update export hold count]
Signed-off-by: Marc Eshel <[email protected]>
[pnfsd: Update getdevlist for draft 19]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: fix GETDEVICELIST encoding]
Signed-off-by: Mike Sager <[email protected]>
[pnfsd: use nfsd4_compoundres pointer in pnfs_xdr_info]
[pnfsd: fix NFS4ERR_TOOSMALL for getdeviceinfo]
[pnfsd: enable multipage getdeviceinfo da_addr_body]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: move vfs api structures to nfsd4_pnfs.h]
[pnfsd: convert generic code to use new pnfs api]
[pnfsd: define pnfs_export_operations]
[pnfsd: obliterate old vfs api]
[pnfsd: fixup ENCODE_HEAD for getdevicelist/info]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: get device list/info all layout types]
[pnfsd: check ex_pnfs in nfsd4_verify_layout]
Signed-off-by: Andy Adamson <[email protected]>
[removed nfsd4_pnfs_fl_getdev{info,iter} stubs]
[pnfsd: filelayout: convert to using exp_xdr]
[pnfsd: get rid of getdevinfo notify_types]
[pnfsd: copy getdevinfo deviceid in one piece]
[pnfsd: rename deviceid_t struct pnfs_deviceid]
[pnfsd: fix cosmetic checkpatch warnings]
[pnfsd: handle s_pnfs_op==NULL]
[pnfsd: move getdevinfo xdr structure to private header]
[pnfsd: clean up getdeviceinfo export op API]
[pnfsd: getdeviceinfo deviceid needs to be const.]
[pnfsd: allow returning empty device list.]
[pnfsd: return NFS4ERR_INVAL when maxdevices is zero.]
[pnfsd: move getdevlist xdr structure to private header]
[pnfsd: dev_iter: clean up export API]
[pnfsd: rename device fsid member to sbid]
[pnfsd: use devid.sbid for locating super block in getdevinfo]
[pnfsd: fixup nfsd4_encode_getdev{list,info} to use __be32 nfserr]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: Use list_move instead list_del and list_add]
Signed-off-by: Bian Naimeng <[email protected]>
[pnfsd: using sbid instead of fsid while returning device list to client]
Signed-off-by: Zhengju Sha <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/export.c | 3 +-
fs/nfsd/nfs4pnfsd.c | 30 ++++-
fs/nfsd/nfs4proc.c | 92 ++++++++++++++
fs/nfsd/nfs4xdr.c | 257 ++++++++++++++++++++++++++++++++++++++++
fs/nfsd/pnfsd.h | 3 +
fs/nfsd/xdr4.h | 22 ++++
include/linux/nfsd/nfsd4_pnfs.h | 32 +++++
7 files changed, 433 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index d803414..462f0df 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -377,7 +377,8 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid)
}
if (inode->i_sb->s_pnfs_op &&
- !inode->i_sb->s_pnfs_op->layout_type) {
+ (!inode->i_sb->s_pnfs_op->layout_type ||
+ !inode->i_sb->s_pnfs_op->get_device_info)) {
dprintk("exp_export: export of invalid fs pnfs export ops.\n");
return -EINVAL;
}
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index 9a7cbc9..d219e42 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -118,7 +118,29 @@ struct sbid_tracker {
return id;
}
-static u64
+struct super_block *
+find_sbid_id(u64 id)
+{
+ struct sbid_tracker *sbid;
+ struct super_block *sb = NULL;
+ unsigned long hash_idx = id & SBID_HASH_MASK;
+ int pos = 0;
+
+ spin_lock(&layout_lock);
+ list_for_each_entry (sbid, &sbid_hashtbl[hash_idx], hash) {
+ pos++;
+ if (sbid->id != id)
+ continue;
+ if (pos > 1)
+ list_move(&sbid->hash, &sbid_hashtbl[hash_idx]);
+ sb = sbid->sb;
+ break;
+ }
+ spin_unlock(&layout_lock);
+ return sb;
+}
+
+u64
find_create_sbid(struct super_block *sb)
{
struct sbid_tracker *sbid;
@@ -131,10 +153,8 @@ struct sbid_tracker {
pos++;
if (sbid->sb != sb)
continue;
- if (pos > 1) {
- list_del(&sbid->hash);
- list_add(&sbid->hash, &sbid_hashtbl[hash_idx]);
- }
+ if (pos > 1)
+ list_move(&sbid->hash, &sbid_hashtbl[hash_idx]);
id = sbid->id;
break;
}
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 576b635..feea3a9 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1146,6 +1146,87 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
out:
return status;
}
+
+static __be32
+nfsd4_getdevlist(struct svc_rqst *rqstp,
+ struct nfsd4_compound_state *cstate,
+ struct nfsd4_pnfs_getdevlist *gdlp)
+{
+ struct super_block *sb;
+ struct svc_fh *current_fh = &cstate->current_fh;
+ int status;
+
+ dprintk("%s: type %u maxdevices %u cookie %llu verf %llu\n",
+ __func__, gdlp->gd_layout_type, gdlp->gd_maxdevices,
+ gdlp->gd_cookie, gdlp->gd_verf);
+
+
+ status = fh_verify(rqstp, current_fh, 0, NFSD_MAY_NOP);
+ if (status)
+ goto out;
+
+ status = nfserr_inval;
+ sb = current_fh->fh_dentry->d_inode->i_sb;
+ if (!sb)
+ goto out;
+
+ /* We must be able to encode at least one device */
+ if (!gdlp->gd_maxdevices)
+ goto out;
+
+ /* Ensure underlying file system supports pNFS and,
+ * if so, the requested layout type
+ */
+ status = nfsd4_layout_verify(sb, current_fh->fh_export,
+ gdlp->gd_layout_type);
+ if (status)
+ goto out;
+
+ /* Do nothing if underlying file system does not support
+ * getdevicelist */
+ if (!sb->s_pnfs_op->get_device_iter) {
+ status = nfserr_notsupp;
+ goto out;
+ }
+
+ /* Set up arguments so device can be retrieved at encode time */
+ gdlp->gd_fhp = &cstate->current_fh;
+out:
+ return status;
+}
+
+static __be32
+nfsd4_getdevinfo(struct svc_rqst *rqstp,
+ struct nfsd4_compound_state *cstate,
+ struct nfsd4_pnfs_getdevinfo *gdp)
+{
+ struct super_block *sb;
+ int status;
+
+ dprintk("%s: layout_type %u dev_id %llx:%llx maxcnt %u\n",
+ __func__, gdp->gd_layout_type, gdp->gd_devid.sbid,
+ gdp->gd_devid.devid, gdp->gd_maxcount);
+
+ status = nfserr_inval;
+ sb = find_sbid_id(gdp->gd_devid.sbid);
+ dprintk("%s: sb %p\n", __func__, sb);
+ if (!sb) {
+ status = nfserr_noent;
+ goto out;
+ }
+
+ /* Ensure underlying file system supports pNFS and,
+ * if so, the requested layout type
+ */
+ status = nfsd4_layout_verify(sb, NULL, gdp->gd_layout_type);
+ if (status)
+ goto out;
+
+ /* Set up arguments so device can be retrieved at encode time */
+ gdp->gd_sb = sb;
+out:
+ return status;
+}
#endif /* CONFIG_PNFSD */
/*
@@ -1879,6 +1960,17 @@ static inline u32 nfsd4_create_session_rsize(struct svc_rqst *rqstp, struct nfsd
.op_get_currentstateid = (stateid_getter)nfsd4_get_freestateid,
.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
},
+#if defined(CONFIG_PNFSD)
+ [OP_GETDEVICELIST] = {
+ .op_func = (nfsd4op_func)nfsd4_getdevlist,
+ .op_name = "OP_GETDEVICELIST",
+ },
+ [OP_GETDEVICEINFO] = {
+ .op_func = (nfsd4op_func)nfsd4_getdevinfo,
+ .op_flags = ALLOWED_WITHOUT_FH,
+ .op_name = "OP_GETDEVICEINFO",
+ },
+#endif /* CONFIG_PNFSD */
};
#ifdef NFSD_DEBUG
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d9454fe..a761514 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -46,6 +46,7 @@
#include <linux/utsname.h>
#include <linux/pagemap.h>
#include <linux/sunrpc/svcauth_gss.h>
+#include <linux/exportfs.h>
#include "idmap.h"
#include "acl.h"
@@ -54,6 +55,7 @@
#include "state.h"
#include "cache.h"
#include "netns.h"
+#include "pnfsd.h"
#ifdef CONFIG_NFSD_V4_SECURITY_LABEL
#include <linux/security.h>
@@ -1484,6 +1486,42 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
DECODE_TAIL;
}
+#if defined(CONFIG_PNFSD)
+static __be32
+nfsd4_decode_getdevlist(struct nfsd4_compoundargs *argp,
+ struct nfsd4_pnfs_getdevlist *gdevl)
+{
+ DECODE_HEAD;
+
+ READ_BUF(16 + sizeof(nfs4_verifier));
+ READ32(gdevl->gd_layout_type);
+ READ32(gdevl->gd_maxdevices);
+ READ64(gdevl->gd_cookie);
+ COPYMEM(&gdevl->gd_verf, sizeof(nfs4_verifier));
+
+ DECODE_TAIL;
+}
+
+static __be32
+nfsd4_decode_getdevinfo(struct nfsd4_compoundargs *argp,
+ struct nfsd4_pnfs_getdevinfo *gdev)
+{
+ u32 num;
+ DECODE_HEAD;
+
+ READ_BUF(12 + sizeof(struct nfsd4_pnfs_deviceid));
+ READ64(gdev->gd_devid.sbid);
+ READ64(gdev->gd_devid.devid);
+ READ32(gdev->gd_layout_type);
+ READ32(gdev->gd_maxcount);
+ READ32(num);
+ if (num)
+ READ_BUF(4); /* TODO: for now, just skip notify_types */
+
+ DECODE_TAIL;
+}
+#endif /* CONFIG_PNFSD */
+
static __be32
nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p)
{
@@ -1585,11 +1623,19 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
[OP_DESTROY_SESSION] = (nfsd4_dec)nfsd4_decode_destroy_session,
[OP_FREE_STATEID] = (nfsd4_dec)nfsd4_decode_free_stateid,
[OP_GET_DIR_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp,
+#if defined(CONFIG_PNFSD)
+ [OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_getdevinfo,
+ [OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_getdevlist,
+ [OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_notsupp,
+#else /* CONFIG_PNFSD */
[OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_notsupp,
+#endif /* CONFIG_PNFSD */
[OP_SECINFO_NO_NAME] = (nfsd4_dec)nfsd4_decode_secinfo_no_name,
[OP_SEQUENCE] = (nfsd4_dec)nfsd4_decode_sequence,
[OP_SET_SSV] = (nfsd4_dec)nfsd4_decode_notsupp,
@@ -3519,6 +3565,209 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
return nfserr;
}
+#if defined(CONFIG_PNFSD)
+
+/* Uses the export interface to iterate through the available devices
+ * and encodes them on the response stream.
+ */
+static __be32
+nfsd4_encode_devlist_iterator(struct nfsd4_compoundres *resp,
+ struct nfsd4_pnfs_getdevlist *gdevl,
+ unsigned int *dev_count)
+{
+ struct super_block *sb = gdevl->gd_fhp->fh_dentry->d_inode->i_sb;
+ __be32 nfserr;
+ int status;
+ __be32 *p;
+ struct nfsd4_pnfs_dev_iter_res res = {
+ .gd_cookie = gdevl->gd_cookie,
+ .gd_verf = gdevl->gd_verf,
+ .gd_eof = 0
+ };
+ u64 sbid;
+
+ dprintk("%s: Begin\n", __func__);
+
+ sbid = find_create_sbid(sb);
+ *dev_count = 0;
+ do {
+ status = sb->s_pnfs_op->get_device_iter(sb,
+ gdevl->gd_layout_type,
+ &res);
+ if (status) {
+ if (status == -ENOENT) {
+ res.gd_eof = 1;
+ /* return success */
+ break;
+ }
+ nfserr = nfserrno(status);
+ goto out_err;
+ }
+
+ /* Encode device id and layout type */
+ RESERVE_SPACE(sizeof(struct nfsd4_pnfs_deviceid));
+ WRITE64(sbid);
+ WRITE64(res.gd_devid); /* devid minor */
+ ADJUST_ARGS();
+ (*dev_count)++;
+ } while (*dev_count < gdevl->gd_maxdevices && !res.gd_eof);
+ gdevl->gd_cookie = res.gd_cookie;
+ gdevl->gd_verf = res.gd_verf;
+ gdevl->gd_eof = res.gd_eof;
+ nfserr = nfs_ok;
+out_err:
+ dprintk("%s: Encoded %u devices\n", __func__, *dev_count);
+ return nfserr;
+}
+
+/* Encodes the response of get device list.
+*/
+static __be32
+nfsd4_encode_getdevlist(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_pnfs_getdevlist *gdevl)
+{
+ unsigned int dev_count = 0, lead_count;
+ u32 *p_in = resp->p;
+ __be32 *p;
+
+ dprintk("%s: err %d\n", __func__, nfserr);
+ if (nfserr)
+ return nfserr;
+
+ /* Ensure we have room for cookie, verifier, and devlist len,
+ * which we will backfill in after we encode as many devices as possible
+ */
+ lead_count = 8 + sizeof(nfs4_verifier) + 4;
+ RESERVE_SPACE(lead_count);
+ /* skip past these values */
+ p += XDR_QUADLEN(lead_count);
+ ADJUST_ARGS();
+
+ /* Iterate over as many device ids as possible on the xdr stream */
+ nfserr = nfsd4_encode_devlist_iterator(resp, gdevl, &dev_count);
+ if (nfserr)
+ goto out_err;
+
+ /* Backfill in cookie, verf and number of devices encoded */
+ p = p_in;
+ WRITE64(gdevl->gd_cookie);
+ WRITEMEM(&gdevl->gd_verf, sizeof(nfs4_verifier));
+ WRITE32(dev_count);
+
+ /* Skip over devices */
+ p += XDR_QUADLEN(dev_count * sizeof(struct nfsd4_pnfs_deviceid));
+ ADJUST_ARGS();
+
+ /* are we at the end of devices? */
+ RESERVE_SPACE(4);
+ WRITE32(gdevl->gd_eof);
+ ADJUST_ARGS();
+
+ dprintk("%s: done.\n", __func__);
+
+ nfserr = nfs_ok;
+out:
+ return nfserr;
+out_err:
+ p = p_in;
+ ADJUST_ARGS();
+ goto out;
+}
+
+/* For a given device id, have the file system retrieve and encode the
+ * associated device. For file layout, the encoding function is
+ * passed down to the file system. The file system then has the option
+ * of using this encoding function or one of its own.
+ *
+ * Note: the file system must return the XDR size of struct device_addr4
+ * da_addr_body in pnfs_xdr_info.bytes_written on NFS4ERR_TOOSMALL for the
+ * gdir_mincount calculation.
+ */
+static __be32
+nfsd4_encode_getdevinfo(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_pnfs_getdevinfo *gdev)
+{
+ struct super_block *sb;
+ int maxcount = 0, type_notify_len = 12;
+ __be32 *p, *p_save = NULL, *p_in = resp->p;
+ struct exp_xdr_stream xdr;
+
+ dprintk("%s: err %d\n", __func__, nfserr);
+ if (nfserr)
+ return nfserr;
+
+ sb = gdev->gd_sb;
+
+ if (gdev->gd_maxcount != 0) {
+ /* FIXME: this will be bound by the session max response */
+ maxcount = svc_max_payload(resp->rqstp);
+ if (maxcount > gdev->gd_maxcount)
+ maxcount = gdev->gd_maxcount;
+
+ /* Ensure have room for type and notify field */
+ maxcount -= type_notify_len;
+ if (maxcount < 0) {
+ nfserr = -ETOOSMALL;
+ goto toosmall;
+ }
+ }
+
+ RESERVE_SPACE(4);
+ WRITE32(gdev->gd_layout_type);
+ ADJUST_ARGS();
+
+ /* If maxcount is 0 then just update notifications */
+ if (gdev->gd_maxcount == 0)
+ goto handle_notifications;
+
+ xdr.p = p_save = resp->p;
+ xdr.end = resp->end;
+ if (xdr.end - xdr.p > exp_xdr_qwords(maxcount & ~3))
+ xdr.end = xdr.p + exp_xdr_qwords(maxcount & ~3);
+
+ nfserr = sb->s_pnfs_op->get_device_info(sb, &xdr, gdev->gd_layout_type,
+ &gdev->gd_devid);
+ if (nfserr) {
+ /* Rewind to the beginning */
+ p = p_in;
+ ADJUST_ARGS();
+ if (nfserr == -ETOOSMALL)
+ goto toosmall;
+ printk(KERN_ERR "%s: export ERROR %d\n", __func__, nfserr);
+ goto out;
+ }
+
+ /* The file system should never write 0 bytes without
+ * returning an error
+ */
+ BUG_ON(xdr.p == p_save);
+ BUG_ON(xdr.p > xdr.end);
+
+ /* Update the xdr stream with the number of bytes encoded
+ * by the file system.
+ */
+ p = xdr.p;
+ ADJUST_ARGS();
+
+handle_notifications:
+ /* Encode supported device notifications.
+ * Note: Currently none are supported.
+ */
+ RESERVE_SPACE(4);
+ WRITE32(0);
+ ADJUST_ARGS();
+
+out:
+ return nfserrno(nfserr);
+toosmall:
+ dprintk("%s: maxcount too small\n", __func__);
+ RESERVE_SPACE(4);
+ WRITE32((p_save ? (xdr.p - p_save) * 4 : 0) + type_notify_len);
+ ADJUST_ARGS();
+ goto out;
+}
+#endif /* CONFIG_PNFSD */
+
static __be32
nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p)
{
@@ -3579,11 +3828,19 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
[OP_DESTROY_SESSION] = (nfsd4_enc)nfsd4_encode_destroy_session,
[OP_FREE_STATEID] = (nfsd4_enc)nfsd4_encode_free_stateid,
[OP_GET_DIR_DELEGATION] = (nfsd4_enc)nfsd4_encode_noop,
+#if defined(CONFIG_PNFSD)
+ [OP_GETDEVICEINFO] = (nfsd4_enc)nfsd4_encode_getdevinfo,
+ [OP_GETDEVICELIST] = (nfsd4_enc)nfsd4_encode_getdevlist,
+ [OP_LAYOUTCOMMIT] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_LAYOUTGET] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_LAYOUTRETURN] = (nfsd4_enc)nfsd4_encode_noop,
+#else /* CONFIG_PNFSD */
[OP_GETDEVICEINFO] = (nfsd4_enc)nfsd4_encode_noop,
[OP_GETDEVICELIST] = (nfsd4_enc)nfsd4_encode_noop,
[OP_LAYOUTCOMMIT] = (nfsd4_enc)nfsd4_encode_noop,
[OP_LAYOUTGET] = (nfsd4_enc)nfsd4_encode_noop,
[OP_LAYOUTRETURN] = (nfsd4_enc)nfsd4_encode_noop,
+#endif /* CONFIG_PNFSD */
[OP_SECINFO_NO_NAME] = (nfsd4_enc)nfsd4_encode_secinfo_no_name,
[OP_SEQUENCE] = (nfsd4_enc)nfsd4_encode_sequence,
[OP_SET_SSV] = (nfsd4_enc)nfsd4_encode_noop,
diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
index 29ea2e7..cfcfc9a 100644
--- a/fs/nfsd/pnfsd.h
+++ b/fs/nfsd/pnfsd.h
@@ -38,4 +38,7 @@
#include "xdr4.h"
+u64 find_create_sbid(struct super_block *);
+struct super_block *find_sbid_id(u64);
+
#endif /* LINUX_NFSD_PNFSD_H */
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index b3ed644..faf37bc 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -37,6 +37,8 @@
#ifndef _LINUX_NFSD_XDR4_H
#define _LINUX_NFSD_XDR4_H
+#include <linux/nfsd/nfsd4_pnfs.h>
+
#include "state.h"
#include "nfsd.h"
@@ -430,6 +432,22 @@ struct nfsd4_reclaim_complete {
u32 rca_one_fs;
};
+struct nfsd4_pnfs_getdevinfo {
+ struct nfsd4_pnfs_deviceid gd_devid; /* request */
+ u32 gd_layout_type; /* request */
+ u32 gd_maxcount; /* request */
+ struct super_block *gd_sb;
+};
+
+struct nfsd4_pnfs_getdevlist {
+ u32 gd_layout_type; /* request */
+ u32 gd_maxdevices; /* request */
+ u64 gd_cookie; /* request - response */
+ u64 gd_verf; /* request - response */
+ struct svc_fh *gd_fhp; /* response */
+ u32 gd_eof; /* response */
+};
+
struct nfsd4_op {
int opnum;
__be32 status;
@@ -475,6 +493,10 @@ struct nfsd4_op {
struct nfsd4_reclaim_complete reclaim_complete;
struct nfsd4_test_stateid test_stateid;
struct nfsd4_free_stateid free_stateid;
+#if defined(CONFIG_PNFSD)
+ struct nfsd4_pnfs_getdevlist pnfs_getdevlist;
+ struct nfsd4_pnfs_getdevinfo pnfs_getdevinfo;
+#endif /* CONFIG_PNFSD */
} u;
struct nfs4_replay * replay;
};
diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
index d44669e..53a0bb7 100644
--- a/include/linux/nfsd/nfsd4_pnfs.h
+++ b/include/linux/nfsd/nfsd4_pnfs.h
@@ -35,6 +35,19 @@
#define _LINUX_NFSD_NFSD4_PNFS_H
#include <linux/exportfs.h>
+#include <linux/exp_xdr.h>
+
+struct nfsd4_pnfs_deviceid {
+ u64 sbid; /* per-superblock unique ID */
+ u64 devid; /* filesystem-wide unique device ID */
+};
+
+struct nfsd4_pnfs_dev_iter_res {
+ u64 gd_cookie; /* request/response */
+ u64 gd_verf; /* request/response */
+ u64 gd_devid; /* response */
+ u32 gd_eof; /* response */
+};
/*
* pNFS export operations vector.
@@ -49,6 +62,25 @@
struct pnfs_export_operations {
/* Returns the supported pnfs_layouttype4. */
int (*layout_type) (struct super_block *);
+
+ /* Encode device info onto the xdr stream. */
+ int (*get_device_info) (struct super_block *,
+ struct exp_xdr_stream *,
+ u32 layout_type,
+ const struct nfsd4_pnfs_deviceid *);
+
+ /* Retrieve all available devices via an iterator.
+ * arg->cookie == 0 indicates the beginning of the list,
+ * otherwise arg->verf is used to verify that the list hasn't changed
+ * while retrieved.
+ *
+ * On output, the filesystem sets the devid based on the current cookie
+ * and sets res->cookie and res->verf corresponding to the next entry.
+ * When the last entry in the list is retrieved, res->eof is set to 1.
+ */
+ int (*get_device_iter) (struct super_block *,
+ u32 layout_type,
+ struct nfsd4_pnfs_dev_iter_res *);
};
#endif /* _LINUX_NFSD_NFSD4_PNFS_H */
--
1.8.3.1
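The move-to-front lookup that find_sbid_id() and find_create_sbid() perform in the patch above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: it uses a singly linked bucket instead of the kernel's list_head, omits the layout_lock spinlock, and the names sbid_entry, sbid_table, sbid_insert, sbid_lookup and the table size are hypothetical, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define SBID_HASH_SIZE 16                       /* hypothetical table size */
#define SBID_HASH_MASK (SBID_HASH_SIZE - 1)

struct sbid_entry {
	uint64_t id;                /* devid major identifier */
	const void *sb;             /* stands in for struct super_block * */
	struct sbid_entry *next;
};

static struct sbid_entry *sbid_table[SBID_HASH_SIZE];

/* Insert at the head of the bucket selected by the low bits of id. */
static void sbid_insert(struct sbid_entry *e)
{
	struct sbid_entry **head = &sbid_table[e->id & SBID_HASH_MASK];

	e->next = *head;
	*head = e;
}

/* Look up id; a hit that is not already first is moved to the front,
 * so recently used superblocks are found quickly next time. */
static const void *sbid_lookup(uint64_t id)
{
	struct sbid_entry **head = &sbid_table[id & SBID_HASH_MASK];
	struct sbid_entry **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		struct sbid_entry *e = *pp;

		if (e->id != id)
			continue;
		if (pp != head) {
			*pp = e->next;      /* unlink from current position */
			e->next = *head;    /* relink at the bucket head */
			*head = e;
		}
		return e->sb;
	}
	return NULL;
}
```

In the kernel code the same effect comes from list_move() under the spinlock; the sketch only shows the reordering itself.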
Should go straight to Al independent of this series.
We don't want to hold the state_lock while the file system may block (e.g. in iput()).
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsd.c | 4 +++-
fs/nfsd/nfs4state.c | 34 +++++++++++++++++++++++++++++++---
fs/nfsd/state.h | 1 +
3 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index 8d16b85..1807455 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -166,6 +166,7 @@ struct sbid_tracker {
{
struct nfs4_layout_state *ls =
container_of(kref, struct nfs4_layout_state, ls_ref);
+ struct nfs4_file *fp;
nfsd4_remove_stid(&ls->ls_stid);
if (!list_empty(&ls->ls_perclnt)) {
@@ -173,8 +174,9 @@ struct sbid_tracker {
unhash_layout_state(ls);
spin_unlock(&layout_lock);
}
- put_nfs4_file(ls->ls_file);
+ fp = ls->ls_file;
nfsd4_free_stid(layout_state_slab, &ls->ls_stid);
+ put_nfs4_file_locked(fp);
}
/*
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index e11d96f..5d5dead 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -239,14 +239,42 @@ static void nfsd4_free_file(struct nfs4_file *f)
kmem_cache_free(file_slab, f);
}
-void
-put_nfs4_file(struct nfs4_file *fi)
+static struct inode *put_nfs4_file_common(struct nfs4_file *fi)
{
if (atomic_dec_and_lock(&fi->fi_ref, &recall_lock)) {
+ struct inode *ino;
+
hlist_del(&fi->fi_hash);
spin_unlock(&recall_lock);
- iput(fi->fi_inode);
+ ino = fi->fi_inode;
nfsd4_free_file(fi);
+
+ return ino;
+ }
+ return NULL;
+}
+
+void
+put_nfs4_file(struct nfs4_file *fi)
+{
+ struct inode *ino;
+
+ ino = put_nfs4_file_common(fi);
+ if (ino)
+ iput(ino);
+}
+
+void
+put_nfs4_file_locked(struct nfs4_file *fi)
+{
+ struct inode *ino;
+
+ nfs4_assert_state_locked();
+ ino = put_nfs4_file_common(fi);
+ if (ino) {
+ nfs4_unlock_state();
+ iput(ino);
+ nfs4_lock_state();
}
}
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 1ef09ae..3be7507 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -491,6 +491,7 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
extern void nfsd4_free_slab(struct kmem_cache **);
extern struct nfs4_file *find_alloc_file(struct inode *, struct svc_fh *);
extern void put_nfs4_file(struct nfs4_file *);
+extern void put_nfs4_file_locked(struct nfs4_file *);
extern void get_nfs4_file(struct nfs4_file *);
extern struct nfs4_client *find_confirmed_client(clientid_t *, bool sessions, struct nfsd_net *);
extern struct nfs4_stid *nfsd4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *slab);
--
1.8.3.1
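The pattern in put_nfs4_file_locked() above (drop the last reference under the lock, but release the blocking resource with the lock dropped) can be sketched in a single-threaded userspace model. Everything here is an assumption made for illustration: the boolean state_locked stands in for the nfsd state lock, struct obj for nfs4_file, and record_release for iput(); none of these names come from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the nfsd state lock: only tracks whether it is held. */
static bool state_locked;
static void lock_state(void)   { state_locked = true; }
static void unlock_state(void) { state_locked = false; }

struct obj {
	int refcount;
	void *resource;             /* stands in for the inode */
};

/* Drop one reference; on the last put, hand back the resource to release. */
static void *put_obj_common(struct obj *o)
{
	void *res;

	if (--o->refcount > 0)
		return NULL;
	res = o->resource;
	o->resource = NULL;
	return res;
}

/* Caller holds the lock; release_fn may block, so it runs unlocked. */
static void put_obj_locked(struct obj *o, void (*release_fn)(void *))
{
	void *res = put_obj_common(o);

	if (res) {
		unlock_state();
		release_fn(res);
		lock_state();
	}
}

/* Instrumented release function: records whether the lock was held. */
static int release_calls;
static bool released_while_locked;
static void record_release(void *res)
{
	(void)res;
	release_calls++;
	if (state_locked)
		released_while_locked = true;
}
```

One consequence worth keeping in mind: because the lock is dropped and retaken, a caller of the locked variant cannot assume that state it examined before the call is still unchanged afterwards.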
On Sun, Sep 29, 2013 at 05:21:30AM -0700, Christoph Hellwig wrote:
> > Bruce - are you ok with moving the pnfs interface definitions to
> > include/linux/exportfs.h along with struct export_operations?
> >
> > In fact we can actually extend struct export_operations rather
> > than adding pnfs_export_operations...
>
> Yes, it probably should go into the export ops, although the actual
> method signatures might need to be made a little less nfs-specific for
> that.
I just took a brief look over the diff for the whole series in the git
tree and the old tree that still had block and exofs servers and have
revised my opinion a little bit:
- there should be a layout_type field in struct export_operations,
indicating that a filesystem supports some sort of pnfs-like export.
- there should be a struct pnfs_operations, but it should be confined
to fs/nfsd: each layout can be a separate loadable module and gets
registered there. For the initial file layout that module is
self-contained, but for e.g. block or objects it would have to
call into the filesystem through export_ops, although at a much lower level
than the NFS XDR level, e.g. for block there would be one op to get
the extent map, and one to allocate an extent.
This way we also avoid the dependency on nfsd in the filesystems that the
current version introduces.
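One possible reading of this proposal, as a rough userspace sketch: the filesystem only advertises a layout type, and nfsd keeps a table of registered layout drivers keyed by that type. All names below (pnfs_layout_ops, pnfs_register_layout, pnfs_find_layout, export_ops_sketch, MAX_LAYOUT_TYPES) are hypothetical and exist only for illustration; they are not part of the posted series or of any kernel API.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LAYOUT_TYPES 8      /* hypothetical bound for this sketch */

/* Hypothetical per-layout operations that would live under fs/nfsd. */
struct pnfs_layout_ops {
	uint32_t layout_type;
	const char *name;
};

static const struct pnfs_layout_ops *layout_drivers[MAX_LAYOUT_TYPES];

/* Register a layout driver; 0 on success, -1 on a bad or occupied slot. */
static int pnfs_register_layout(const struct pnfs_layout_ops *ops)
{
	if (!ops || ops->layout_type == 0 ||
	    ops->layout_type >= MAX_LAYOUT_TYPES)
		return -1;
	if (layout_drivers[ops->layout_type])
		return -1;
	layout_drivers[ops->layout_type] = ops;
	return 0;
}

/* Hypothetical slimmed-down export ops: the filesystem only names a type. */
struct export_ops_sketch {
	uint32_t layout_type;       /* 0 means no pnfs-like export */
};

/* Find the nfsd-side driver for the type a filesystem advertises. */
static const struct pnfs_layout_ops *
pnfs_find_layout(const struct export_ops_sketch *eops)
{
	if (!eops || eops->layout_type == 0 ||
	    eops->layout_type >= MAX_LAYOUT_TYPES)
		return NULL;
	return layout_drivers[eops->layout_type];
}
```

With this split the filesystem never includes an nfsd header; it just sets a number, and the lower-level hooks for block or object layouts would be added to the export ops separately.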
Destroy the client layout state upon expiry.
pnfs_return_client_layouts is used to locate the related layout state
as if the client returned all of its layouts via RETURN_ALL with IOMODE_ANY.
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4pnfsd.c | 22 ++++++++++++++++++++++
fs/nfsd/nfs4state.c | 1 +
fs/nfsd/state.h | 2 ++
3 files changed, 25 insertions(+)
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index 2ba4a29..d18e2a1 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -862,3 +862,25 @@ int nfs4_pnfs_return_layout(struct svc_rqst *rqstp,
dprintk("pNFS %s: exit status %d\n", __func__, status);
return status;
}
+
+/*
+ * Note: must be called under the state lock
+ */
+void pnfs_expire_client(struct nfs4_client *clp)
+{
+ struct nfsd4_pnfs_layoutreturn lr = {
+ .args.lr_return_type = RETURN_ALL,
+ .args.lr_seg = {
+ .iomode = IOMODE_ANY,
+ .offset = 0,
+ .length = NFS4_MAX_UINT64,
+ },
+ };
+ LIST_HEAD(lo_destroy_list);
+
+ nfs4_assert_state_locked();
+
+ pnfs_return_client_layouts(clp, &lr, 0, &lo_destroy_list);
+
+ destroy_layout_list(&lo_destroy_list);
+}
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a9bd82b..600edbc 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1172,6 +1172,7 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name)
oo = list_entry(clp->cl_openowners.next, struct nfs4_openowner, oo_perclient);
release_openowner(oo);
}
+ pnfs_expire_client(clp);
nfsd4_shutdown_callback(clp);
if (clp->cl_cb_conn.cb_xprt)
svc_xprt_put(clp->cl_cb_conn.cb_xprt);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index d2c75c5..90ec8b9 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -506,9 +506,11 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
#if defined(CONFIG_PNFSD)
extern int nfsd4_init_pnfs_slabs(void);
extern void nfsd4_free_pnfs_slabs(void);
+extern void pnfs_expire_client(struct nfs4_client *);
#else /* CONFIG_PNFSD */
static inline void nfsd4_free_pnfs_slabs(void) {}
static inline int nfsd4_init_pnfs_slabs(void) { return 0; }
+static inline void pnfs_expire_client(struct nfs4_client *clp) {}
#endif /* CONFIG_PNFSD */
static inline u64
--
1.8.3.1
On Thu, Sep 26, 2013 at 02:40:24PM -0400, Benny Halevy wrote:
> From: Benny Halevy <[email protected]>
>
> Verify whether the server and file system support the given layout type.
>
> [was pnfsd: Streamline error code checking for non-pnfs filesystems]
> Signed-off-by: Dean Hildebrand <[email protected]>
> [pnfsd: Add super block to layout_type()]
> Signed-off-by: Marc Eshel <[email protected]>
> [pnfsd: Fix order of ops in nfsd4_layout_verify]
> Signed-off-by: Dean Hildebrand <[email protected]>
> [pnfsd: convert generic code to use new pnfs api]
> [pnfsd: define pnfs_export_operations]
> [pnfsd: obliterate old vfs api]
> Signed-off-by: Benny Halevy <[email protected]>
> [pnfsd: layout verify all layout types]
> Signed-off-by: Andy Adamson <[email protected]>
> [pnfsd: tone nfsd4_layout_verify printk down to dprintk]
> Signed-off-by: Benny Halevy <[email protected]>
> [pnfsd: check ex_pnfs in nfsd4_verify_layout]
> Signed-off-by: Andy Adamson <[email protected]>
> [pnfsd: handle s_pnfs_op==NULL]
> [pnfsd: verify export option only if svc_export is present]
> Signed-off-by: Benny Halevy <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> ---
> fs/nfsd/export.c | 6 ++++++
> fs/nfsd/nfs4proc.c | 39 +++++++++++++++++++++++++++++++++++++++
> fs/nfsd/pnfsd.h | 2 ++
> include/linux/nfsd/nfsd4_pnfs.h | 5 ++++-
> 4 files changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index 7730dfd..d803414 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -376,6 +376,12 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid)
> return -EINVAL;
> }
>
> + if (inode->i_sb->s_pnfs_op &&
> + !inode->i_sb->s_pnfs_op->layout_type) {
> + dprintk("exp_export: export of invalid fs pnfs export ops.\n");
> + return -EINVAL;
> + }
> +
If you haven't already done it you may want to look at modifying
nfs-utils/utils/exportfs/exportfs.c:test_export() to add the pnfs option
when appropriate so the error can be returned at exportfs time.
> return 0;
>
> }
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 419572f..576b635 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -41,6 +41,7 @@
> #include "vfs.h"
> #include "current_stateid.h"
> #include "netns.h"
> +#include "pnfsd.h"
>
> #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
> #include <linux/security.h>
> @@ -1109,6 +1110,44 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
> return status == nfserr_same ? nfs_ok : status;
> }
>
> +#if defined(CONFIG_PNFSD)
> +static __be32
> +nfsd4_layout_verify(struct super_block *sb, struct svc_export *exp,
> + unsigned int layout_type)
> +{
> + int status, type;
> +
> + /* check to see if pNFS is supported. */
> + status = nfserr_layoutunavailable;
> + if (exp && exp->ex_pnfs == 0) {
Can this really be called with exp == NULL? If so don't you want to
fail that as well?
> + dprintk("%s: Underlying file system "
> + "is not exported over pNFS\n", __func__);
> + goto out;
> + }
> + if (!sb->s_pnfs_op || !sb->s_pnfs_op->layout_type) {
> + dprintk("%s: Underlying file system "
> + "does not support pNFS\n", __func__);
> + goto out;
> + }
> +
> + type = sb->s_pnfs_op->layout_type(sb);
> +
> + /* check to see if requested layout type is supported. */
> + status = nfserr_unknown_layouttype;
> + if (!type)
> + dprintk("BUG: %s: layout_type 0 is reserved and must not be "
> + "used by filesystem\n", __func__);
> + else if (type != layout_type)
> + dprintk("%s: requested layout type %d "
> + "does not match supported type %d\n",
> + __func__, layout_type, type);
> + else
> + status = nfs_ok;
> +out:
> + return status;
> +}
> +#endif /* CONFIG_PNFSD */
> +
> /*
> * NULL call.
> */
> diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
> index 65fb57e..7c46791 100644
> --- a/fs/nfsd/pnfsd.h
> +++ b/fs/nfsd/pnfsd.h
> @@ -34,4 +34,6 @@
> #ifndef LINUX_NFSD_PNFSD_H
> #define LINUX_NFSD_PNFSD_H
>
> +#include <linux/nfsd/nfsd4_pnfs.h>
> +
> #endif /* LINUX_NFSD_PNFSD_H */
> diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
> index ff6613e..d44669e 100644
> --- a/include/linux/nfsd/nfsd4_pnfs.h
> +++ b/include/linux/nfsd/nfsd4_pnfs.h
> @@ -34,6 +34,8 @@
> #ifndef _LINUX_NFSD_NFSD4_PNFS_H
> #define _LINUX_NFSD_NFSD4_PNFS_H
>
> +#include <linux/exportfs.h>
> +
> /*
> * pNFS export operations vector.
> *
> @@ -45,7 +47,8 @@
> * All other methods are optional and can be set to NULL if not implemented.
> */
> struct pnfs_export_operations {
> - /* stub */
> + /* Returns the supported pnfs_layouttype4. */
> + int (*layout_type) (struct super_block *);
> };
>
> #endif /* _LINUX_NFSD_NFSD4_PNFS_H */
> --
> 1.8.3.1
>
From: Benny Halevy <[email protected]>
Signed-off-by: Ricardo Labiaga <[email protected]>
[rephrased text and moved down to fs/nfsd/Kconfig]
[remove CONFIG_PNFSD's dependency on NFSD_V4_1]
Signed-off-by: Benny Halevy <[email protected]>
[rephrase and remove CONFIG_PNFSD's dependency on EXPERIMENTAL]
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/Kconfig | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index dc8f1ef..4d68a8c 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -106,3 +106,13 @@ config NFSD_FAULT_INJECTION
testing error recovery on the NFS client.
If unsure, say N.
+
+config PNFSD
+ bool "NFSv4.1 server support for Parallel NFS (pNFS) (EXPERIMENTAL)"
+ depends on NFSD_V4
+ help
+ This option enables support for the parallel NFS features of the
+ minor version 1 of the NFSv4 protocol (RFC5661)
+ in the kernel's NFS server.
+
+ If unsure, say N.
--
1.8.3.1
On 2013-09-27 17:44, J. Bruce Fields wrote:
> On Thu, Sep 26, 2013 at 02:40:24PM -0400, Benny Halevy wrote:
>> From: Benny Halevy <[email protected]>
>>
>> Verify whether the server and file system support the given layout type.
>>
>> [was pnfsd: Streamline error code checking for non-pnfs filesystems]
>> Signed-off-by: Dean Hildebrand <[email protected]>
>> [pnfsd: Add super block to layout_type()]
>> Signed-off-by: Marc Eshel <[email protected]>
>> [pnfsd: Fix order of ops in nfsd4_layout_verify]
>> Signed-off-by: Dean Hildebrand <[email protected]>
>> [pnfsd: convert generic code to use new pnfs api]
>> [pnfsd: define pnfs_export_operations]
>> [pnfsd: obliterate old vfs api]
>> Signed-off-by: Benny Halevy <[email protected]>
>> [pnfsd: layout verify all layout types]
>> Signed-off-by: Andy Adamson <[email protected]>
>> [pnfsd: tone nfsd4_layout_verify printk down to dprintk]
>> Signed-off-by: Benny Halevy <[email protected]>
>> [pnfsd: check ex_pnfs in nfsd4_verify_layout]
>> Signed-off-by: Andy Adamson <[email protected]>
>> [pnfsd: handle s_pnfs_op==NULL]
>> [pnfsd: verify export option only if svc_export is present]
>> Signed-off-by: Benny Halevy <[email protected]>
>> Signed-off-by: Benny Halevy <[email protected]>
>> ---
>> fs/nfsd/export.c | 6 ++++++
>> fs/nfsd/nfs4proc.c | 39 +++++++++++++++++++++++++++++++++++++++
>> fs/nfsd/pnfsd.h | 2 ++
>> include/linux/nfsd/nfsd4_pnfs.h | 5 ++++-
>> 4 files changed, 51 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
>> index 7730dfd..d803414 100644
>> --- a/fs/nfsd/export.c
>> +++ b/fs/nfsd/export.c
>> @@ -376,6 +376,12 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid)
>> return -EINVAL;
>> }
>>
>> + if (inode->i_sb->s_pnfs_op &&
>> + !inode->i_sb->s_pnfs_op->layout_type) {
>> + dprintk("exp_export: export of invalid fs pnfs export ops.\n");
>> + return -EINVAL;
>> + }
>> +
>
> If you haven't already done it you may want to look at modifying
> nfs-utils/utils/exportfs/exportfs.c:test_export() to add the pnfs option
> when appropriate so the error can be returned at exportfs time.
Hmm, I'm not sure I follow your proposal.
In fs/nfsd/export.c:check_export() we check whether i_sb->s_export_op and,
respectively, i_sb->s_pnfs_op support the required export methods.
How would we know in utils/exportfs when it is appropriate to add the pnfs option?
Benny
>
>> return 0;
>>
>> }
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index 419572f..576b635 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -41,6 +41,7 @@
>> #include "vfs.h"
>> #include "current_stateid.h"
>> #include "netns.h"
>> +#include "pnfsd.h"
>>
>> #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
>> #include <linux/security.h>
>> @@ -1109,6 +1110,44 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
>> return status == nfserr_same ? nfs_ok : status;
>> }
>>
>> +#if defined(CONFIG_PNFSD)
>> +static __be32
>> +nfsd4_layout_verify(struct super_block *sb, struct svc_export *exp,
>> + unsigned int layout_type)
>> +{
>> + int status, type;
>> +
>> + /* check to see if pNFS is supported. */
>> + status = nfserr_layoutunavailable;
>> + if (exp && exp->ex_pnfs == 0) {
>
> Can this really be called with exp == NULL? If so don't you want to
> fail that as well?
It is called with exp == NULL from nfsd4_getdevinfo where it shouldn't
cause an error return.
Benny
>
>> + dprintk("%s: Underlying file system "
>> + "is not exported over pNFS\n", __func__);
>> + goto out;
>> + }
>> + if (!sb->s_pnfs_op || !sb->s_pnfs_op->layout_type) {
>> + dprintk("%s: Underlying file system "
>> + "does not support pNFS\n", __func__);
>> + goto out;
>> + }
>> +
>> + type = sb->s_pnfs_op->layout_type(sb);
>> +
>> + /* check to see if requested layout type is supported. */
>> + status = nfserr_unknown_layouttype;
>> + if (!type)
>> + dprintk("BUG: %s: layout_type 0 is reserved and must not be "
>> + "used by filesystem\n", __func__);
>> + else if (type != layout_type)
>> + dprintk("%s: requested layout type %d "
>> + "does not match supported type %d\n",
>> + __func__, layout_type, type);
>> + else
>> + status = nfs_ok;
>> +out:
>> + return status;
>> +}
>> +#endif /* CONFIG_PNFSD */
>> +
>> /*
>> * NULL call.
>> */
>> diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
>> index 65fb57e..7c46791 100644
>> --- a/fs/nfsd/pnfsd.h
>> +++ b/fs/nfsd/pnfsd.h
>> @@ -34,4 +34,6 @@
>> #ifndef LINUX_NFSD_PNFSD_H
>> #define LINUX_NFSD_PNFSD_H
>>
>> +#include <linux/nfsd/nfsd4_pnfs.h>
>> +
>> #endif /* LINUX_NFSD_PNFSD_H */
>> diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
>> index ff6613e..d44669e 100644
>> --- a/include/linux/nfsd/nfsd4_pnfs.h
>> +++ b/include/linux/nfsd/nfsd4_pnfs.h
>> @@ -34,6 +34,8 @@
>> #ifndef _LINUX_NFSD_NFSD4_PNFS_H
>> #define _LINUX_NFSD_NFSD4_PNFS_H
>>
>> +#include <linux/exportfs.h>
>> +
>> /*
>> * pNFS export operations vector.
>> *
>> @@ -45,7 +47,8 @@
>> * All other methods are optional and can be set to NULL if not implemented.
>> */
>> struct pnfs_export_operations {
>> - /* stub */
>> + /* Returns the supported pnfs_layouttype4. */
>> + int (*layout_type) (struct super_block *);
>> };
>>
>> #endif /* _LINUX_NFSD_NFSD4_PNFS_H */
>> --
>> 1.8.3.1
>>
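The decision order in nfsd4_layout_verify() discussed in this reply, including the exp == NULL case used by nfsd4_getdevinfo, can be modelled in userspace as below. This is a sketch only: the enum values are hypothetical stand-ins for the nfserr_* codes, and sb_sketch, export_sketch and files_layout_type are invented for the example.

```c
#include <stddef.h>

enum verify_status {            /* hypothetical stand-ins for nfserr_* */
	VERIFY_OK,
	VERIFY_LAYOUTUNAVAILABLE,
	VERIFY_UNKNOWN_LAYOUTTYPE,
};

struct export_sketch { int ex_pnfs; };
struct sb_sketch { int (*layout_type)(void); };

static enum verify_status
layout_verify(const struct sb_sketch *sb, const struct export_sketch *exp,
	      int requested)
{
	int type;

	/* The export option is only checked when an export is supplied. */
	if (exp && exp->ex_pnfs == 0)
		return VERIFY_LAYOUTUNAVAILABLE;
	/* The filesystem must provide a layout_type method at all. */
	if (!sb || !sb->layout_type)
		return VERIFY_LAYOUTUNAVAILABLE;

	type = sb->layout_type();
	/* Type 0 is reserved; otherwise the types must match exactly. */
	if (type == 0 || type != requested)
		return VERIFY_UNKNOWN_LAYOUTTYPE;
	return VERIFY_OK;
}

/* Example filesystem callback that reports layout type 1. */
static int files_layout_type(void) { return 1; }
```

Passing NULL for the export skips only the export-option check; the filesystem-support and type checks still apply, which matches the behaviour described for the GETDEVICEINFO path.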
On Thu, Sep 26, 2013 at 02:40:07PM -0400, Benny Halevy wrote:
> From: Dean Hildebrand <[email protected]>
I don't understand why we need to do this.
Also: based on the previous patch I believe we set the
EXCHGID4_FLAG_USE_PNFS_MDS bit in the reply unconditionally, so
regardless of what the client requests we're permitting it to use this
client as a MDS (or plain non-pnfs) server, so I'm not sure it matters
what the client requested.
Could you just drop this patch? Unless you have some good argument for
it.
--b.
>
> [was pnfsd: Add use of pnfs exchange flags]
> Signed-off-by: Dean Hildebrand <[email protected]>
> [pnfsd: define a is_ds_only_session helper]
> Signed-off-by: Benny Halevy <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 4 ++++
> include/uapi/linux/nfs4.h | 7 +++++++
> 2 files changed, 11 insertions(+)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 21c15fc..2c973e6 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1953,6 +1953,10 @@ static __be32 nfsd4_check_cb_sec(struct nfsd4_cb_sec *cbs)
> status = nfserr_seq_misordered;
> goto out_free_conn;
> }
> +
> + if (is_ds_only_session(unconf->cl_exchange_flags))
> + cr_ses->flags &= ~SESSION4_BACK_CHAN;
> +
> old = find_confirmed_client_by_name(&unconf->cl_name, nn);
> if (old) {
> status = mark_client_expired(old);
> diff --git a/include/uapi/linux/nfs4.h b/include/uapi/linux/nfs4.h
> index 788128e..028f5fc 100644
> --- a/include/uapi/linux/nfs4.h
> +++ b/include/uapi/linux/nfs4.h
> @@ -125,6 +125,13 @@
> #define EXCHGID4_FLAG_USE_PNFS_DS 0x00040000
> #define EXCHGID4_FLAG_MASK_PNFS 0x00070000
>
> +static inline bool
> +is_ds_only_session(u32 exchange_flags)
> +{
> + u32 mask = EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_PNFS_MDS;
> + return (exchange_flags & mask) == EXCHGID4_FLAG_USE_PNFS_DS;
> +}
> +
> #define EXCHGID4_FLAG_UPD_CONFIRMED_REC_A 0x40000000
> #define EXCHGID4_FLAG_CONFIRMED_R 0x80000000
> /*
> --
> 1.8.3.1
>
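For readers following the review comment, the helper in the quoted hunk reduces to a two-bit mask test. Below is a minimal userspace sketch, not the patch itself: the flag values are the ones defined by RFC 5661 and used in the kernel headers, the fixed-width types replace the kernel's u32, and adjust_session_flags() and the self-check function are hypothetical names added only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EXCHANGE_ID pNFS role flags (values as defined by RFC 5661). */
#define EXCHGID4_FLAG_USE_NON_PNFS 0x00010000u
#define EXCHGID4_FLAG_USE_PNFS_MDS 0x00020000u
#define EXCHGID4_FLAG_USE_PNFS_DS  0x00040000u

/* CREATE_SESSION "connection is also a back channel" flag. */
#define SESSION4_BACK_CHAN 0x00000002u

/* True only when the DS bit is set and the MDS bit is clear. */
static inline bool is_ds_only_session(uint32_t exchange_flags)
{
	uint32_t mask = EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_PNFS_MDS;

	return (exchange_flags & mask) == EXCHGID4_FLAG_USE_PNFS_DS;
}

/* Hypothetical helper mirroring what the hunk does to the session flags. */
static inline uint32_t adjust_session_flags(uint32_t exchange_flags,
					    uint32_t session_flags)
{
	if (is_ds_only_session(exchange_flags))
		session_flags &= ~SESSION4_BACK_CHAN;
	return session_flags;
}

/* Self-check over the DS/MDS combinations. */
static inline void is_ds_only_session_selfcheck(void)
{
	assert(!is_ds_only_session(0));
	assert(!is_ds_only_session(EXCHGID4_FLAG_USE_PNFS_MDS));
	assert(is_ds_only_session(EXCHGID4_FLAG_USE_PNFS_DS));
	assert(!is_ds_only_session(EXCHGID4_FLAG_USE_PNFS_DS |
				   EXCHGID4_FLAG_USE_PNFS_MDS));
}
```

Under this reading, any flags word that carries the MDS bit makes the helper return false. So if the stored exchange flags always include EXCHGID4_FLAG_USE_PNFS_MDS, as the review comment suggests they might, the back channel would never be dropped by this check.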
From: Benny Halevy <[email protected]>
Currently, always return a single record in the log_layout array.
If an invalid iomode, or an iomode of LAYOUTIOMODE4_ANY is specified, the
metadata server MUST return NFS4ERR_BADIOMODE.
[extracted from pnfsd: Initial pNFS server implementation.]
[pnfsd: nfsd layout cache: layout return changes]
[pnfsd: add debug printouts in return_layout path]
[pnfsd: refactor return_layout]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: Streamline error code checking for non-pnfs filesystems]
[pnfsd: Use nfsd4_layout_seg instead of wrapper struct.]
[pnfsd: Move nfsd4_layout_seg to exportfs.h]
[pnfsd: Fix file layout layoutget export op for d13]
[pnfsd: Simplify layout get export interface.]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: improve nfs4_pnfs_get_layout dprintks]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: initialize layoutget return_on_close]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: update server layout xdr for draft 19.]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: use stateid_t for layout stateid xdr data structs]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: Update getdeviceinfo for draft-19]
Signed-off-by: Dean Hildebrand <[email protected]>
[pnfsd: xdr encode layoutget response logr_layout array count as per draft-19]
[pnfsd: use stateid xdr {en,de}code functions for layoutget]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: use nfsd4_compoundres pointer in pnfs_xdr_info]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: move vfs api structures to nfsd4_pnfs.h]
[pnfsd: convert generic code to use new pnfs api]
[pnfsd: define pnfs_export_operations]
[pnfsd: obliterate old vfs api]
Signed-off-by: Benny Halevy <[email protected]>
[Split this patch into filelayout only (this patch) and all layout types]
(patch pnfsd: layout get all layout types).
Remove use of pnfs_export_operations.
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: fixup ENCODE_HEAD for layoutget]
[pnfsd: rewind xdr response pointer on nfsd4_encode_layoutget error]
Signed-off-by: Benny Halevy <[email protected]>
[Move pnfsd code from nfs4state.c to nfs4pnfsd.c]
[Move common state code from linux/nfsd/state.h to fs/nfsd/internal.h]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: Release lock during layout export ops.]
Signed-off-by: Dean Hildebrand <[email protected]>
[cosmetic changes from pnfsd: Helper functions for layout stateid processing.]
[pnfsd: layout get all layout types]
[pnfsd: check ex_pnfs in nfsd4_verify_layout]
Signed-off-by: Andy Adamson <[email protected]>
[removed the nfsd4_pnfs_fl_layoutget stub]
[pnfsd: get rid of layout encoding function vector]
[pnfsd: filelayout: convert to using exp_xdr]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: Move pnfsd code out of nfs4state.c/h]
Signed-off-by: Boaz Harrosh <[email protected]>
[fixed !CONFIG_PNFSD and clean up for pnfsd-files]
[gfs2: set pnfs_dlm_export_ops only for CONFIG_PNFSD]
[moved pnfsd defs back into state.h]
[pnfsd: rename deviceid_t struct pnfs_deviceid]
[pnfsd: fix cosmetic checkpatch warnings]
[pnfsd: handle s_pnfs_op==NULL]
[pnfsd: move layoutget xdr structure to xdr4.h]
[pnfsd: clean up layoutget export API]
[pnfsd: moved find_alloc_file to nfs4state.c]
[moved struct nfs4_fsid to public include/linux/nfs4.h]
[pnfsd: rename device fsid member to sbid]
[pnfsd: use sbid hash table to map super_blocks to devid major identifiers]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: fix file system API layout_get error codes]
[pnfsd: fix NFS4ERR_BADIOMODE in layoutget]
Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: require filesystem layout_get method return a u32 rather than int]
[pnfsd: allow filesystem to return canonical nfs4 errors for layoutget]
[pnfsd: do not allow filesystem to return encoded nfs errors on layout_get]
[pnfsd: fixup nfs4_pnfs_get_layout to use __be32 nfserr]
[pnfsd: allow filesystem to return NFS4ERR_WRONG_TYPE for layout_get]
[pnfsd: fix error handling in layout_get]
[pnfsd: fix uninitialized usage of nfserr in nfs4_pnfs_get_layout]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: handle LAYOUTGET with maxcount >= 2^31]
[pnfsd: verify minlength and range as per RFC5661]
[pnfsd: use nfsd_net for layoutget starting v3.8]
[pnfsd: merge_layout needs to acquire the layout_lock for traversing fi_layouts]
[pnfsd: return bool from merge_layout and fix not found path]
Signed-off-by: Benny Halevy <[email protected]>
[pnfsd: nfsd4_pnfs_dlm_layoutget]
Signed-off-by: Andy Adamson <[email protected]>
[pnfsd: layout state: hang layouts on layout state]
[pnfsd: do not release the state lock around call to fs layout_get]
Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/export.c | 3 +-
fs/nfsd/nfs4pnfsd.c | 169 ++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfs4proc.c | 50 ++++++++++++
fs/nfsd/nfs4state.c | 51 ++++++------
fs/nfsd/nfs4xdr.c | 109 +++++++++++++++++++++++++-
fs/nfsd/pnfsd.h | 8 ++
fs/nfsd/state.h | 33 ++++++++
fs/nfsd/xdr4.h | 11 +++
include/linux/exportfs.h | 3 +-
include/linux/nfs4.h | 5 ++
include/linux/nfsd/nfsd4_pnfs.h | 67 ++++++++++++++++
11 files changed, 479 insertions(+), 30 deletions(-)
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index 462f0df..043c8e2 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -378,7 +378,8 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid)
if (inode->i_sb->s_pnfs_op &&
(!inode->i_sb->s_pnfs_op->layout_type ||
- !inode->i_sb->s_pnfs_op->get_device_info)) {
+ !inode->i_sb->s_pnfs_op->get_device_info ||
+ !inode->i_sb->s_pnfs_op->layout_get)) {
dprintk("exp_export: export of invalid fs pnfs export ops.\n");
return -EINVAL;
}
diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
index d219e42..b8ddd82 100644
--- a/fs/nfsd/nfs4pnfsd.c
+++ b/fs/nfsd/nfs4pnfsd.c
@@ -22,11 +22,24 @@
*****************************************************************************/
#include "pnfsd.h"
+#include "netns.h"
#define NFSDDBG_FACILITY NFSDDBG_PNFS
+/*
+ * w.r.t layout lists and recalls, layout_lock protects readers from a writer
+ * All modifications to per-file layout state / layout lists are done under the file_lo_lock
+ * The only writer-exclusion done with layout_lock is for the sbid table
+ */
static DEFINE_SPINLOCK(layout_lock);
+#define ASSERT_LAYOUT_LOCKED() assert_spin_locked(&layout_lock);
+
+/*
+ * Layout state - NFSv4.1 pNFS
+ */
+static struct kmem_cache *pnfs_layout_slab;
+
/* hash table for nfsd4_pnfs_deviceid.sbid */
#define SBID_HASH_BITS 8
#define SBID_HASH_SIZE (1 << SBID_HASH_BITS)
@@ -68,6 +81,8 @@ struct sbid_tracker {
int i;
struct sbid_tracker *sbid;
+ nfsd4_free_slab(&pnfs_layout_slab);
+
for (i = 0; i < SBID_HASH_SIZE; i++) {
while (!list_empty(&sbid_hashtbl[i])) {
sbid = list_first_entry(&sbid_hashtbl[i],
@@ -83,12 +98,39 @@ struct sbid_tracker {
{
int i;
+ pnfs_layout_slab = kmem_cache_create("pnfs_layouts",
+ sizeof(struct nfs4_layout), 0, 0, NULL);
+ if (pnfs_layout_slab == NULL)
+ return -ENOMEM;
+
for (i = 0; i < SBID_HASH_SIZE; i++)
INIT_LIST_HEAD(&sbid_hashtbl[i]);
return 0;
}
+static struct nfs4_layout *
+alloc_layout(void)
+{
+ return kmem_cache_alloc(pnfs_layout_slab, GFP_KERNEL);
+}
+
+static void
+free_layout(struct nfs4_layout *lp)
+{
+ kmem_cache_free(pnfs_layout_slab, lp);
+}
+
+static void
+init_layout(struct nfs4_layout *lp,
+ struct nfsd4_layout_seg *seg)
+{
+ dprintk("pNFS %s: lp %p\n", __func__, lp);
+
+ memcpy(&lp->lo_seg, seg, sizeof(lp->lo_seg));
+ dprintk("pNFS %s end\n", __func__);
+}
+
static u64
alloc_init_sbid(struct super_block *sb)
{
@@ -165,3 +207,130 @@ struct super_block *
return id;
}
+
+__be32
+nfs4_pnfs_get_layout(struct svc_rqst *rqstp,
+ struct nfsd4_pnfs_layoutget *lgp,
+ struct exp_xdr_stream *xdr)
+{
+ u32 status;
+ __be32 nfserr;
+ struct inode *ino = lgp->lg_fhp->fh_dentry->d_inode;
+ struct super_block *sb = ino->i_sb;
+ struct nfs4_file *fp;
+ struct nfs4_client *clp;
+ struct nfs4_layout *lp = NULL;
+ struct nfsd4_pnfs_layoutget_arg args = {
+ .lg_minlength = lgp->lg_minlength,
+ .lg_fh = &lgp->lg_fhp->fh_handle,
+ };
+ struct nfsd4_pnfs_layoutget_res res = {
+ .lg_seg = lgp->lg_seg,
+ };
+
+ dprintk("NFSD: %s Begin\n", __func__);
+
+ /* verify minlength and range as per RFC5661:
+ * o If loga_length is less than loga_minlength,
+ * the metadata server MUST return NFS4ERR_INVAL.
+ * o If the sum of loga_offset and loga_minlength exceeds
+ * NFS4_UINT64_MAX, and loga_minlength is not
+ * NFS4_UINT64_MAX, the error NFS4ERR_INVAL MUST result.
+ * o If the sum of loga_offset and loga_length exceeds
+ * NFS4_UINT64_MAX, and loga_length is not NFS4_UINT64_MAX,
+ * the error NFS4ERR_INVAL MUST result.
+ */
+ if ((lgp->lg_seg.length < lgp->lg_minlength) ||
+ (lgp->lg_minlength != NFS4_MAX_UINT64 &&
+ lgp->lg_minlength > NFS4_MAX_UINT64 - lgp->lg_seg.offset) ||
+ (lgp->lg_seg.length != NFS4_MAX_UINT64 &&
+ lgp->lg_seg.length > NFS4_MAX_UINT64 - lgp->lg_seg.offset)) {
+ nfserr = nfserr_inval;
+ goto out;
+ }
+
+ args.lg_sbid = find_create_sbid(sb);
+ if (!args.lg_sbid) {
+ nfserr = nfserr_layouttrylater;
+ goto out;
+ }
+
+ nfs4_lock_state();
+ fp = find_alloc_file(ino, lgp->lg_fhp);
+ clp = find_confirmed_client((clientid_t *)&lgp->lg_seg.clientid, true,
+ net_generic(SVC_NET(rqstp), nfsd_net_id));
+ dprintk("pNFS %s: fp %p clp %p\n", __func__, fp, clp);
+ if (!fp || !clp) {
+ nfserr = nfserr_inval;
+ goto out_unlock;
+ }
+
+ lp = alloc_layout();
+ if (!lp) {
+ nfserr = nfserr_layouttrylater;
+ goto out_unlock;
+ }
+
+ dprintk("pNFS %s: pre-export type 0x%x maxcount %Zd "
+ "iomode %u offset %llu length %llu\n",
+ __func__, lgp->lg_seg.layout_type,
+ exp_xdr_qbytes(xdr->end - xdr->p),
+ lgp->lg_seg.iomode, lgp->lg_seg.offset, lgp->lg_seg.length);
+
+ status = sb->s_pnfs_op->layout_get(ino, xdr, &args, &res);
+
+ dprintk("pNFS %s: post-export status %u "
+ "iomode %u offset %llu length %llu\n",
+ __func__, status, res.lg_seg.iomode,
+ res.lg_seg.offset, res.lg_seg.length);
+
+ /*
+ * The allowable error codes for the layout_get pNFS export
+ * operations vector function (from the file system) can be
+ * expanded as needed to include other errors defined for
+ * the RFC 5661 LAYOUTGET operation.
+ */
+ switch (status) {
+ case 0:
+ nfserr = NFS4_OK;
+ break;
+ case NFS4ERR_ACCESS:
+ case NFS4ERR_BADIOMODE:
+ /* No support for LAYOUTIOMODE4_RW layouts */
+ case NFS4ERR_BADLAYOUT:
+ /* No layout matching loga_minlength rules */
+ case NFS4ERR_INVAL:
+ case NFS4ERR_IO:
+ case NFS4ERR_LAYOUTTRYLATER:
+ case NFS4ERR_LAYOUTUNAVAILABLE:
+ case NFS4ERR_LOCKED:
+ case NFS4ERR_NOSPC:
+ case NFS4ERR_RECALLCONFLICT:
+ case NFS4ERR_SERVERFAULT:
+ case NFS4ERR_TOOSMALL:
+ /* Requested layout too big for loga_maxcount */
+ case NFS4ERR_WRONG_TYPE:
+ /* Not a regular file */
+ nfserr = cpu_to_be32(status);
+ goto out_freelayout;
+ default:
+ BUG();
+ nfserr = nfserr_serverfault;
+ }
+
+ lgp->lg_seg = res.lg_seg;
+ lgp->lg_roc = res.lg_return_on_close;
+
+ init_layout(lp, &res.lg_seg);
+out_unlock:
+ nfs4_unlock_state();
+ if (fp)
+ put_nfs4_file(fp);
+out:
+ dprintk("pNFS %s: lp %p exit nfserr %u\n", __func__, lp,
+ be32_to_cpu(nfserr));
+ return nfserr;
+out_freelayout:
+ free_layout(lp);
+ goto out_unlock;
+}
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 81d41a4..960d8ff 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1228,6 +1228,52 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
out:
return status;
}
+
+static __be32
+nfsd4_layoutget(struct svc_rqst *rqstp,
+ struct nfsd4_compound_state *cstate,
+ struct nfsd4_pnfs_layoutget *lgp)
+{
+ int status;
+ struct super_block *sb;
+ struct svc_fh *current_fh = &cstate->current_fh;
+ int accmode;
+
+ if (lgp->lg_seg.iomode == IOMODE_READ) {
+ accmode = NFSD_MAY_READ;
+ } else if (lgp->lg_seg.iomode == IOMODE_RW) {
+ accmode = NFSD_MAY_READ | NFSD_MAY_WRITE;
+ } else {
+ status = nfserr_badiomode;
+ dprintk("pNFS %s: invalid iomode %d\n", __func__,
+ lgp->lg_seg.iomode);
+ goto out;
+ }
+
+ status = fh_verify(rqstp, current_fh, 0, accmode);
+ if (status)
+ goto out;
+
+ status = nfserr_inval;
+ sb = current_fh->fh_dentry->d_inode->i_sb;
+ if (!sb)
+ goto out;
+
+ /* Ensure underlying file system supports pNFS and,
+ * if so, the requested layout type
+ */
+ status = nfsd4_layout_verify(sb, current_fh->fh_export,
+ lgp->lg_seg.layout_type);
+ if (status)
+ goto out;
+
+ /* Set up arguments so layout can be retrieved at encode time */
+ lgp->lg_fhp = current_fh;
+ copy_clientid((clientid_t *)&lgp->lg_seg.clientid, cstate->session);
+ status = nfs_ok;
+out:
+ return status;
+}
#endif /* CONFIG_PNFSD */
/*
@@ -1971,6 +2017,10 @@ static inline u32 nfsd4_create_session_rsize(struct svc_rqst *rqstp, struct nfsd
.op_flags = ALLOWED_WITHOUT_FH,
.op_name = "OP_GETDEVICEINFO",
},
+ [OP_LAYOUTGET] = {
+ .op_func = (nfsd4op_func)nfsd4_layoutget,
+ .op_name = "OP_LAYOUTGET",
+ },
#endif /* CONFIG_PNFSD */
};
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index f6022a6..a8a18d4 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -239,7 +239,7 @@ static void nfsd4_free_file(struct nfs4_file *f)
kmem_cache_free(file_slab, f);
}
-static inline void
+void
put_nfs4_file(struct nfs4_file *fi)
{
if (atomic_dec_and_lock(&fi->fi_ref, &recall_lock)) {
@@ -250,7 +250,7 @@ static void nfsd4_free_file(struct nfs4_file *f)
}
}
-static inline void
+void
get_nfs4_file(struct nfs4_file *fi)
{
atomic_inc(&fi->fi_ref);
@@ -1462,7 +1462,7 @@ static struct nfs4_client *create_client(struct xdr_netobj name,
return NULL;
}
-static struct nfs4_client *
+struct nfs4_client *
find_confirmed_client(clientid_t *clid, bool sessions, struct nfsd_net *nn)
{
struct list_head *tbl = nn->conf_id_hashtbl;
@@ -2490,7 +2490,8 @@ static struct nfs4_file *nfsd4_alloc_file(void)
}
/* OPEN Share state helper functions */
-static void nfsd4_init_file(struct nfs4_file *fp, struct inode *ino)
+static void nfsd4_init_file(struct nfs4_file *fp, struct inode *ino,
+ struct svc_fh *current_fh)
{
unsigned int hashval = file_hashval(ino);
@@ -2507,7 +2508,7 @@ static void nfsd4_init_file(struct nfs4_file *fp, struct inode *ino)
spin_unlock(&recall_lock);
}
-static void
+void
nfsd4_free_slab(struct kmem_cache **slab)
{
if (*slab == NULL)
@@ -2524,6 +2525,7 @@ static void nfsd4_init_file(struct nfs4_file *fp, struct inode *ino)
nfsd4_free_slab(&file_slab);
nfsd4_free_slab(&stateid_slab);
nfsd4_free_slab(&deleg_slab);
+ nfsd4_free_pnfs_slabs();
}
int
@@ -2549,6 +2551,8 @@ static void nfsd4_init_file(struct nfs4_file *fp, struct inode *ino)
sizeof(struct nfs4_delegation), 0, 0, NULL);
if (deleg_slab == NULL)
goto out_nomem;
+ if (nfsd4_init_pnfs_slabs())
+ goto out_nomem;
return 0;
out_nomem:
nfsd4_free_slabs();
@@ -2702,6 +2706,21 @@ static void init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp,
return NULL;
}
+struct nfs4_file *
+find_alloc_file(struct inode *ino, struct svc_fh *current_fh)
+{
+ struct nfs4_file *fp;
+
+ fp = find_file(ino);
+ if (!fp) {
+ fp = nfsd4_alloc_file();
+ if (fp)
+ nfsd4_init_file(fp, ino, current_fh);
+ }
+
+ return fp;
+}
+
/*
* Called to check deny when READ with all zero stateid or
* WRITE with all zero or all one stateid
@@ -3244,7 +3263,7 @@ static void nfsd4_deleg_xgrade_none_ext(struct nfsd4_open *open,
status = nfserr_jukebox;
fp = open->op_file;
open->op_file = NULL;
- nfsd4_init_file(fp, ino);
+ nfsd4_init_file(fp, ino, current_fh);
}
/*
@@ -4082,26 +4101,6 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
#define LOCKOWNER_INO_HASH_MASK (LOCKOWNER_INO_HASH_SIZE - 1)
-static inline u64
-end_offset(u64 start, u64 len)
-{
- u64 end;
-
- end = start + len;
- return end >= start ? end: NFS4_MAX_UINT64;
-}
-
-/* last octet in a range */
-static inline u64
-last_byte_offset(u64 start, u64 len)
-{
- u64 end;
-
- WARN_ON_ONCE(!len);
- end = start + len;
- return end > start ? end - 1: NFS4_MAX_UINT64;
-}
-
static unsigned int lockowner_ino_hashval(struct inode *inode, u32 cl_id, struct xdr_netobj *ownername)
{
return (file_hashval(inode) + cl_id
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index ed86a2d..1cc19cd 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1521,6 +1521,26 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
DECODE_TAIL;
}
+
+static __be32
+nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp,
+ struct nfsd4_pnfs_layoutget *lgp)
+{
+ DECODE_HEAD;
+
+ READ_BUF(36);
+ READ32(lgp->lg_signal);
+ READ32(lgp->lg_seg.layout_type);
+ READ32(lgp->lg_seg.iomode);
+ READ64(lgp->lg_seg.offset);
+ READ64(lgp->lg_seg.length);
+ READ64(lgp->lg_minlength);
+ nfsd4_decode_stateid(argp, &lgp->lg_sid);
+ READ_BUF(4);
+ READ32(lgp->lg_maxcount);
+
+ DECODE_TAIL;
+}
#endif /* CONFIG_PNFSD */
static __be32
@@ -1628,7 +1648,7 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
[OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_getdevinfo,
[OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_getdevlist,
[OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_notsupp,
- [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_layoutget,
[OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_notsupp,
#else /* CONFIG_PNFSD */
[OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_notsupp,
@@ -3767,6 +3787,91 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
ADJUST_ARGS();
goto out;
}
+
+static __be32
+nfsd4_encode_layoutget(struct nfsd4_compoundres *resp,
+ __be32 nfserr,
+ struct nfsd4_pnfs_layoutget *lgp)
+{
+ u32 maxcount, leadcount;
+ struct super_block *sb;
+ struct exp_xdr_stream xdr;
+ __be32 *p, *p_save, *p_start = resp->p;
+
+ dprintk("%s: err %d\n", __func__, nfserr);
+ if (nfserr)
+ return nfserr;
+
+ sb = lgp->lg_fhp->fh_dentry->d_inode->i_sb;
+ maxcount = PAGE_SIZE;
+ if (maxcount > lgp->lg_maxcount)
+ maxcount = lgp->lg_maxcount;
+
+ /* Check for space on xdr stream */
+ leadcount = 36 + sizeof(stateid_opaque_t);
+ RESERVE_SPACE(leadcount);
+ /* encode layout metadata after file system encodes layout */
+ p += XDR_QUADLEN(leadcount);
+ ADJUST_ARGS();
+
+ /* Ensure have room for ret_on_close, off, len, iomode, type */
+ if (maxcount < leadcount) {
+ dprintk("%s: buffer too small for response header (%u < %u)\n",
+ __func__, maxcount, leadcount);
+ nfserr = nfserr_toosmall;
+ goto err;
+ }
+ maxcount -= leadcount;
+
+ /* Set xdr info so file system can encode layout */
+ xdr.p = p_save = resp->p;
+ xdr.end = resp->end;
+ if (xdr.end - xdr.p > exp_xdr_qwords(maxcount & ~3))
+ xdr.end = xdr.p + exp_xdr_qwords(maxcount & ~3);
+
+ /* Retrieve, encode, and merge layout */
+ nfserr = nfs4_pnfs_get_layout(resp->rqstp, lgp, &xdr);
+ if (nfserr)
+ goto err;
+
+ /* Ensure file system returned enough bytes for the client
+ * to access.
+ */
+ if (lgp->lg_seg.length < lgp->lg_minlength) {
+ nfserr = nfserr_badlayout;
+ goto err;
+ }
+
+ /* The file system should never write 0 bytes without
+ * returning an error
+ */
+ BUG_ON(xdr.p == p_save);
+
+ /* Rewind to beginning and encode attrs */
+ resp->p = p_start;
+ RESERVE_SPACE(4);
+ WRITE32(lgp->lg_roc); /* return on close */
+ ADJUST_ARGS();
+ nfsd4_encode_stateid(resp, &lgp->lg_sid);
+ RESERVE_SPACE(28);
+ /* Note: response logr_layout array count, always one for now */
+ WRITE32(1);
+ WRITE64(lgp->lg_seg.offset);
+ WRITE64(lgp->lg_seg.length);
+ WRITE32(lgp->lg_seg.iomode);
+ WRITE32(lgp->lg_seg.layout_type);
+
+ /* Update the xdr stream with the number of bytes written
+ * by the file system
+ */
+ p = xdr.p;
+ ADJUST_ARGS();
+
+ return nfs_ok;
+err:
+ resp->p = p_start;
+ return nfserr;
+}
#endif /* CONFIG_PNFSD */
static __be32
@@ -3833,7 +3938,7 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
[OP_GETDEVICEINFO] = (nfsd4_enc)nfsd4_encode_getdevinfo,
[OP_GETDEVICELIST] = (nfsd4_enc)nfsd4_encode_getdevlist,
[OP_LAYOUTCOMMIT] = (nfsd4_enc)nfsd4_encode_noop,
- [OP_LAYOUTGET] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_LAYOUTGET] = (nfsd4_enc)nfsd4_encode_layoutget,
[OP_LAYOUTRETURN] = (nfsd4_enc)nfsd4_encode_noop,
#else /* CONFIG_PNFSD */
[OP_GETDEVICEINFO] = (nfsd4_enc)nfsd4_encode_noop,
diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
index cfcfc9a..6920e43 100644
--- a/fs/nfsd/pnfsd.h
+++ b/fs/nfsd/pnfsd.h
@@ -34,11 +34,19 @@
#ifndef LINUX_NFSD_PNFSD_H
#define LINUX_NFSD_PNFSD_H
+#include <linux/list.h>
#include <linux/nfsd/nfsd4_pnfs.h>
+#include "state.h"
#include "xdr4.h"
+/* outstanding layout */
+struct nfs4_layout {
+ struct nfsd4_layout_seg lo_seg;
+};
+
u64 find_create_sbid(struct super_block *);
struct super_block *find_sbid_id(u64);
+__be32 nfs4_pnfs_get_layout(struct svc_rqst *, struct nfsd4_pnfs_layoutget *, struct exp_xdr_stream *);
#endif /* LINUX_NFSD_PNFSD_H */
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 2e601a2..b85ad60 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -481,6 +481,39 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
struct nfsd_net *nn);
extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
extern void put_client_renew(struct nfs4_client *clp);
+extern void nfsd4_free_slab(struct kmem_cache **);
+extern struct nfs4_file *find_alloc_file(struct inode *, struct svc_fh *);
+extern void put_nfs4_file(struct nfs4_file *);
+extern void get_nfs4_file(struct nfs4_file *);
+extern struct nfs4_client *find_confirmed_client(clientid_t *, bool sessions, struct nfsd_net *);
+
+#if defined(CONFIG_PNFSD)
+extern int nfsd4_init_pnfs_slabs(void);
+extern void nfsd4_free_pnfs_slabs(void);
+#else /* CONFIG_PNFSD */
+static inline void nfsd4_free_pnfs_slabs(void) {}
+static inline int nfsd4_init_pnfs_slabs(void) { return 0; }
+#endif /* CONFIG_PNFSD */
+
+static inline u64
+end_offset(u64 start, u64 len)
+{
+ u64 end;
+
+ end = start + len;
+ return end >= start ? end: NFS4_MAX_UINT64;
+}
+
+/* last octet in a range */
+static inline u64
+last_byte_offset(u64 start, u64 len)
+{
+ u64 end;
+
+ WARN_ON_ONCE(!len);
+ end = start + len;
+ return end > start ? end - 1: NFS4_MAX_UINT64;
+}
/* nfs4recover operations */
extern int nfsd4_client_tracking_init(struct net *net);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index faf37bc..727288b 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -448,6 +448,16 @@ struct nfsd4_pnfs_getdevlist {
u32 gd_eof; /* response */
};
+struct nfsd4_pnfs_layoutget {
+ u64 lg_minlength; /* request */
+ u32 lg_signal; /* request */
+ u32 lg_maxcount; /* request */
+ struct svc_fh *lg_fhp; /* request */
+ stateid_t lg_sid; /* request/response */
+ struct nfsd4_layout_seg lg_seg; /* request/response */
+ u32 lg_roc; /* response */
+};
+
struct nfsd4_op {
int opnum;
__be32 status;
@@ -496,6 +506,7 @@ struct nfsd4_op {
#if defined(CONFIG_PNFSD)
struct nfsd4_pnfs_getdevlist pnfs_getdevlist;
struct nfsd4_pnfs_getdevinfo pnfs_getdevinfo;
+ struct nfsd4_pnfs_layoutget pnfs_layoutget;
#endif /* CONFIG_PNFSD */
} u;
struct nfs4_replay * replay;
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index ade74e1..017f1753 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -218,6 +218,7 @@ extern struct dentry *generic_fh_to_parent(struct super_block *sb,
extern int filelayout_encode_devinfo(struct exp_xdr_stream *xdr,
const struct pnfs_filelayout_device *fdev);
-
+extern int filelayout_encode_layout(struct exp_xdr_stream *xdr,
+ const struct pnfs_filelayout_layout *flp);
#endif /* defined(CONFIG_EXPORTFS_FILE_LAYOUT) */
#endif /* LINUX_EXPORTFS_H */
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index e36dee5..2c3aa9f 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -32,6 +32,11 @@ struct nfs4_acl {
struct nfs4_ace aces[0];
};
+struct nfs4_fsid {
+ u64 major;
+ u64 minor;
+};
+
#define NFS4_MAXLABELLEN 2048
struct nfs4_label {
diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
index 28f9daa..a680085 100644
--- a/include/linux/nfsd/nfsd4_pnfs.h
+++ b/include/linux/nfsd/nfsd4_pnfs.h
@@ -56,6 +56,36 @@ struct nfsd4_pnfs_dev_iter_res {
u32 gd_eof; /* response */
};
+struct nfsd4_layout_seg {
+ u64 clientid;
+ u32 layout_type;
+ u32 iomode;
+ u64 offset;
+ u64 length;
+};
+
+/* Used by layout_get to encode layout (loc_body var in spec)
+ * Args:
+ * minlength - min number of accessible bytes given by layout
+ * fsid - Major part of struct pnfs_deviceid. File system uses this
+ * to build the deviceid returned in the layout.
+ * fh - fs can modify the file handle for use on data servers
+ * seg - layout info requested and layout info returned
+ * xdr - xdr info
+ * return_on_close - true if layout to be returned on file close
+ */
+
+struct nfsd4_pnfs_layoutget_arg {
+ u64 lg_minlength;
+ u64 lg_sbid;
+ const struct knfsd_fh *lg_fh;
+};
+
+struct nfsd4_pnfs_layoutget_res {
+ struct nfsd4_layout_seg lg_seg; /* request/response */
+ u32 lg_return_on_close;
+};
+
/*
* pNFS export operations vector.
*
@@ -88,6 +118,43 @@ struct pnfs_export_operations {
int (*get_device_iter) (struct super_block *,
u32 layout_type,
struct nfsd4_pnfs_dev_iter_res *);
+
+ /* Retrieve and encode a layout for inode onto the xdr stream.
+ * arg->minlength is the minimum number of accessible bytes required
+ * by the client.
+ * The maximum number of bytes to encode the layout is given by
+ * the xdr stream end pointer.
+ * arg->fsid contains the major part of struct pnfs_deviceid.
+ * The file system uses this to build the deviceid returned
+ * in the layout.
+ * res->seg - layout segment requested and layout info returned.
+ * res->fh - file handle; can be modified for use on data servers
+ * res->return_on_close - true if layout to be returned on file close
+ *
+ * return one of the following nfs errors:
+ * NFS_OK Success
+ * NFS4ERR_ACCESS Permission error
+ * NFS4ERR_BADIOMODE Server does not support requested iomode
+ * NFS4ERR_BADLAYOUT No layout matching loga_minlength rules
+ * NFS4ERR_INVAL Parameter other than layout is invalid
+ * NFS4ERR_IO I/O error
+ * NFS4ERR_LAYOUTTRYLATER Layout may be retrieved later
+ * NFS4ERR_LAYOUTUNAVAILABLE Layout unavailable for this file
+ * NFS4ERR_LOCKED Lock conflict
+ * NFS4ERR_NOSPC Out-of-space error occurred
+ * NFS4ERR_RECALLCONFLICT Layout currently unavailable due to
+ * a conflicting CB_LAYOUTRECALL
+ * NFS4ERR_SERVERFAULT Server went berserk
+ * NFS4ERR_TOOSMALL loga_maxcount too small to fit layout
+ * NFS4ERR_WRONG_TYPE Wrong file type (not a regular file)
+ */
+ enum nfsstat4 (*layout_get) (struct inode *,
+ struct exp_xdr_stream *xdr,
+ const struct nfsd4_pnfs_layoutget_arg *,
+ struct nfsd4_pnfs_layoutget_res *);
+
+ /* Can layout segments be merged for this layout type? */
+ int (*can_merge_layouts) (u32 layout_type);
};
#endif /* _LINUX_NFSD_NFSD4_PNFS_H */
--
1.8.3.1
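The argument checks at the top of nfs4_pnfs_get_layout(), and the end_offset()/last_byte_offset() helpers that the patch moves into state.h, can be tried outside the kernel. The following is a userspace sketch under stated assumptions: layoutget_range_valid() is a hypothetical name for the inline condition in the patch, uint64_t stands in for u64, and assert() stands in for WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFS4_MAX_UINT64 UINT64_MAX

/*
 * Hypothetical wrapper around the three RFC 5661 LAYOUTGET rules:
 *  - length must not be smaller than minlength;
 *  - offset + minlength must not exceed NFS4_MAX_UINT64, unless
 *    minlength is NFS4_MAX_UINT64 itself;
 *  - offset + length must not exceed NFS4_MAX_UINT64, unless
 *    length is NFS4_MAX_UINT64 itself.
 * The sums are tested as "x > MAX - offset" so they cannot wrap.
 */
static inline bool layoutget_range_valid(uint64_t offset, uint64_t length,
					 uint64_t minlength)
{
	if (length < minlength)
		return false;
	if (minlength != NFS4_MAX_UINT64 &&
	    minlength > NFS4_MAX_UINT64 - offset)
		return false;
	if (length != NFS4_MAX_UINT64 &&
	    length > NFS4_MAX_UINT64 - offset)
		return false;
	return true;
}

/* One past the last byte of the range, clamped if start + len wraps. */
static inline uint64_t end_offset(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;

	return end >= start ? end : NFS4_MAX_UINT64;
}

/* Last byte in the range; len must be non-zero. */
static inline uint64_t last_byte_offset(uint64_t start, uint64_t len)
{
	uint64_t end;

	assert(len != 0);	/* the kernel version only warns here */
	end = start + len;
	return end > start ? end - 1 : NFS4_MAX_UINT64;
}
```

A whole-file request (offset 0, length NFS4_MAX_UINT64) passes the check because the "all ones" length is exempt from the overflow rule, while a request such as offset NFS4_MAX_UINT64 - 10 with length 100 is rejected.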
On 10/03/2013 08:29 AM, Benny Halevy wrote:
> On 2013-10-03 12:55, Christoph Hellwig wrote:
>> >On Thu, Oct 03, 2013 at 09:02:27AM +0300, Benny Halevy wrote:
>>> >>Just that this is dlm specific logic.
>>> >>For example, using dlm_ino_hash() in nfsd4_pnfs_dlm_layoutget().
>>> >>Or even knowing that
>>> >> layout->lg_stripe_type = STRIPE_SPARSE;
>>> >>assumes knowledge of the underlying cluster fs implementation.
>> >
>> >Which in-tree or soon in-tree filesystem do you care about? And why
>> >don't we see pnfs support for it submitted instead of the fairly useless
>> >gfs2 support?
> I picked gfs2 as the initial use case for simplicity and ease of review.
> If there is a rough consensus that it's useless and not worthy of inclusion
> then the one we care about the most is exofs that has a more complete pnfs
> implementation.
>
> Benny
>
I don't see having GFS2 supported as a base for pNFS as useless. Christoph, is
this a concern about GFS2 being too complicated for normal deployment or a lack
in the pNFS support on top of it?
thanks!
Ric
On 2013-10-13 14:08, Christoph Hellwig wrote:
> On Sun, Oct 13, 2013 at 09:11:40AM +0300, Benny Halevy wrote:
>> On 2013-10-11 22:56, Christoph Hellwig wrote:
>>> On Thu, Sep 26, 2013 at 02:40:31PM -0400, Benny Halevy wrote:
>>>> From: Benny Halevy <[email protected]>
>>>
>>> This is entirely unnecessary. Just make sure all layouts set the sbid
>>> field in getdevicelist to the s_dev value of the filesystem and you
>>> can just use user_get_super.
>>
>> That's true for filesystems that have a meaningful s_dev, unlike exofs,
>> but this functionality can be added later.
>
> Even exofs has an s_dev from the anon dev_t allocator. Given that
> layouts aren't supposed to survive over reboots of the server I don't
> see a problem with it.
As far as I can see, exofs isn't actually doing that.
Currently exofs_fill_super sets s_dev = 0;
Boaz, did I miss anything?
Benny
On Sun, Sep 29, 2013 at 02:16:40PM +0300, Benny Halevy wrote:
> On 2013-09-27 17:44, J. Bruce Fields wrote:
> > On Thu, Sep 26, 2013 at 02:40:24PM -0400, Benny Halevy wrote:
> >> return 0;
> >>
> >> }
> >> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >> index 419572f..576b635 100644
> >> --- a/fs/nfsd/nfs4proc.c
> >> +++ b/fs/nfsd/nfs4proc.c
> >> @@ -41,6 +41,7 @@
> >> #include "vfs.h"
> >> #include "current_stateid.h"
> >> #include "netns.h"
> >> +#include "pnfsd.h"
> >>
> >> #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
> >> #include <linux/security.h>
> >> @@ -1109,6 +1110,44 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
> >> return status == nfserr_same ? nfs_ok : status;
> >> }
> >>
> >> +#if defined(CONFIG_PNFSD)
> >> +static __be32
> >> +nfsd4_layout_verify(struct super_block *sb, struct svc_export *exp,
> >> + unsigned int layout_type)
> >> +{
> >> + int status, type;
> >> +
> >> + /* check to see if pNFS is supported. */
> >> + status = nfserr_layoutunavailable;
> >> + if (exp && exp->ex_pnfs == 0) {
> >
> > Can this really be called with exp == NULL? If so don't you want to
> > fail that as well?
>
> It is called with exp == NULL from nfsd4_getdevinfo where it shouldn't
> cause an error return.
Looking through the sbid code, this is making me uncomfortable.
Among other things this allows you to perform getdeviceinfo even against
filesystems that aren't exported. Maybe that's not awful, but.
Also the sbid hash holds all these pointers to superblocks, but does it
take a reference to them anywhere? I can't see it, in which case
there's a crash waiting to happen here if one of them is unmounted.
I'm inclined to think these should be referencing exports, not
superblocks.
--b.
On 2013-10-11 22:37, Christoph Hellwig wrote:
> On Thu, Sep 26, 2013 at 02:41:16PM -0400, Benny Halevy wrote:
>> idr_remove is about to be called before kmem_cache_free so unhashing it
>> is redundant
>>
>> Signed-off-by: Benny Halevy <[email protected]>
>
> This is probably something that should go straight to Bruce, same for
> the next one in the series.
>
True.
Bruce, are you ok with merging these patches for 3.13?
Benny
On Thu, Sep 26, 2013 at 02:40:31PM -0400, Benny Halevy wrote:
> From: Benny Halevy <[email protected]>
This is entirely unnecessary. Just make sure all layouts set the sbid
field in getdevicelist to the s_dev value of the filesystem and you
can just use user_get_super.
On Thu, Oct 03, 2013 at 03:29:06PM +0300, Benny Halevy wrote:
> I picked gfs2 as the initial use case for simplicity and ease of review.
> If there is a rough consensus that it's useless and not worthy of inclusion
> then the one we care about the most is exofs that has a more complete pnfs
> implementation.
This was in reference to file layout implementation details, so exofs
isn't a contender there.
As far as exofs is concerned a pnfs implementation based on it has just
as much toy status as the current gfs2 one. While the pnfs side of it
might as well be a lot better, a filesystem that lacks all the integrity
and scalability features developed in the last 30 years can't be
considered more than a proof of concept.
On Wed, Oct 02, 2013 at 05:27:15PM +0300, Benny Halevy wrote:
> On 2013-09-29 15:16, Christoph Hellwig wrote:
> > The actual XDR encoding doesn't have business being under fs/exportfs
> > and should be in the NFSD code itself.
>
> But then it will create a module dependency we want to avoid.
Not for the current patchset where the only consumer is nfsd.ko, and not
for any method scheme that isn't braindead.
Admittedly, I'm not sure. I'm convinced it will be a big help to have pre-integrated in
Ganesha, I was only speculating wrt Linux/knfsd.
Matt
----- "Christoph Hellwig" <[email protected]> wrote:
> On Thu, Oct 03, 2013 at 08:58:08AM -0400, Matt W. Benjamin wrote:
> > Hi,
> >
> > I think saying exofs is a proof of concept/toy is missing the point.
> > Exofs is an implementation baseline that provides insight into the
> > scalability/performance values that a pnfs implementation can achieve,
> > and potentially how to achieve them.
>
> Speaking like a true diplomat..
>
> What amount of that data are we going to get by merging an exofs based
> pnfs server that we haven't been able to gather with it out of tree for
> the last 6 years? How is merging it and complicating the nfs servers
> for it going to provide a benefit outside of the small group of about a
> dozen people that actively care about the T10 OSD support in Linux?
>
--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104
http://linuxbox.com
tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
On 2013-10-03 16:18, Ric Wheeler wrote:
> On 10/03/2013 09:17 AM, Christoph Hellwig wrote:
>> On Thu, Oct 03, 2013 at 09:12:24AM -0400, Ric Wheeler wrote:
>>>>>> Which in-tree or soon in-tree filesystem do you care about? And why
>>>>>> don't we see pnfs support for it submitted instead of the fairly useless
>>>>>> gfs2 support?
>>>> I picked gfs2 as the initial use case for simplicity and ease of review.
>>>> If there is a rough consensus that it's useless and not worthy of inclusion
>>>> then the one we care about the most is exofs that has a more complete pnfs
>>>> implementation.
>>>>
>>>> Benny
>>>>
>>> I don't see having GFS2 supported as a base for pNFS as useless.
>>> Christoph, is this a concern about GFS2 being too complicated for
>>> normal deployment or a lack in the pNFS support on top of it?
>> Fairly useless was specific to the particular implementation:
>>
>> - which in the stripped down version here only supports DS access for
>> reads
>> - which in the previous version showed worse performance than always
>> going through the MDS
>>
>> I don't have a problem with using GFS2 by itself, but any implementation
>> proposed should actually show significant real life benefits before it
>> gets merged.
>>
The question is what is the minimum value for submitting upstream...
What pnfs over dlm/gfs2 is mostly missing is support for read/write layouts.
One could use them for load balancing, e.g. by either redirecting to the node
holding an exclusive lock on the file, if there is one, or falling back to
dlm_ino_hash in its absence.
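A minimal userspace sketch of that selection rule, under stated assumptions:
ino_hash_node() is a hypothetical stand-in for dlm_ino_hash() (whose real
signature is not shown in this thread), NO_HOLDER is an invented sentinel,
and how the exclusive lock holder would be discovered is left out entirely.

```c
#include <stdint.h>

/* Hypothetical sentinel: no node currently holds an exclusive lock. */
#define NO_HOLDER (-1)

/*
 * Hypothetical stand-in for dlm_ino_hash(): spread inodes over the
 * cluster nodes. A plain modulo is used purely for illustration.
 */
static int ino_hash_node(uint64_t ino, int nr_nodes)
{
	return (int)(ino % (uint64_t)nr_nodes);
}

/*
 * Pick the node a read/write layout should point at: prefer the node
 * that already holds the file exclusively, otherwise fall back to the
 * inode hash.
 */
static int pick_ds_node(uint64_t ino, int excl_holder, int nr_nodes)
{
	if (excl_holder != NO_HOLDER)
		return excl_holder;
	return ino_hash_node(ino, nr_nodes);
}
```

With four nodes, pick_ds_node(11, 0, 4) would return node 0 (the lock
holder), while pick_ds_node(11, NO_HOLDER, 4) falls back to the hash.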
Benny
>
> Makes sense, thanks!
>
> Ric
>
>
On 2013-10-03 12:55, Christoph Hellwig wrote:
> On Thu, Oct 03, 2013 at 09:02:27AM +0300, Benny Halevy wrote:
>> Just that this is dlm specific logic.
>> For example, using dlm_ino_hash() in nfsd4_pnfs_dlm_layoutget().
>> Or even knowing that
>> layout->lg_stripe_type = STRIPE_SPARSE;
>> assumes knowledge of the underlying cluster fs implementation.
>
> Which in-tree or soon in-tree filesystem do you care about? And why
> don't we see pnfs support for it submitted instead of the fairly useless
> gfs2 support?
I picked gfs2 as the initial use case for simplicity and ease of review.
If there is a rough consensus that it's useless and not worthy of inclusion
then the one we care about the most is exofs that has a more complete pnfs
implementation.
Benny
>
>
On 2013-10-10 01:41, Andrew Morton wrote:
> On Wed, 02 Oct 2013 17:36:29 +0300 Benny Halevy <[email protected]> wrote:
>
>> From: Benny Halevy <[email protected]>
>>
>> get_cached_acl is defined as inline in posix_acl.h
>> requiring the full definition of struct inode as it
>> dereferences its struct inode * parameter.
>
> That's very old code so you must have a peculiar config. Please
> describe the circumstances under which this occurs, because I'd like to
> avoid merging this patch.
>
Wow, sorry, you're right. It originated in 2.6.33 as far as I can see
and it is no longer needed.
>> --- a/include/linux/posix_acl.h
>> +++ b/include/linux/posix_acl.h
>> @@ -9,6 +9,7 @@
>> #define __LINUX_POSIX_ACL_H
>>
>> #include <linux/bug.h>
>> +#include <linux/fs.h>
>> #include <linux/slab.h>
>> #include <linux/rcupdate.h>
>
> A better fix is to undo all that crazy inlining in posix_acl.h.
>
>
> From: Andrew Morton <[email protected]>
> Subject: posix_acl: uninlining
>
> Uninline vast tracts of nested inline functions in
> include/linux/posix_acl.h.
>
> This reduces the text+data+bss size of x86_64 allyesconfig vmlinux by 8026
> bytes.
>
> Also fixes an obscure build error reported by Benny: get_cached_acl()
> needs fs.h for struct inode internals.
Sorry for the stale report, I have no problem with 3.12-rc3.
>
> The patch also regularises the positioning of the EXPORT_SYMBOLs in
> posix_acl.c.
>
> Reported-by: Benny Halevy <[email protected]>
ditto.
> Cc: Alexander Viro <[email protected]>
> Cc: J. Bruce Fields <[email protected]>
> Cc: Trond Myklebust <[email protected]>
> Cc: Benny Halevy <[email protected]>
> Cc: Andreas Gruenbacher <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
FWIW, I tested my linux-pnfs 3.12-rc3 tree with this patch
and it builds and causes no regression with the connectathon test suite over pnfs.
Benny
> ---
>
> fs/posix_acl.c | 84 +++++++++++++++++++++++++++++++++---
> include/linux/posix_acl.h | 78 ++-------------------------------
> 2 files changed, 85 insertions(+), 77 deletions(-)
>
> diff -puN fs/posix_acl.c~posix_acl-uninlining fs/posix_acl.c
> --- a/fs/posix_acl.c~posix_acl-uninlining
> +++ a/fs/posix_acl.c
> @@ -22,11 +22,80 @@
>
> #include <linux/errno.h>
>
> -EXPORT_SYMBOL(posix_acl_init);
> -EXPORT_SYMBOL(posix_acl_alloc);
> -EXPORT_SYMBOL(posix_acl_valid);
> -EXPORT_SYMBOL(posix_acl_equiv_mode);
> -EXPORT_SYMBOL(posix_acl_from_mode);
> +struct posix_acl **acl_by_type(struct inode *inode, int type)
> +{
> + switch (type) {
> + case ACL_TYPE_ACCESS:
> + return &inode->i_acl;
> + case ACL_TYPE_DEFAULT:
> + return &inode->i_default_acl;
> + default:
> + BUG();
> + }
> +}
> +EXPORT_SYMBOL(acl_by_type);
> +
> +struct posix_acl *get_cached_acl(struct inode *inode, int type)
> +{
> + struct posix_acl **p = acl_by_type(inode, type);
> + struct posix_acl *acl = ACCESS_ONCE(*p);
> + if (acl) {
> + spin_lock(&inode->i_lock);
> + acl = *p;
> + if (acl != ACL_NOT_CACHED)
> + acl = posix_acl_dup(acl);
> + spin_unlock(&inode->i_lock);
> + }
> + return acl;
> +}
> +EXPORT_SYMBOL(get_cached_acl);
> +
> +struct posix_acl *get_cached_acl_rcu(struct inode *inode, int type)
> +{
> + return rcu_dereference(*acl_by_type(inode, type));
> +}
> +EXPORT_SYMBOL(get_cached_acl_rcu);
> +
> +void set_cached_acl(struct inode *inode, int type, struct posix_acl *acl)
> +{
> + struct posix_acl **p = acl_by_type(inode, type);
> + struct posix_acl *old;
> + spin_lock(&inode->i_lock);
> + old = *p;
> + rcu_assign_pointer(*p, posix_acl_dup(acl));
> + spin_unlock(&inode->i_lock);
> + if (old != ACL_NOT_CACHED)
> + posix_acl_release(old);
> +}
> +EXPORT_SYMBOL(set_cached_acl);
> +
> +void forget_cached_acl(struct inode *inode, int type)
> +{
> + struct posix_acl **p = acl_by_type(inode, type);
> + struct posix_acl *old;
> + spin_lock(&inode->i_lock);
> + old = *p;
> + *p = ACL_NOT_CACHED;
> + spin_unlock(&inode->i_lock);
> + if (old != ACL_NOT_CACHED)
> + posix_acl_release(old);
> +}
> +EXPORT_SYMBOL(forget_cached_acl);
> +
> +void forget_all_cached_acls(struct inode *inode)
> +{
> + struct posix_acl *old_access, *old_default;
> + spin_lock(&inode->i_lock);
> + old_access = inode->i_acl;
> + old_default = inode->i_default_acl;
> + inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED;
> + spin_unlock(&inode->i_lock);
> + if (old_access != ACL_NOT_CACHED)
> + posix_acl_release(old_access);
> + if (old_default != ACL_NOT_CACHED)
> + posix_acl_release(old_default);
> +}
> +EXPORT_SYMBOL(forget_all_cached_acls);
>
> /*
> * Init a fresh posix_acl
> @@ -37,6 +106,7 @@ posix_acl_init(struct posix_acl *acl, in
> atomic_set(&acl->a_refcount, 1);
> acl->a_count = count;
> }
> +EXPORT_SYMBOL(posix_acl_init);
>
> /*
> * Allocate a new ACL with the specified number of entries.
> @@ -51,6 +121,7 @@ posix_acl_alloc(int count, gfp_t flags)
> posix_acl_init(acl, count);
> return acl;
> }
> +EXPORT_SYMBOL(posix_acl_alloc);
>
> /*
> * Clone an ACL.
> @@ -146,6 +217,7 @@ posix_acl_valid(const struct posix_acl *
> return 0;
> return -EINVAL;
> }
> +EXPORT_SYMBOL(posix_acl_valid);
>
> /*
> * Returns 0 if the acl can be exactly represented in the traditional
> @@ -186,6 +258,7 @@ posix_acl_equiv_mode(const struct posix_
> *mode_p = (*mode_p & ~S_IRWXUGO) | mode;
> return not_equiv;
> }
> +EXPORT_SYMBOL(posix_acl_equiv_mode);
>
> /*
> * Create an ACL representing the file mode permission bits of an inode.
> @@ -207,6 +280,7 @@ posix_acl_from_mode(umode_t mode, gfp_t
> acl->a_entries[2].e_perm = (mode & S_IRWXO);
> return acl;
> }
> +EXPORT_SYMBOL(posix_acl_from_mode);
>
> /*
> * Return 0 if current is granted want access to the inode
> diff -puN include/linux/posix_acl.h~posix_acl-uninlining include/linux/posix_acl.h
> --- a/include/linux/posix_acl.h~posix_acl-uninlining
> +++ a/include/linux/posix_acl.h
> @@ -94,78 +94,12 @@ extern int posix_acl_chmod(struct posix_
> extern struct posix_acl *get_posix_acl(struct inode *, int);
> extern int set_posix_acl(struct inode *, int, struct posix_acl *);
>
> -#ifdef CONFIG_FS_POSIX_ACL
> -static inline struct posix_acl **acl_by_type(struct inode *inode, int type)
> -{
> - switch (type) {
> - case ACL_TYPE_ACCESS:
> - return &inode->i_acl;
> - case ACL_TYPE_DEFAULT:
> - return &inode->i_default_acl;
> - default:
> - BUG();
> - }
> -}
> -
> -static inline struct posix_acl *get_cached_acl(struct inode *inode, int type)
> -{
> - struct posix_acl **p = acl_by_type(inode, type);
> - struct posix_acl *acl = ACCESS_ONCE(*p);
> - if (acl) {
> - spin_lock(&inode->i_lock);
> - acl = *p;
> - if (acl != ACL_NOT_CACHED)
> - acl = posix_acl_dup(acl);
> - spin_unlock(&inode->i_lock);
> - }
> - return acl;
> -}
> -
> -static inline struct posix_acl *get_cached_acl_rcu(struct inode *inode, int type)
> -{
> - return rcu_dereference(*acl_by_type(inode, type));
> -}
> -
> -static inline void set_cached_acl(struct inode *inode,
> - int type,
> - struct posix_acl *acl)
> -{
> - struct posix_acl **p = acl_by_type(inode, type);
> - struct posix_acl *old;
> - spin_lock(&inode->i_lock);
> - old = *p;
> - rcu_assign_pointer(*p, posix_acl_dup(acl));
> - spin_unlock(&inode->i_lock);
> - if (old != ACL_NOT_CACHED)
> - posix_acl_release(old);
> -}
> -
> -static inline void forget_cached_acl(struct inode *inode, int type)
> -{
> - struct posix_acl **p = acl_by_type(inode, type);
> - struct posix_acl *old;
> - spin_lock(&inode->i_lock);
> - old = *p;
> - *p = ACL_NOT_CACHED;
> - spin_unlock(&inode->i_lock);
> - if (old != ACL_NOT_CACHED)
> - posix_acl_release(old);
> -}
> -
> -static inline void forget_all_cached_acls(struct inode *inode)
> -{
> - struct posix_acl *old_access, *old_default;
> - spin_lock(&inode->i_lock);
> - old_access = inode->i_acl;
> - old_default = inode->i_default_acl;
> - inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED;
> - spin_unlock(&inode->i_lock);
> - if (old_access != ACL_NOT_CACHED)
> - posix_acl_release(old_access);
> - if (old_default != ACL_NOT_CACHED)
> - posix_acl_release(old_default);
> -}
> -#endif
> +struct posix_acl **acl_by_type(struct inode *inode, int type);
> +struct posix_acl *get_cached_acl(struct inode *inode, int type);
> +struct posix_acl *get_cached_acl_rcu(struct inode *inode, int type);
> +void set_cached_acl(struct inode *inode, int type, struct posix_acl *acl);
> +void forget_cached_acl(struct inode *inode, int type);
> +void forget_all_cached_acls(struct inode *inode);
>
> static inline void cache_no_acl(struct inode *inode)
> {
> _
>
On Tue, Oct 01, 2013 at 04:31:21PM +0300, Benny Halevy wrote:
> So the reason not to hold it is that the nfs state lock is global to the
> server and blocks all state modifying operations such as:
> open, close, lock, clientid, session operations, etc.
While not really related to this patch: what's the reason it's not
split? The way nfsd works there should be almost no state that isn't
per-export.
On 2013-10-01 16:37, Christoph Hellwig wrote:
> On Tue, Oct 01, 2013 at 04:31:21PM +0300, Benny Halevy wrote:
>> So the reason not to hold it is that the nfs state lock is global to the
>> server and blocks all state modifying operations such as:
>> open, close, lock, clientid, session operations, etc.
>
> While not really related to this patch: what's the reason it's not
> split? The way nfsd works there should be almost no state that isn't
> per-export.
I'll defer to Bruce. We did reduce the use of the state_lock in the past
but I think there is potential for further reducing what it covers.
Memory allocations for client, file, stateid etc. can be optimized for
the common path by opportunistically allocating these after a quick
search (lockless or rcu_read is possible).
In the uncommon case, insertion of the newly allocated objects would fail and
they will be freed.
The candidates I can quickly see are:
- the idr calls manipulating and looking up the different stateid hash tables,
- clientid rb_trees, and
- file hash table
- currently protected with the recall_lock spin_lock but we can factor out,
I think, the call to nfsd4_alloc_file and combine find_file and nfsd4_init_file
for conditional insertion under the recall_lock.
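A minimal userspace sketch of that allocate-then-conditionally-insert
pattern, not the nfsd code itself. struct entry, table_lock and
find_or_insert() are hypothetical names, a pthread mutex stands in for the
recall_lock spinlock, and the first lookup is done under the lock for
simplicity where the text above suggests it could be lockless or under RCU.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hashed object such as an nfs4_file. */
struct entry {
	unsigned long key;
	struct entry *next;
};

#define NBUCKETS 16
static struct entry *table[NBUCKETS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold table_lock. */
static struct entry *find_locked(unsigned long key)
{
	struct entry *e;

	for (e = table[key % NBUCKETS]; e; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}

/*
 * Return the entry for key, creating it if needed. The allocation is
 * done with no lock held; the lock covers only lookup and insertion.
 */
struct entry *find_or_insert(unsigned long key)
{
	struct entry *e, *new;

	/* Common path: the entry already exists. */
	pthread_mutex_lock(&table_lock);
	e = find_locked(key);
	pthread_mutex_unlock(&table_lock);
	if (e)
		return e;

	new = malloc(sizeof(*new));	/* may block; no lock held */
	if (!new)
		return NULL;
	new->key = key;

	pthread_mutex_lock(&table_lock);
	e = find_locked(key);		/* recheck: another thread may have won */
	if (!e) {
		new->next = table[key % NBUCKETS];
		table[key % NBUCKETS] = new;
		e = new;
		new = NULL;		/* ownership moved to the table */
	}
	pthread_mutex_unlock(&table_lock);

	free(new);			/* lost the race; free(NULL) is a no-op */
	return e;
}
```

The cost of this arrangement is one wasted allocation when two threads race
on the same key; in exchange the allocator is never called under the lock.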
Benny
>
>
On 2013-10-01 04:21, Boaz Harrosh wrote:
> On 09/29/2013 05:17 AM, Christoph Hellwig wrote:
>> Seems like layout_type should just be a field in the export ops instead
>> of a method.
>
> No! this field does not make any sense at all it should just be
> removed.
>
> There is a pNFS inquiry a client sends that asks for an array
> of all the types supported by this mount point. So these method
> should be returning an array. If at all.
We're not there yet.
Neither the generic pnfs implementation nor any of the filesystems does that
at the moment. When we go there we should revise the interface.
>
> The layout_type is just an input to some operations and are
> no concern of NFSD. This is not a yes/no flag for pNFS the
> opts vector should be the flag.
We can push the check for the layout type down to the file system
but then it will have to encode FATTR4_WORD1_FS_LAYOUT_TYPES
and FATTR4_WORD2_LAYOUT_TYPES.
Benny
>
> Cheers
> Boaz
>
>
On 2013-10-02 01:14, J. Bruce Fields wrote:
> See previous comments. What guarantees these superblock pointers stay
> good as long as they're in the cache?
Currently, the dependency on nfsd.ko should hold them but that should go away.
Trying to think about referencing svc_export instead, we use find_sbid_id
to get to the superblock in nfsd4_getdevinfo since we have no current fh.
And we need the superblock to call into the fs sb->s_pnfs_op->get_device_info
later in nfsd4_encode_getdevinfo.
Just to make sure, we can safely get to the sb via exp->ex_path.dentry->d_inode->i_sb
right?
Benny
>
> --b.
>
> On Thu, Sep 26, 2013 at 02:40:31PM -0400, Benny Halevy wrote:
>> From: Benny Halevy <[email protected]>
>>
>> Signed-off-by: Benny Halevy <[email protected]>
>> [pnfsd: alloc_sid should kmalloc a object not a pointer]
>> Signed-off-by: Bian Naimeng <[email protected]>
>> Signed-off-by: Benny Halevy <[email protected]>
>> Signed-off-by: Benny Halevy <[email protected]>
>> ---
>> fs/nfsd/nfs4pnfsd.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>> fs/nfsd/pnfsd.h | 2 +
>> 2 files changed, 122 insertions(+)
>>
>> diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
>> index cb28207..9a7cbc9 100644
>> --- a/fs/nfsd/nfs4pnfsd.c
>> +++ b/fs/nfsd/nfs4pnfsd.c
>> @@ -25,3 +25,123 @@
>>
>> #define NFSDDBG_FACILITY NFSDDBG_PNFS
>>
>> +static DEFINE_SPINLOCK(layout_lock);
>> +
>> +/* hash table for nfsd4_pnfs_deviceid.sbid */
>> +#define SBID_HASH_BITS 8
>> +#define SBID_HASH_SIZE (1 << SBID_HASH_BITS)
>> +#define SBID_HASH_MASK (SBID_HASH_SIZE - 1)
>> +
>> +struct sbid_tracker {
>> + u64 id;
>> + struct super_block *sb;
>> + struct list_head hash;
>> +};
>> +
>> +static u64 current_sbid;
>> +static struct list_head sbid_hashtbl[SBID_HASH_SIZE];
>> +
>> +static unsigned long
>> +sbid_hashval(struct super_block *sb)
>> +{
>> + return hash_ptr(sb, SBID_HASH_BITS);
>> +}
>> +
>> +static struct sbid_tracker *
>> +alloc_sbid(void)
>> +{
>> + return kmalloc(sizeof(struct sbid_tracker), GFP_KERNEL);
>> +}
>> +
>> +static void
>> +destroy_sbid(struct sbid_tracker *sbid)
>> +{
>> + spin_lock(&layout_lock);
>> + list_del(&sbid->hash);
>> + spin_unlock(&layout_lock);
>> + kfree(sbid);
>> +}
>> +
>> +void
>> +nfsd4_free_pnfs_slabs(void)
>> +{
>> + int i;
>> + struct sbid_tracker *sbid;
>> +
>> + for (i = 0; i < SBID_HASH_SIZE; i++) {
>> + while (!list_empty(&sbid_hashtbl[i])) {
>> + sbid = list_first_entry(&sbid_hashtbl[i],
>> + struct sbid_tracker,
>> + hash);
>> + destroy_sbid(sbid);
>> + }
>> + }
>> +}
>> +
>> +int
>> +nfsd4_init_pnfs_slabs(void)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < SBID_HASH_SIZE; i++)
>> + INIT_LIST_HEAD(&sbid_hashtbl[i]);
>> +
>> + return 0;
>> +}
>> +
>> +static u64
>> +alloc_init_sbid(struct super_block *sb)
>> +{
>> + struct sbid_tracker *sbid;
>> + struct sbid_tracker *new = alloc_sbid();
>> + unsigned long hash_idx = sbid_hashval(sb);
>> + u64 id = 0;
>> +
>> + if (likely(new)) {
>> + spin_lock(&layout_lock);
>> + id = ++current_sbid;
>> + new->id = (id << SBID_HASH_BITS) | (hash_idx & SBID_HASH_MASK);
>> + id = new->id;
>> + BUG_ON(id == 0);
>> + new->sb = sb;
>> +
>> + list_for_each_entry (sbid, &sbid_hashtbl[hash_idx], hash)
>> + if (sbid->sb == sb) {
>> + kfree(new);
>> + id = sbid->id;
>> + spin_unlock(&layout_lock);
>> + return id;
>> + }
>> + list_add(&new->hash, &sbid_hashtbl[hash_idx]);
>> + spin_unlock(&layout_lock);
>> + }
>> + return id;
>> +}
>> +
>> +static u64
>> +find_create_sbid(struct super_block *sb)
>> +{
>> + struct sbid_tracker *sbid;
>> + unsigned long hash_idx = sbid_hashval(sb);
>> + int pos = 0;
>> + u64 id = 0;
>> +
>> + spin_lock(&layout_lock);
>> + list_for_each_entry (sbid, &sbid_hashtbl[hash_idx], hash) {
>> + pos++;
>> + if (sbid->sb != sb)
>> + continue;
>> + if (pos > 1) {
>> + list_del(&sbid->hash);
>> + list_add(&sbid->hash, &sbid_hashtbl[hash_idx]);
>> + }
>> + id = sbid->id;
>> + break;
>> + }
>> + spin_unlock(&layout_lock);
>> +
>> + if (!id)
>> + id = alloc_init_sbid(sb);
>> +
>> + return id;
>> +}
>> diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
>> index 7c46791..29ea2e7 100644
>> --- a/fs/nfsd/pnfsd.h
>> +++ b/fs/nfsd/pnfsd.h
>> @@ -36,4 +36,6 @@
>>
>> #include <linux/nfsd/nfsd4_pnfs.h>
>>
>> +#include "xdr4.h"
>> +
>> #endif /* LINUX_NFSD_PNFSD_H */
>> --
>> 1.8.3.1
>>
>
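For readers following the sbid discussion, here is a small standalone sketch
of the id layout that alloc_init_sbid() in the quoted patch builds: a
sequence number in the upper bits and the hash bucket index in the low
SBID_HASH_BITS bits. The three SBID_* macros are copied from the patch;
sbid_pack(), sbid_bucket() and sbid_seq() are hypothetical helper names used
only for illustration.

```c
#include <stdint.h>

#define SBID_HASH_BITS 8
#define SBID_HASH_SIZE (1 << SBID_HASH_BITS)
#define SBID_HASH_MASK (SBID_HASH_SIZE - 1)

/* Combine a sequence number and a bucket index as alloc_init_sbid() does. */
static uint64_t sbid_pack(uint64_t seq, unsigned long hash_idx)
{
	return (seq << SBID_HASH_BITS) | (hash_idx & SBID_HASH_MASK);
}

/* Recover the hash bucket from an id, so a lookup by id can go straight
 * to one list instead of scanning the whole table. */
static unsigned long sbid_bucket(uint64_t id)
{
	return (unsigned long)(id & SBID_HASH_MASK);
}

/* Recover the sequence number that made the id unique. */
static uint64_t sbid_seq(uint64_t id)
{
	return id >> SBID_HASH_BITS;
}
```

Because current_sbid is pre-incremented from zero, the sequence part is at
least 1, so a packed id such as sbid_pack(1, 0) is 0x100 rather than 0,
which is consistent with the BUG_ON(id == 0) in the patch.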
On Wed, 02 Oct 2013 17:36:29 +0300 Benny Halevy <[email protected]> wrote:
> From: Benny Halevy <[email protected]>
>
> get_cached_acl is defined as inline in posix_acl.h
> requiring the full definition of struct inode as it
> dereferences its struct inode * parameter.
That's very old code so you must have a peculiar config. Please
describe the circumstances under which this occurs, because I'd like to
avoid merging this patch.
> --- a/include/linux/posix_acl.h
> +++ b/include/linux/posix_acl.h
> @@ -9,6 +9,7 @@
> #define __LINUX_POSIX_ACL_H
>
> #include <linux/bug.h>
> +#include <linux/fs.h>
> #include <linux/slab.h>
> #include <linux/rcupdate.h>
A better fix is to undo all that crazy inlining in posix_acl.h.
From: Andrew Morton <[email protected]>
Subject: posix_acl: uninlining
Uninline vast tracts of nested inline functions in
include/linux/posix_acl.h.
This reduces the text+data+bss size of x86_64 allyesconfig vmlinux by 8026
bytes.
Also fixes an obscure build error reported by Benny: get_cached_acl()
needs fs.h for struct inode internals.
The patch also regularises the positioning of the EXPORT_SYMBOLs in
posix_acl.c.
Reported-by: Benny Halevy <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: J. Bruce Fields <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Benny Halevy <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
fs/posix_acl.c | 84 +++++++++++++++++++++++++++++++++---
include/linux/posix_acl.h | 78 ++-------------------------------
2 files changed, 85 insertions(+), 77 deletions(-)
diff -puN fs/posix_acl.c~posix_acl-uninlining fs/posix_acl.c
--- a/fs/posix_acl.c~posix_acl-uninlining
+++ a/fs/posix_acl.c
@@ -22,11 +22,80 @@
#include <linux/errno.h>
-EXPORT_SYMBOL(posix_acl_init);
-EXPORT_SYMBOL(posix_acl_alloc);
-EXPORT_SYMBOL(posix_acl_valid);
-EXPORT_SYMBOL(posix_acl_equiv_mode);
-EXPORT_SYMBOL(posix_acl_from_mode);
+struct posix_acl **acl_by_type(struct inode *inode, int type)
+{
+ switch (type) {
+ case ACL_TYPE_ACCESS:
+ return &inode->i_acl;
+ case ACL_TYPE_DEFAULT:
+ return &inode->i_default_acl;
+ default:
+ BUG();
+ }
+}
+EXPORT_SYMBOL(acl_by_type);
+
+struct posix_acl *get_cached_acl(struct inode *inode, int type)
+{
+ struct posix_acl **p = acl_by_type(inode, type);
+ struct posix_acl *acl = ACCESS_ONCE(*p);
+ if (acl) {
+ spin_lock(&inode->i_lock);
+ acl = *p;
+ if (acl != ACL_NOT_CACHED)
+ acl = posix_acl_dup(acl);
+ spin_unlock(&inode->i_lock);
+ }
+ return acl;
+}
+EXPORT_SYMBOL(get_cached_acl);
+
+struct posix_acl *get_cached_acl_rcu(struct inode *inode, int type)
+{
+ return rcu_dereference(*acl_by_type(inode, type));
+}
+EXPORT_SYMBOL(get_cached_acl_rcu);
+
+void set_cached_acl(struct inode *inode, int type, struct posix_acl *acl)
+{
+ struct posix_acl **p = acl_by_type(inode, type);
+ struct posix_acl *old;
+ spin_lock(&inode->i_lock);
+ old = *p;
+ rcu_assign_pointer(*p, posix_acl_dup(acl));
+ spin_unlock(&inode->i_lock);
+ if (old != ACL_NOT_CACHED)
+ posix_acl_release(old);
+}
+EXPORT_SYMBOL(set_cached_acl);
+
+void forget_cached_acl(struct inode *inode, int type)
+{
+ struct posix_acl **p = acl_by_type(inode, type);
+ struct posix_acl *old;
+ spin_lock(&inode->i_lock);
+ old = *p;
+ *p = ACL_NOT_CACHED;
+ spin_unlock(&inode->i_lock);
+ if (old != ACL_NOT_CACHED)
+ posix_acl_release(old);
+}
+EXPORT_SYMBOL(forget_cached_acl);
+
+void forget_all_cached_acls(struct inode *inode)
+{
+ struct posix_acl *old_access, *old_default;
+ spin_lock(&inode->i_lock);
+ old_access = inode->i_acl;
+ old_default = inode->i_default_acl;
+ inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED;
+ spin_unlock(&inode->i_lock);
+ if (old_access != ACL_NOT_CACHED)
+ posix_acl_release(old_access);
+ if (old_default != ACL_NOT_CACHED)
+ posix_acl_release(old_default);
+}
+EXPORT_SYMBOL(forget_all_cached_acls);
/*
* Init a fresh posix_acl
@@ -37,6 +106,7 @@ posix_acl_init(struct posix_acl *acl, in
atomic_set(&acl->a_refcount, 1);
acl->a_count = count;
}
+EXPORT_SYMBOL(posix_acl_init);
/*
* Allocate a new ACL with the specified number of entries.
@@ -51,6 +121,7 @@ posix_acl_alloc(int count, gfp_t flags)
posix_acl_init(acl, count);
return acl;
}
+EXPORT_SYMBOL(posix_acl_alloc);
/*
* Clone an ACL.
@@ -146,6 +217,7 @@ posix_acl_valid(const struct posix_acl *
return 0;
return -EINVAL;
}
+EXPORT_SYMBOL(posix_acl_valid);
/*
* Returns 0 if the acl can be exactly represented in the traditional
@@ -186,6 +258,7 @@ posix_acl_equiv_mode(const struct posix_
*mode_p = (*mode_p & ~S_IRWXUGO) | mode;
return not_equiv;
}
+EXPORT_SYMBOL(posix_acl_equiv_mode);
/*
* Create an ACL representing the file mode permission bits of an inode.
@@ -207,6 +280,7 @@ posix_acl_from_mode(umode_t mode, gfp_t
acl->a_entries[2].e_perm = (mode & S_IRWXO);
return acl;
}
+EXPORT_SYMBOL(posix_acl_from_mode);
/*
* Return 0 if current is granted want access to the inode
diff -puN include/linux/posix_acl.h~posix_acl-uninlining include/linux/posix_acl.h
--- a/include/linux/posix_acl.h~posix_acl-uninlining
+++ a/include/linux/posix_acl.h
@@ -94,78 +94,12 @@ extern int posix_acl_chmod(struct posix_
extern struct posix_acl *get_posix_acl(struct inode *, int);
extern int set_posix_acl(struct inode *, int, struct posix_acl *);
-#ifdef CONFIG_FS_POSIX_ACL
-static inline struct posix_acl **acl_by_type(struct inode *inode, int type)
-{
- switch (type) {
- case ACL_TYPE_ACCESS:
- return &inode->i_acl;
- case ACL_TYPE_DEFAULT:
- return &inode->i_default_acl;
- default:
- BUG();
- }
-}
-
-static inline struct posix_acl *get_cached_acl(struct inode *inode, int type)
-{
- struct posix_acl **p = acl_by_type(inode, type);
- struct posix_acl *acl = ACCESS_ONCE(*p);
- if (acl) {
- spin_lock(&inode->i_lock);
- acl = *p;
- if (acl != ACL_NOT_CACHED)
- acl = posix_acl_dup(acl);
- spin_unlock(&inode->i_lock);
- }
- return acl;
-}
-
-static inline struct posix_acl *get_cached_acl_rcu(struct inode *inode, int type)
-{
- return rcu_dereference(*acl_by_type(inode, type));
-}
-
-static inline void set_cached_acl(struct inode *inode,
- int type,
- struct posix_acl *acl)
-{
- struct posix_acl **p = acl_by_type(inode, type);
- struct posix_acl *old;
- spin_lock(&inode->i_lock);
- old = *p;
- rcu_assign_pointer(*p, posix_acl_dup(acl));
- spin_unlock(&inode->i_lock);
- if (old != ACL_NOT_CACHED)
- posix_acl_release(old);
-}
-
-static inline void forget_cached_acl(struct inode *inode, int type)
-{
- struct posix_acl **p = acl_by_type(inode, type);
- struct posix_acl *old;
- spin_lock(&inode->i_lock);
- old = *p;
- *p = ACL_NOT_CACHED;
- spin_unlock(&inode->i_lock);
- if (old != ACL_NOT_CACHED)
- posix_acl_release(old);
-}
-
-static inline void forget_all_cached_acls(struct inode *inode)
-{
- struct posix_acl *old_access, *old_default;
- spin_lock(&inode->i_lock);
- old_access = inode->i_acl;
- old_default = inode->i_default_acl;
- inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED;
- spin_unlock(&inode->i_lock);
- if (old_access != ACL_NOT_CACHED)
- posix_acl_release(old_access);
- if (old_default != ACL_NOT_CACHED)
- posix_acl_release(old_default);
-}
-#endif
+struct posix_acl **acl_by_type(struct inode *inode, int type);
+struct posix_acl *get_cached_acl(struct inode *inode, int type);
+struct posix_acl *get_cached_acl_rcu(struct inode *inode, int type);
+void set_cached_acl(struct inode *inode, int type, struct posix_acl *acl);
+void forget_cached_acl(struct inode *inode, int type);
+void forget_all_cached_acls(struct inode *inode);
static inline void cache_no_acl(struct inode *inode)
{
_
On Wed, Oct 02, 2013 at 02:35:42PM +0300, Benny Halevy wrote:
> For loosely clustered files-layout file systems the MDS would also
> need to handle LAYOUTCOMMIT, and also generate and handle layout recalls,
> depending on how much of that we can do in a generic way and how much is
> file system specific.
Which is something we can deal with in great detail once such a
filesystem and the pnfs support for it land in the kernel tree.
On Sun, Oct 13, 2013 at 09:11:40AM +0300, Benny Halevy wrote:
> On 2013-10-11 22:56, Christoph Hellwig wrote:
> > On Thu, Sep 26, 2013 at 02:40:31PM -0400, Benny Halevy wrote:
> >> From: Benny Halevy <[email protected]>
> >
> > This is entirely unnecessary. Just make sure all layouts set the sbid
> > field in getdevicelist to the s_dev value of the filesystem and you
> > can just use user_get_super.
>
> That's true for filesystems that have a meaningful s_dev, unlike exofs,
> but this functionality can be added later respectively.
Even exofs has an s_dev from the anon dev_t allocator. Given that
layouts aren't supposed to survive reboots of the server I don't
see a problem with it.
On Sun, Sep 29, 2013 at 02:16:40PM +0300, Benny Halevy wrote:
> On 2013-09-27 17:44, J. Bruce Fields wrote:
> > On Thu, Sep 26, 2013 at 02:40:24PM -0400, Benny Halevy wrote:
> >> From: Benny Halevy <[email protected]>
> >>
> >> Verify whether the server and file system support the given layout type.
> >>
> >> [was pnfsd: Streamline error code checking for non-pnfs filesystems]
> >> Signed-off-by: Dean Hildebrand <[email protected]>
> >> [pnfsd: Add super block to layout_type()]
> >> Signed-off-by: Marc Eshel <[email protected]>
> >> [pnfsd: Fix order of ops in nfsd4_layout_verify]
> >> Signed-off-by: Dean Hildebrand <[email protected]>
> >> [pnfsd: convert generic code to use new pnfs api]
> >> [pnfsd: define pnfs_export_operations]
> >> [pnfsd: obliterate old vfs api]
> >> Signed-off-by: Benny Halevy <[email protected]>
> >> [pnfsd: layout verify all layout types]
> >> Signed-off-by: Andy Adamson <[email protected]>
> >> [pnfsd: tone nfsd4_layout_verify printk down to dprintk]
> >> Signed-off-by: Benny Halevy <[email protected]>
> >> [pnfsd: check ex_pnfs in nfsd4_verify_layout]
> >> Signed-off-by: Andy Adamson <[email protected]>
> >> [pnfsd: handle s_pnfs_op==NULL]
> >> [pnfsd: verify export option only if svc_export is present]
> >> Signed-off-by: Benny Halevy <[email protected]>
> >> Signed-off-by: Benny Halevy <[email protected]>
> >> ---
> >> fs/nfsd/export.c | 6 ++++++
> >> fs/nfsd/nfs4proc.c | 39 +++++++++++++++++++++++++++++++++++++++
> >> fs/nfsd/pnfsd.h | 2 ++
> >> include/linux/nfsd/nfsd4_pnfs.h | 5 ++++-
> >> 4 files changed, 51 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> >> index 7730dfd..d803414 100644
> >> --- a/fs/nfsd/export.c
> >> +++ b/fs/nfsd/export.c
> >> @@ -376,6 +376,12 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid)
> >> return -EINVAL;
> >> }
> >>
> >> + if (inode->i_sb->s_pnfs_op &&
> >> + !inode->i_sb->s_pnfs_op->layout_type) {
> >> + dprintk("exp_export: export of invalid fs pnfs export ops.\n");
> >> + return -EINVAL;
> >> + }
> >> +
> >
> > If you haven't already done it you may want to look at modifying
> > nfs-utils/utils/exportfs/exportfs.c:test_export() to add the pnfs option
> > when appropriate so the error can be returned at exportfs time.
>
> Hmm, I'm not sure I follow your proposal.
> In fs/nfsd/export.c:check_export() we check whether i_sb->s_export_op and,
> respectively, i_sb->s_pnfs_op support the required export methods.
> How would we know in utils/exportfs when it is appropriate to add the pnfs option?
At exportfs time, if somebody requests pnfs, but this filesystem doesn't
support that, you probably want to return an error.
The way you'd do that would be by passing that pnfs option to
nfs-utils/utils/exportfs/exportfs.c:test_export() and including it on
the test-export it passes to the kernel. That test export will then
succeed or fail depending on whether the filesystem supports pnfs or
not.
Otherwise the failure in check_export() above won't be noticed until
it's too late to give helpful feedback to the user.
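A rough userspace model of the behaviour described here, assuming the
test export carries a pnfs flag: the check fails with -EINVAL when pnfs
is requested but the filesystem has no usable pnfs ops. All sketch_*
names and the flag bit are hypothetical; this is not the posted
check_export() hunk, which only rejects a filesystem whose s_pnfs_op
lacks layout_type.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for pnfs_export_operations and super_block. */
struct sketch_pnfs_ops {
	int (*layout_type)(void);
};

struct sketch_fs {
	const struct sketch_pnfs_ops *pnfs_op;
};

#define SKETCH_EXP_PNFS 0x1	/* hypothetical "pnfs" export option bit */

/* Returns 0 if the (test) export is acceptable, -EINVAL otherwise. */
static int sketch_check_export(const struct sketch_fs *fs, int flags)
{
	assert(fs != NULL);
	if (!(flags & SKETCH_EXP_PNFS))
		return 0;	/* pnfs not requested: nothing to verify */
	if (!fs->pnfs_op || !fs->pnfs_op->layout_type)
		return -EINVAL;	/* requested, but the fs cannot provide it */
	return 0;
}

static int sketch_files_layout_type(void)
{
	return 1;	/* LAYOUT4_NFSV4_1_FILES */
}

static const struct sketch_pnfs_ops sketch_files_ops = { sketch_files_layout_type };
static const struct sketch_fs sketch_pnfs_fs = { &sketch_files_ops };
static const struct sketch_fs sketch_plain_fs = { NULL };
```

With the option included in the test export, the failure surfaces while
exportfs is still running and can be reported to the administrator.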
--b.
>
> Benny
>
> >
> >> return 0;
> >>
> >> }
> >> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >> index 419572f..576b635 100644
> >> --- a/fs/nfsd/nfs4proc.c
> >> +++ b/fs/nfsd/nfs4proc.c
> >> @@ -41,6 +41,7 @@
> >> #include "vfs.h"
> >> #include "current_stateid.h"
> >> #include "netns.h"
> >> +#include "pnfsd.h"
> >>
> >> #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
> >> #include <linux/security.h>
> >> @@ -1109,6 +1110,44 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
> >> return status == nfserr_same ? nfs_ok : status;
> >> }
> >>
> >> +#if defined(CONFIG_PNFSD)
> >> +static __be32
> >> +nfsd4_layout_verify(struct super_block *sb, struct svc_export *exp,
> >> + unsigned int layout_type)
> >> +{
> >> + int status, type;
> >> +
> >> + /* check to see if pNFS is supported. */
> >> + status = nfserr_layoutunavailable;
> >> + if (exp && exp->ex_pnfs == 0) {
> >
> > Can this really be called with exp == NULL? If so don't you want to
> > fail that as well?
>
> It is called with exp == NULL from nfsd4_getdevinfo where it shouldn't
> cause an error return.
>
> Benny
>
> >
> >> + dprintk("%s: Underlying file system "
> >> + "is not exported over pNFS\n", __func__);
> >> + goto out;
> >> + }
> >> + if (!sb->s_pnfs_op || !sb->s_pnfs_op->layout_type) {
> >> + dprintk("%s: Underlying file system "
> >> + "does not support pNFS\n", __func__);
> >> + goto out;
> >> + }
> >> +
> >> + type = sb->s_pnfs_op->layout_type(sb);
> >> +
> >> + /* check to see if requested layout type is supported. */
> >> + status = nfserr_unknown_layouttype;
> >> + if (!type)
> >> + dprintk("BUG: %s: layout_type 0 is reserved and must not be "
> >> + "used by filesystem\n", __func__);
> >> + else if (type != layout_type)
> >> + dprintk("%s: requested layout type %d "
> >> + "does not match supported type %d\n",
> >> + __func__, layout_type, type);
> >> + else
> >> + status = nfs_ok;
> >> +out:
> >> + return status;
> >> +}
> >> +#endif /* CONFIG_PNFSD */
> >> +
> >> /*
> >> * NULL call.
> >> */
> >> diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
> >> index 65fb57e..7c46791 100644
> >> --- a/fs/nfsd/pnfsd.h
> >> +++ b/fs/nfsd/pnfsd.h
> >> @@ -34,4 +34,6 @@
> >> #ifndef LINUX_NFSD_PNFSD_H
> >> #define LINUX_NFSD_PNFSD_H
> >>
> >> +#include <linux/nfsd/nfsd4_pnfs.h>
> >> +
> >> #endif /* LINUX_NFSD_PNFSD_H */
> >> diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
> >> index ff6613e..d44669e 100644
> >> --- a/include/linux/nfsd/nfsd4_pnfs.h
> >> +++ b/include/linux/nfsd/nfsd4_pnfs.h
> >> @@ -34,6 +34,8 @@
> >> #ifndef _LINUX_NFSD_NFSD4_PNFS_H
> >> #define _LINUX_NFSD_NFSD4_PNFS_H
> >>
> >> +#include <linux/exportfs.h>
> >> +
> >> /*
> >> * pNFS export operations vector.
> >> *
> >> @@ -45,7 +47,8 @@
> >> * All other methods are optional and can be set to NULL if not implemented.
> >> */
> >> struct pnfs_export_operations {
> >> - /* stub */
> >> + /* Returns the supported pnfs_layouttype4. */
> >> + int (*layout_type) (struct super_block *);
> >> };
> >>
> >> #endif /* _LINUX_NFSD_NFSD4_PNFS_H */
> >> --
> >> 1.8.3.1
> >>
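For readers following the exp == NULL discussion, the decision order in
the quoted nfsd4_layout_verify() can be modelled in userspace roughly as
below. The enum values and sketch_* types are hypothetical stand-ins for
the nfserr_* codes and kernel structures; only the order of the checks
is taken from the quoted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for nfs_ok and the two nfserr_* codes. */
enum sketch_status {
	SKETCH_OK,
	SKETCH_LAYOUTUNAVAILABLE,
	SKETCH_UNKNOWN_LAYOUTTYPE
};

struct sketch_export {
	int ex_pnfs;
};

struct sketch_sb {
	int (*layout_type)(void);	/* NULL models missing pnfs ops */
};

static enum sketch_status
sketch_layout_verify(const struct sketch_sb *sb,
		     const struct sketch_export *exp,
		     unsigned int requested)
{
	int type;

	assert(sb != NULL);
	/* The export option is only checked when an export is present. */
	if (exp && exp->ex_pnfs == 0)
		return SKETCH_LAYOUTUNAVAILABLE;
	if (!sb->layout_type)
		return SKETCH_LAYOUTUNAVAILABLE;

	type = sb->layout_type();
	/* Type 0 is reserved; any mismatch is an unknown layout type. */
	if (type == 0 || (unsigned int)type != requested)
		return SKETCH_UNKNOWN_LAYOUTTYPE;
	return SKETCH_OK;
}

static int sketch_type_files(void)
{
	return 1;
}

static const struct sketch_sb sketch_sb_pnfs = { sketch_type_files };
static const struct sketch_sb sketch_sb_plain = { NULL };
static const struct sketch_export sketch_exp_on = { 1 };
static const struct sketch_export sketch_exp_off = { 0 };
```

Passing exp == NULL, as described for the getdevinfo path, skips only
the export-option check; the filesystem checks still apply.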
On Wed, Oct 02, 2013 at 05:32:32PM +0300, Benny Halevy wrote:
> On 2013-10-02 01:14, J. Bruce Fields wrote:
> > See previous comments. What guarantees these superblock pointers stay
> > good as long as they're in the cache?
>
> Currently, the dependency on nfsd.ko should hold them but that should go away.
I don't see how that prevents anyone from unmounting a filesystem.
> Trying to think about referencing svc_export instead, we use find_sbid_id
> to get to the superblock in nfsd4_getdevinfo since we have no current fh.
> And we need the superblock to call into the fs sb->s_pnfs_op->get_device_info
> later in nfsd4_encode_getdevinfo.
>
> Just to make sure, we can safely get to the sb via exp->ex_path.dentry->d_inode->i_sb
> right?
Right.
--b.
>
> Benny
>
> >
> > --b.
> >
> > On Thu, Sep 26, 2013 at 02:40:31PM -0400, Benny Halevy wrote:
> >> From: Benny Halevy <[email protected]>
> >>
> >> Signed-off-by: Benny Halevy <[email protected]>
> >> [pnfsd: alloc_sid should kmalloc a object not a pointer]
> >> Signed-off-by: Bian Naimeng <[email protected]>
> >> Signed-off-by: Benny Halevy <[email protected]>
> >> Signed-off-by: Benny Halevy <[email protected]>
> >> ---
> >> fs/nfsd/nfs4pnfsd.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> fs/nfsd/pnfsd.h | 2 +
> >> 2 files changed, 122 insertions(+)
> >>
> >> diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
> >> index cb28207..9a7cbc9 100644
> >> --- a/fs/nfsd/nfs4pnfsd.c
> >> +++ b/fs/nfsd/nfs4pnfsd.c
> >> @@ -25,3 +25,123 @@
> >>
> >> #define NFSDDBG_FACILITY NFSDDBG_PNFS
> >>
> >> +static DEFINE_SPINLOCK(layout_lock);
> >> +
> >> +/* hash table for nfsd4_pnfs_deviceid.sbid */
> >> +#define SBID_HASH_BITS 8
> >> +#define SBID_HASH_SIZE (1 << SBID_HASH_BITS)
> >> +#define SBID_HASH_MASK (SBID_HASH_SIZE - 1)
> >> +
> >> +struct sbid_tracker {
> >> + u64 id;
> >> + struct super_block *sb;
> >> + struct list_head hash;
> >> +};
> >> +
> >> +static u64 current_sbid;
> >> +static struct list_head sbid_hashtbl[SBID_HASH_SIZE];
> >> +
> >> +static unsigned long
> >> +sbid_hashval(struct super_block *sb)
> >> +{
> >> + return hash_ptr(sb, SBID_HASH_BITS);
> >> +}
> >> +
> >> +static struct sbid_tracker *
> >> +alloc_sbid(void)
> >> +{
> >> + return kmalloc(sizeof(struct sbid_tracker), GFP_KERNEL);
> >> +}
> >> +
> >> +static void
> >> +destroy_sbid(struct sbid_tracker *sbid)
> >> +{
> >> + spin_lock(&layout_lock);
> >> + list_del(&sbid->hash);
> >> + spin_unlock(&layout_lock);
> >> + kfree(sbid);
> >> +}
> >> +
> >> +void
> >> +nfsd4_free_pnfs_slabs(void)
> >> +{
> >> + int i;
> >> + struct sbid_tracker *sbid;
> >> +
> >> + for (i = 0; i < SBID_HASH_SIZE; i++) {
> >> + while (!list_empty(&sbid_hashtbl[i])) {
> >> + sbid = list_first_entry(&sbid_hashtbl[i],
> >> + struct sbid_tracker,
> >> + hash);
> >> + destroy_sbid(sbid);
> >> + }
> >> + }
> >> +}
> >> +
> >> +int
> >> +nfsd4_init_pnfs_slabs(void)
> >> +{
> >> + int i;
> >> +
> >> + for (i = 0; i < SBID_HASH_SIZE; i++)
> >> + INIT_LIST_HEAD(&sbid_hashtbl[i]);
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static u64
> >> +alloc_init_sbid(struct super_block *sb)
> >> +{
> >> + struct sbid_tracker *sbid;
> >> + struct sbid_tracker *new = alloc_sbid();
> >> + unsigned long hash_idx = sbid_hashval(sb);
> >> + u64 id = 0;
> >> +
> >> + if (likely(new)) {
> >> + spin_lock(&layout_lock);
> >> + id = ++current_sbid;
> >> + new->id = (id << SBID_HASH_BITS) | (hash_idx & SBID_HASH_MASK);
> >> + id = new->id;
> >> + BUG_ON(id == 0);
> >> + new->sb = sb;
> >> +
> >> + list_for_each_entry (sbid, &sbid_hashtbl[hash_idx], hash)
> >> + if (sbid->sb == sb) {
> >> + kfree(new);
> >> + id = sbid->id;
> >> + spin_unlock(&layout_lock);
> >> + return id;
> >> + }
> >> + list_add(&new->hash, &sbid_hashtbl[hash_idx]);
> >> + spin_unlock(&layout_lock);
> >> + }
> >> + return id;
> >> +}
> >> +
> >> +static u64
> >> +find_create_sbid(struct super_block *sb)
> >> +{
> >> + struct sbid_tracker *sbid;
> >> + unsigned long hash_idx = sbid_hashval(sb);
> >> + int pos = 0;
> >> + u64 id = 0;
> >> +
> >> + spin_lock(&layout_lock);
> >> + list_for_each_entry (sbid, &sbid_hashtbl[hash_idx], hash) {
> >> + pos++;
> >> + if (sbid->sb != sb)
> >> + continue;
> >> + if (pos > 1) {
> >> + list_del(&sbid->hash);
> >> + list_add(&sbid->hash, &sbid_hashtbl[hash_idx]);
> >> + }
> >> + id = sbid->id;
> >> + break;
> >> + }
> >> + spin_unlock(&layout_lock);
> >> +
> >> + if (!id)
> >> + id = alloc_init_sbid(sb);
> >> +
> >> + return id;
> >> +}
> >> diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
> >> index 7c46791..29ea2e7 100644
> >> --- a/fs/nfsd/pnfsd.h
> >> +++ b/fs/nfsd/pnfsd.h
> >> @@ -36,4 +36,6 @@
> >>
> >> #include <linux/nfsd/nfsd4_pnfs.h>
> >>
> >> +#include "xdr4.h"
> >> +
> >> #endif /* LINUX_NFSD_PNFSD_H */
> >> --
> >> 1.8.3.1
> >>
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
On 2013-09-29 15:19, Christoph Hellwig wrote:
> On Thu, Sep 26, 2013 at 02:42:59PM -0400, Benny Halevy wrote:
>> From: Benny Halevy <[email protected]>
>>
>> Include headers in nfs_xdr.h required for
>> struct rpc_task, nfs4_verifier, nfs4_stateid
>
> Are these actually dereferenced, or would a forward declaration suffice?
>
> Also should go to Trond straight outside of this series.
Hmm, I thought I already replied to this, but I don't see my reply,
so here it is :)
I consulted with Trond about this and there's a better way to
get the definition here by including linux/nfs4.h that doesn't require
the nfs_xdr.h baggage. So this patch is dropped from the next
version.
Benny
>
See previous comments. What guarantees these superblock pointers stay
good as long as they're in the cache?
--b.
On Thu, Sep 26, 2013 at 02:40:31PM -0400, Benny Halevy wrote:
> From: Benny Halevy <[email protected]>
>
> Signed-off-by: Benny Halevy <[email protected]>
> [pnfsd: alloc_sid should kmalloc a object not a pointer]
> Signed-off-by: Bian Naimeng <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> Signed-off-by: Benny Halevy <[email protected]>
> ---
> fs/nfsd/nfs4pnfsd.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/nfsd/pnfsd.h | 2 +
> 2 files changed, 122 insertions(+)
>
> diff --git a/fs/nfsd/nfs4pnfsd.c b/fs/nfsd/nfs4pnfsd.c
> index cb28207..9a7cbc9 100644
> --- a/fs/nfsd/nfs4pnfsd.c
> +++ b/fs/nfsd/nfs4pnfsd.c
> @@ -25,3 +25,123 @@
>
> #define NFSDDBG_FACILITY NFSDDBG_PNFS
>
> +static DEFINE_SPINLOCK(layout_lock);
> +
> +/* hash table for nfsd4_pnfs_deviceid.sbid */
> +#define SBID_HASH_BITS 8
> +#define SBID_HASH_SIZE (1 << SBID_HASH_BITS)
> +#define SBID_HASH_MASK (SBID_HASH_SIZE - 1)
> +
> +struct sbid_tracker {
> + u64 id;
> + struct super_block *sb;
> + struct list_head hash;
> +};
> +
> +static u64 current_sbid;
> +static struct list_head sbid_hashtbl[SBID_HASH_SIZE];
> +
> +static unsigned long
> +sbid_hashval(struct super_block *sb)
> +{
> + return hash_ptr(sb, SBID_HASH_BITS);
> +}
> +
> +static struct sbid_tracker *
> +alloc_sbid(void)
> +{
> + return kmalloc(sizeof(struct sbid_tracker), GFP_KERNEL);
> +}
> +
> +static void
> +destroy_sbid(struct sbid_tracker *sbid)
> +{
> + spin_lock(&layout_lock);
> + list_del(&sbid->hash);
> + spin_unlock(&layout_lock);
> + kfree(sbid);
> +}
> +
> +void
> +nfsd4_free_pnfs_slabs(void)
> +{
> + int i;
> + struct sbid_tracker *sbid;
> +
> + for (i = 0; i < SBID_HASH_SIZE; i++) {
> + while (!list_empty(&sbid_hashtbl[i])) {
> + sbid = list_first_entry(&sbid_hashtbl[i],
> + struct sbid_tracker,
> + hash);
> + destroy_sbid(sbid);
> + }
> + }
> +}
> +
> +int
> +nfsd4_init_pnfs_slabs(void)
> +{
> + int i;
> +
> + for (i = 0; i < SBID_HASH_SIZE; i++)
> + INIT_LIST_HEAD(&sbid_hashtbl[i]);
> +
> + return 0;
> +}
> +
> +static u64
> +alloc_init_sbid(struct super_block *sb)
> +{
> + struct sbid_tracker *sbid;
> + struct sbid_tracker *new = alloc_sbid();
> + unsigned long hash_idx = sbid_hashval(sb);
> + u64 id = 0;
> +
> + if (likely(new)) {
> + spin_lock(&layout_lock);
> + id = ++current_sbid;
> + new->id = (id << SBID_HASH_BITS) | (hash_idx & SBID_HASH_MASK);
> + id = new->id;
> + BUG_ON(id == 0);
> + new->sb = sb;
> +
> + list_for_each_entry (sbid, &sbid_hashtbl[hash_idx], hash)
> + if (sbid->sb == sb) {
> + kfree(new);
> + id = sbid->id;
> + spin_unlock(&layout_lock);
> + return id;
> + }
> + list_add(&new->hash, &sbid_hashtbl[hash_idx]);
> + spin_unlock(&layout_lock);
> + }
> + return id;
> +}
> +
> +static u64
> +find_create_sbid(struct super_block *sb)
> +{
> + struct sbid_tracker *sbid;
> + unsigned long hash_idx = sbid_hashval(sb);
> + int pos = 0;
> + u64 id = 0;
> +
> + spin_lock(&layout_lock);
> + list_for_each_entry (sbid, &sbid_hashtbl[hash_idx], hash) {
> + pos++;
> + if (sbid->sb != sb)
> + continue;
> + if (pos > 1) {
> + list_del(&sbid->hash);
> + list_add(&sbid->hash, &sbid_hashtbl[hash_idx]);
> + }
> + id = sbid->id;
> + break;
> + }
> + spin_unlock(&layout_lock);
> +
> + if (!id)
> + id = alloc_init_sbid(sb);
> +
> + return id;
> +}
> diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
> index 7c46791..29ea2e7 100644
> --- a/fs/nfsd/pnfsd.h
> +++ b/fs/nfsd/pnfsd.h
> @@ -36,4 +36,6 @@
>
> #include <linux/nfsd/nfsd4_pnfs.h>
>
> +#include "xdr4.h"
> +
> #endif /* LINUX_NFSD_PNFSD_H */
> --
> 1.8.3.1
>
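As a reading aid for the quoted alloc_init_sbid(), the id layout can be
tried out in userspace. The SBID_HASH_* macros are copied from the
patch; the sketch_* helper names are hypothetical. The low 8 bits hold
the hash bucket and the remaining bits hold the running counter.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the quoted patch. */
#define SBID_HASH_BITS	8
#define SBID_HASH_SIZE	(1 << SBID_HASH_BITS)
#define SBID_HASH_MASK	(SBID_HASH_SIZE - 1)

/* Same expression the patch uses to build new->id. */
static uint64_t sketch_make_sbid(uint64_t counter, unsigned long hash_idx)
{
	assert(counter != 0);	/* the patch pre-increments current_sbid */
	return (counter << SBID_HASH_BITS) | (hash_idx & SBID_HASH_MASK);
}

/* The bucket can be recovered from the id without knowing the sb. */
static unsigned long sketch_sbid_bucket(uint64_t id)
{
	return (unsigned long)(id & SBID_HASH_MASK);
}

static uint64_t sketch_sbid_counter(uint64_t id)
{
	return id >> SBID_HASH_BITS;
}
```

Since the counter starts at 1, the id is non-zero even for bucket 0,
which matches the BUG_ON(id == 0) in the patch (short of the counter
wrapping). None of this addresses the lifetime question raised above:
the id says nothing about whether the super_block pointer is still
valid.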
On 2013-10-01 23:38, J. Bruce Fields wrote:
> On Sun, Sep 29, 2013 at 02:16:40PM +0300, Benny Halevy wrote:
>> On 2013-09-27 17:44, J. Bruce Fields wrote:
>>> On Thu, Sep 26, 2013 at 02:40:24PM -0400, Benny Halevy wrote:
>>>> From: Benny Halevy <[email protected]>
>>>>
>>>> Verify whether the server and file system support the given layout type.
>>>>
>>>> [was pnfsd: Streamline error code checking for non-pnfs filesystems]
>>>> Signed-off-by: Dean Hildebrand <[email protected]>
>>>> [pnfsd: Add super block to layout_type()]
>>>> Signed-off-by: Marc Eshel <[email protected]>
>>>> [pnfsd: Fix order of ops in nfsd4_layout_verify]
>>>> Signed-off-by: Dean Hildebrand <[email protected]>
>>>> [pnfsd: convert generic code to use new pnfs api]
>>>> [pnfsd: define pnfs_export_operations]
>>>> [pnfsd: obliterate old vfs api]
>>>> Signed-off-by: Benny Halevy <[email protected]>
>>>> [pnfsd: layout verify all layout types]
>>>> Signed-off-by: Andy Adamson <[email protected]>
>>>> [pnfsd: tone nfsd4_layout_verify printk down to dprintk]
>>>> Signed-off-by: Benny Halevy <[email protected]>
>>>> [pnfsd: check ex_pnfs in nfsd4_verify_layout]
>>>> Signed-off-by: Andy Adamson <[email protected]>
>>>> [pnfsd: handle s_pnfs_op==NULL]
>>>> [pnfsd: verify export option only if svc_export is present]
>>>> Signed-off-by: Benny Halevy <[email protected]>
>>>> Signed-off-by: Benny Halevy <[email protected]>
>>>> ---
>>>> fs/nfsd/export.c | 6 ++++++
>>>> fs/nfsd/nfs4proc.c | 39 +++++++++++++++++++++++++++++++++++++++
>>>> fs/nfsd/pnfsd.h | 2 ++
>>>> include/linux/nfsd/nfsd4_pnfs.h | 5 ++++-
>>>> 4 files changed, 51 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
>>>> index 7730dfd..d803414 100644
>>>> --- a/fs/nfsd/export.c
>>>> +++ b/fs/nfsd/export.c
>>>> @@ -376,6 +376,12 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid)
>>>> return -EINVAL;
>>>> }
>>>>
>>>> + if (inode->i_sb->s_pnfs_op &&
>>>> + !inode->i_sb->s_pnfs_op->layout_type) {
>>>> + dprintk("exp_export: export of invalid fs pnfs export ops.\n");
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>
>>> If you haven't already done it you may want to look at modifying
>>> nfs-utils/utils/exportfs/exportfs.c:test_export() to add the pnfs option
>>> when appropriate so the error can be returned at exportfs time.
>>
>> Hmm, I'm not sure I follow your proposal.
>> In fs/nfsd/export.c:check_export() we check whether i_sb->s_export_op and,
>> respectively, i_sb->s_pnfs_op support the required export methods.
>> How would we know in utils/exportfs when it is appropriate to add the pnfs option?
>
> At exportfs time, if somebody requests pnfs, but this filesystem doesn't
> support that, you probably want to return an error.
>
> The way you'd do that would be by passing that pnfs option to
> nfs-utils/utils/exportfs/exportfs.c:test_export() and including it on
> the test-export it passes to the kernel. That test export will then
> succeed or fail depending on whether the filesystem supports pnfs or
> not.
>
> Otherwise the failure in check_export() above won't be noticed until
> it's too late to give helpful feedback to the user.
I see. Thanks.
>
> --b.
>
>>
>> Benny
>>
>>>
>>>> return 0;
>>>>
>>>> }
>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>> index 419572f..576b635 100644
>>>> --- a/fs/nfsd/nfs4proc.c
>>>> +++ b/fs/nfsd/nfs4proc.c
>>>> @@ -41,6 +41,7 @@
>>>> #include "vfs.h"
>>>> #include "current_stateid.h"
>>>> #include "netns.h"
>>>> +#include "pnfsd.h"
>>>>
>>>> #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
>>>> #include <linux/security.h>
>>>> @@ -1109,6 +1110,44 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
>>>> return status == nfserr_same ? nfs_ok : status;
>>>> }
>>>>
>>>> +#if defined(CONFIG_PNFSD)
>>>> +static __be32
>>>> +nfsd4_layout_verify(struct super_block *sb, struct svc_export *exp,
>>>> + unsigned int layout_type)
>>>> +{
>>>> + int status, type;
>>>> +
>>>> + /* check to see if pNFS is supported. */
>>>> + status = nfserr_layoutunavailable;
>>>> + if (exp && exp->ex_pnfs == 0) {
>>>
>>> Can this really be called with exp == NULL? If so don't you want to
>>> fail that as well?
>>
>> It is called with exp == NULL from nfsd4_getdevinfo where it shouldn't
>> cause an error return.
>>
>> Benny
>>
>>>
>>>> + dprintk("%s: Underlying file system "
>>>> + "is not exported over pNFS\n", __func__);
>>>> + goto out;
>>>> + }
>>>> + if (!sb->s_pnfs_op || !sb->s_pnfs_op->layout_type) {
>>>> + dprintk("%s: Underlying file system "
>>>> + "does not support pNFS\n", __func__);
>>>> + goto out;
>>>> + }
>>>> +
>>>> + type = sb->s_pnfs_op->layout_type(sb);
>>>> +
>>>> + /* check to see if requested layout type is supported. */
>>>> + status = nfserr_unknown_layouttype;
>>>> + if (!type)
>>>> + dprintk("BUG: %s: layout_type 0 is reserved and must not be "
>>>> + "used by filesystem\n", __func__);
>>>> + else if (type != layout_type)
>>>> + dprintk("%s: requested layout type %d "
>>>> + "does not match supported type %d\n",
>>>> + __func__, layout_type, type);
>>>> + else
>>>> + status = nfs_ok;
>>>> +out:
>>>> + return status;
>>>> +}
>>>> +#endif /* CONFIG_PNFSD */
>>>> +
>>>> /*
>>>> * NULL call.
>>>> */
>>>> diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
>>>> index 65fb57e..7c46791 100644
>>>> --- a/fs/nfsd/pnfsd.h
>>>> +++ b/fs/nfsd/pnfsd.h
>>>> @@ -34,4 +34,6 @@
>>>> #ifndef LINUX_NFSD_PNFSD_H
>>>> #define LINUX_NFSD_PNFSD_H
>>>>
>>>> +#include <linux/nfsd/nfsd4_pnfs.h>
>>>> +
>>>> #endif /* LINUX_NFSD_PNFSD_H */
>>>> diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
>>>> index ff6613e..d44669e 100644
>>>> --- a/include/linux/nfsd/nfsd4_pnfs.h
>>>> +++ b/include/linux/nfsd/nfsd4_pnfs.h
>>>> @@ -34,6 +34,8 @@
>>>> #ifndef _LINUX_NFSD_NFSD4_PNFS_H
>>>> #define _LINUX_NFSD_NFSD4_PNFS_H
>>>>
>>>> +#include <linux/exportfs.h>
>>>> +
>>>> /*
>>>> * pNFS export operations vector.
>>>> *
>>>> @@ -45,7 +47,8 @@
>>>> * All other methods are optional and can be set to NULL if not implemented.
>>>> */
>>>> struct pnfs_export_operations {
>>>> - /* stub */
>>>> + /* Returns the supported pnfs_layouttype4. */
>>>> + int (*layout_type) (struct super_block *);
>>>> };
>>>>
>>>> #endif /* _LINUX_NFSD_NFSD4_PNFS_H */
>>>> --
>>>> 1.8.3.1
>>>>
From: Benny Halevy <[email protected]>
get_cached_acl is defined as inline in posix_acl.h
requiring the full definition of struct inode as it
dereferences its struct inode * parameter.
Cc: Alexander Viro <[email protected]>
Cc: [email protected]
Cc: J. Bruce Fields <[email protected]>
Cc: Trond Myklebust <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
---
include/linux/posix_acl.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h
index 7931efe..a7d8b04 100644
--- a/include/linux/posix_acl.h
+++ b/include/linux/posix_acl.h
@@ -9,6 +9,7 @@
#define __LINUX_POSIX_ACL_H
#include <linux/bug.h>
+#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/rcupdate.h>
--
1.8.3.1
On Sun, Oct 13, 2013 at 03:44:30PM +0300, Benny Halevy wrote:
> As far as I can see, exofs isn't actually doing that.
> Currently exofs_fill_super sets s_dev = 0;
> Boaz, did I miss anything?
Just removing that line should fix it, mount_nodev already takes care of
getting a proper s_dev through set_anon_super. It'll also need to
switch .kill_sb to kill_anon_super just like the other filesystems using
mount_nodev.
On 2013-09-29 15:16, Christoph Hellwig wrote:
> The actual XDR encoding has no business being under fs/exportfs
> and should be in the NFSD code itself.
But then it will create a module dependency we want to avoid.
Benny
>
I've been looking at this patch and the surrounding code a bit more and
I'm starting to really dislike it.
put_nfs4_file is very much unrelated to the state lock, so having a
version that internally drops it seems wrong. If we look at the usage
of your new locked version, there are very few callers:
put_nfs4_file_locked
+ destroy_layout_state
+ put_layout_state
+ destroy_layout
+ destroy_layout_list
+ nfs4_pnfs_return_layout
+ pnfs_expire_client
+ nfs4_pnfs_get_layout
+ nfs4_pnfs_return_layout
Except for pnfs_expire_client all these are right near the places where
the state lock is dropped, so simply refactoring the code sounds like
a very valid option.
And btw, the state locking is even more of a mess than I thought. I
think it really needs to be split up sooner or later.
On 2013-10-11 22:56, Christoph Hellwig wrote:
> On Thu, Sep 26, 2013 at 02:40:31PM -0400, Benny Halevy wrote:
>> From: Benny Halevy <[email protected]>
>
> This is entirely unnecessary. Just make sure all layouts set the sbid
> field in getdevicelist to the s_dev value of the filesystem and you
> can just use user_get_super.
That's true for filesystems that have a meaningful s_dev, unlike exofs,
but this functionality can be added later.
Benny
On 2013-10-01 03:23, Boaz Harrosh wrote:
> On 09/27/2013 01:19 PM, Benny Halevy wrote:
>> On Fri, Sep 27, 2013 at 12:37 PM, Boaz Harrosh <[email protected]> wrote:
>>
>> Boaz, sorry but the files layout went first to production on the
>> client side in all major
>> enterprise distributions so it doesn't make sense to submit exofs first.
>> As for your patch series, I respect the work you did on it but
>> a. as you said it is your patch series, not mine
>
> This is not my patch series; it is our shared patch series, and I do
> not have a different one than yours. Everyone did some work
> in his own area. I wrote the exofs part as well as lots of core
> parts, but we always kept one tree: our tree.
>
>> b. the forward port from 3.10 on changed the layout state handling
>> radically (for the better I hope :)
>> solving numerous correctness issues.
>
> Cool good is good, right? Do you mean that exofs would not work now.
> Why would it be broken?
No, it should work in principle.
The main change is moving the recall cookie from the layout return
interface to a new call: "layout_recall_done".
>
>> The motivation behind the dlm based implementation is to have a
>> minimal useful pnfs implementation
>> that folks can use and test the client against.
>
> What kind of dumb test is DLM, without any write support. It is
> plain not pNFS it is a freak. There is nothing to test. READONLY
> file system, don't you see the joke in that?
It is what it is now and it can be improved.
>
> If this is your motivation, testing, then at least put pnfs-exp
> as the reference implementation for some real client testing.
>
It is possible, but borrowing some functionality from pnfsd-lexp,
such as layout segments, I/O error injection, control over
layoutcommit-through-MDS, or return on close, might be helpful.
>> On this basis, writes layout can be added,
>
> What writes layout in DLM? no hands waving please.
>
Write layouts can be used for load balancing with affinity.
For example, if some node holds an exclusive lock on the file,
point to that node; otherwise pick ino % #nodes.
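A minimal sketch of that selection rule, assuming nodes are numbered
0..nr_nodes-1 and that the caller already knows which node, if any,
holds the exclusive lock. The function name and the SKETCH_NO_HOLDER
sentinel are hypothetical and not part of the patchset.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NO_HOLDER (-1)	/* no node holds an exclusive lock */

/*
 * Pick the node a write layout should point at: the exclusive lock
 * holder if there is one, otherwise a stable hash of the inode number.
 */
static int sketch_pick_node(uint64_t ino, int nr_nodes, int excl_holder)
{
	assert(nr_nodes > 0);
	if (excl_holder != SKETCH_NO_HOLDER) {
		assert(excl_holder >= 0 && excl_holder < nr_nodes);
		return excl_holder;
	}
	return (int)(ino % (uint64_t)nr_nodes);
}
```

The fallback keeps all clients of an unlocked file on the same node,
which is what gives the affinity.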
>> and further on, exofs
>> support can be submitted as the next stage.
>>
>
> You are doing the work, what can I say. We have decided this before
> I think it was even Bruce's idea not mine. So you change that decision?
It's been reconsidered based on the current state of upstream
and the patchset, but my goal is to submit both, just in stages, one on top of the other.
>
> For me the DLM is a joke and a bad face for the 6 years of effort
> I put on this thing.
I agree that, besides being a basic testing tool and a basis for further
improvement, it's a methodological stage at this point. I'm OK if it's
useful even just for simplifying the review process.
> This is not pNFS and will do more harm than good
> to my cause. If you need it just for testing why do you need it in
> mainline? mainline is for real users and benefits no?
The value for users is currently improving read only bandwidth.
Again, not ideal, but a good start.
>
> I think I agree with Christoph: better wait for a real open-source
> pNFS server implementation before putting any of this in the Kernel.
There's a chicken and egg problem here.
I'm absolutely ok with reviewing the patchset piece wise and submitting exofs
in one shot. There is an opportunity to submit just the pnfsd-dlm part
that is self contained to put a stake in the ground early on.
This is essentially up to Bruce to decide what he takes in first.
> Just leave it out-of-tree as it is now. The only real open-source
> pNFS server implementation out there today that can demonstrate 10G
> saturation and scalability of up to 40G in the 40 nodes setup I had, is exofs.
> So it is the only one that can justify such a big piece added to the Kernel.
> Real sorry for the inconvenience of it being objects and not files.
> If it would matter to someone so much as it did for me then perhaps
> he would sit on his "thing" and implement one. But the fact is that no
> one cares for a files-layout open-source server.
>
> And you are off the hook this is the last I will comment on this.
Boaz, it does not have to be, and should not be a flame war.
Feel free to discuss and try to convince, just please, I just can't hear you
if you shout...
>
>> Benny
>>
>
> Thanks
> Boaz
>
On Mon, Sep 30, 2013 at 06:05:18PM -0700, Boaz Harrosh wrote:
> The pnfs protocol allows, and people have plans for, multi-typed
> layouts from the same super-block. It is a per file attribute.
> It even allows a multi protocol access to the same file.
> The only flag should be the presence of the layout_get vector
> that should indicate support or lack of it.
The current method doesn't help with that as it can return a single type
only anyway. So in principle I agree with you, but the way to fix it is
not to keep the method, but to make sure it returns a bitmap of
supported layouts.
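What a bitmap-returning method could look like, as a userspace sketch:
each bit position is a layout type number, so one superblock can
advertise several types at once. The type numbers are the
pnfs_layouttype4 values from RFC 5661; the sketch_* helpers are
hypothetical and not an existing nfsd interface.

```c
#include <assert.h>
#include <stdint.h>

/* pnfs_layouttype4 values as defined in RFC 5661. */
enum {
	SKETCH_LAYOUT_FILES = 1,	/* LAYOUT4_NFSV4_1_FILES */
	SKETCH_LAYOUT_OSD2  = 2,	/* LAYOUT4_OSD2_OBJECTS */
	SKETCH_LAYOUT_BLOCK = 3		/* LAYOUT4_BLOCK_VOLUME */
};

/* Bit to set in the bitmap for a given layout type. */
static uint32_t sketch_layout_bit(unsigned int type)
{
	assert(type >= 1 && type < 32);
	return UINT32_C(1) << type;
}

/* Non-zero if the requested type is advertised; type 0 is reserved. */
static int sketch_layout_supported(uint32_t bitmap, unsigned int type)
{
	if (type == 0 || type >= 32)
		return 0;
	return (bitmap & (UINT32_C(1) << type)) != 0;
}
```

The verify step then becomes a bit test on the returned bitmap rather
than an equality check against a single type.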
> > - there should be a struct pnfs_operations, but it should be confined
> > to fs/nfsd: each layout can be a separate loadable module and gets
> > registered there. For the initial file layout that module is
> > self-contained, but for e.g. block or objects it would have
> > call into the filesystem through export_ops, although way lower level
> > than the NFS XDR level, e.g. for block there would be one op to get
> > the extent map, and one to allocate an extent.
> >
>
> No! This does not make any sense. What you say does not fit any model of any
> cluster filesystem today.
>
> - Again the FS can support any protocol.
> - Only the FS understand the structure and layout of the file access. Any
> other model is a specific implementation and breaks abstraction. The only true
> abstraction is the LO_GET LO_RETURN LO_COMMIT DEVICE_INFO and LO_CB_RECALL. anything
> else is making assumptions.
>
> There is a pnfs vector and it is at this abstraction level exactly.
No, the problem is that the pnfs_export_operations are entirely at the
wrong level, as I tried to explain. The right level is very different
for the different layouts:
- for files it needs to boil down to a:
- get a list of devices
- given an inode/offset return the layout
- for block it's get a block map for a file / create an unwritten
extent / convert it to written
- for object it seems (not too familiar):
- get a list of devices for this fs
- given an inode/offset return the layout
- tell the fs that I/O has finished
As all the layouts operate on different data structures it makes sense
to make the methods operate on those, and keep the boilerplate code
including the XDR encoding/decoding in one single place.
Now how these pnfsd object layout drivers communicate with the fs I
don't have an opinion on until we see the actual code; maybe we need a
pnfs_<layout>_ops if it's complicated enough. For the files case that
can just call into dlm directly; currently, as we have no other
interesting cluster fs in tree, it's a moot point. For block it's simple
enough that I'd just add it to export_ops, unless by that time we have
redone the current get_blocks mess in a way that lets us simply
piggyback on it, which would be even easier.
> > This way we also avoid the dependency on nfsd in the filesystems that the
> > current version introduces.
>
> There is no "dependency on nfsd in the filesystems"
The patchset as pulled will create a dependency on nfsd.ko from gfs2.ko
> The only dependency the FS has is an import of some library routines
> at exportfs that take an abstract layout and device descriptions and encode
> them into an XDR buffer. But the FS knows nothing of the XDR and the
> NFSD is free to unload at any moment without forcing the FS to unload
> first or at all.
> This is actually tested, in fact I do this all the time when I want to
> start fresh and have NFSD close all resources on the FS.
That's not what happens with the file layout as posted, and it's not
something I want to see happen ever, btw. In Linux we're all about
proper abstractions, and letting a fs control all pnfs aspects directly
instead of having common code is a recipe for tons of copy & pasted
code full of different bugs if we ever get additional implementations of
a layout. Not that I really expect any in tree as all the other
"interested parties" have shown to be leechers that just want to keep
their filesystems out of tree and most of the time as illegal proprietary
modules anyway.
On 2013-09-29 15:19, Christoph Hellwig wrote:
> On Thu, Sep 26, 2013 at 02:42:48PM -0400, Benny Halevy wrote:
>> we don't want to hold the state_lock while the file system may block
>
> Needs a much better changelog:
>
> - why don't you want to hold it
> - why you think the new version is safe and performs fine.
OK.
So the reason not to hold it is that the nfs state lock is global to the
server and blocks all state modifying operations such as:
open, close, lock, clientid, session operations, etc.
It is safe to release the state lock from the pnfs call sites
on the resource dereferencing path as:
a. The file system is not expected to recurse back into the knfsd code
while holding the state lock.
b. The high level operation is already done at this point and it is not
required to hold the state lock any further.
Note that there are more call sites of put_nfs4_file in nfs4state.c
that need further analysis and should move to put_nfs4_file_locked where possible.
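
A minimal userspace sketch of the pattern described above, assuming a
pthread mutex stands in for the nfsd state lock. The names
(state_lock/state_unlock, struct file_ref, put_file_ref_locked,
toy_fs_release) are hypothetical and this is not the patch itself. The
final reference drop releases the lock around the callback that may
block, then retakes it for the caller.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t state_mutex = PTHREAD_MUTEX_INITIALIZER;
static int state_lock_held;	/* debugging aid, touched only by the holder */

static void state_lock(void)
{
	pthread_mutex_lock(&state_mutex);
	state_lock_held = 1;
}

static void state_unlock(void)
{
	state_lock_held = 0;
	pthread_mutex_unlock(&state_mutex);
}

struct file_ref {
	int refcount;			/* protected by the state lock */
	int released;			/* set once fs_release has run */
	int released_unlocked;		/* was the lock free at that time? */
	void (*fs_release)(struct file_ref *f);	/* may block in the fs */
};

/*
 * Called with the state lock held and returns with it held.  The lock
 * is dropped only around the final, possibly blocking, release call.
 */
static void put_file_ref_locked(struct file_ref *f)
{
	assert(state_lock_held);
	assert(f->refcount > 0);
	if (--f->refcount > 0)
		return;
	state_unlock();
	f->fs_release(f);	/* must not call back into state code */
	state_lock();
}

/* Toy file system callback that records whether the lock was held. */
static void toy_fs_release(struct file_ref *f)
{
	f->released_unlocked = !state_lock_held;
	f->released = 1;
}
```

The caller has to tolerate the lock being dropped and retaken inside
the call, which is only safe once the high-level operation is complete,
as in point b above.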
Benny
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On 10/03/2013 10:21 AM, Christoph Hellwig wrote:
> On Thu, Oct 03, 2013 at 05:19:31PM +0300, Benny Halevy wrote:
>> The question is what is the minimum value for submitting upstream...
>>
>> The thing pnfs over dlm/gfs2 is missing mostly is supporting read/write layout.
>> One could use them for load balancing, e.g. by either redirecting to a node
>> holding an exclusive lock on the file, if there is one, or dlm_ino_hash in its absence.
> One could do a lot of things, especially given infinite time. But what
> does the current code buy us? We need data on that to justify merging
> such a large chunk of code.
>
I think that we can help figure out how to get something reasonably useful ready
for upstream, just need to get some people spun up to help test out some of
these ideas...
ric
On Sun, Sep 29, 2013 at 05:35:53AM -0700, Christoph Hellwig wrote:
> On Sun, Sep 29, 2013 at 05:21:30AM -0700, Christoph Hellwig wrote:
> > > Bruce - are you ok with moving the pnfs interface definitions to
> > > include/linux/exportfs.h along with struct export_operations?
> > >
> > > In fact we can actually extend struct export_operations rather
> > > than adding pnfs_export_operations...
> >
> > Yes, it probably should go into the export ops, although the actual
> > method signatures might need to be made a little less nfs-specific for
> > that.
>
> I just took a brief look over the diff for the whole series in the git
> tree and the old tree that still had block and exofs servers and have
> revised my opinion a little bit:
>
>
> - there should be a layout_type field in struct export_operations,
> indicating that a filesystem supports some sort of pnfs-like export.
> - there should be a struct pnfs_operations, but it should be confined
> to fs/nfsd: each layout can be a separate loadable module and gets
> registered there. For the initial file layout that module is
> self-contained, but for e.g. block or objects it would have to
> call into the filesystem through export_ops, although way lower level
> than the NFS XDR level, e.g. for block there would be one op to get
> the extent map, and one to allocate an extent.
That sounds OK to me.
My (possibly faulty) memory of how the xdr-encoding-in-the-fs came
about: I think people weren't willing to commit to any reasonable upper
limit on the size of the layout. So they originally considered
something more like readdir that would loop over extents and pass them
to a callback for encoding. But xdr isn't complicated so you could
instead give the filesystem a simple library of xdr encoders to call.
But it sounds like that's not exactly what got implemented. And maybe
nobody actually has such big layouts.
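
For reference, the readdir-like approach described above might look
roughly like the following userspace sketch. It is an illustration
under assumptions, not what the patchset implements. The filesystem
walks its extents and hands each one to a callback, and the callback
(owned by the server) does the XDR encoding and can stop the walk when
its buffer is full. All names (struct extent, extent_actor_t,
toy_iterate_extents, struct encode_ctx) are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct extent {
	uint64_t offset;	/* file offset */
	uint64_t length;	/* length in bytes */
	uint64_t dev_offset;	/* offset on the backing device */
};

/* Returns 0 to continue the walk, nonzero to stop it. */
typedef int (*extent_actor_t)(void *ctx, const struct extent *ext);

/* Filesystem side: a toy three-extent map, walked like readdir. */
static int toy_iterate_extents(uint64_t start, uint64_t end,
			       extent_actor_t actor, void *ctx)
{
	static const struct extent map[] = {
		{     0, 4096, 1000000 },
		{  4096, 8192, 2000000 },
		{ 12288, 4096, 3000000 },
	};
	size_t i;
	int ret;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		/* Skip extents that do not overlap [start, end). */
		if (map[i].offset >= end ||
		    map[i].offset + map[i].length <= start)
			continue;
		ret = actor(ctx, &map[i]);
		if (ret)
			return ret;
	}
	return 0;
}

/* Server side: encode state, sized by the caller. */
struct encode_ctx {
	uint32_t *p;		/* next 32-bit XDR word */
	uint32_t *end;		/* one past the last usable word */
	unsigned int count;	/* extents encoded so far */
};

/* Encode one extent as three big-endian 64-bit values (six words). */
static int encode_extent(void *ctx, const struct extent *ext)
{
	struct encode_ctx *ec = ctx;
	const uint64_t fields[3] = {
		ext->offset, ext->length, ext->dev_offset
	};
	int i;

	if (ec->end - ec->p < 6)
		return 1;	/* buffer full: stop the walk */
	for (i = 0; i < 3; i++) {
		*ec->p++ = htonl((uint32_t)(fields[i] >> 32));
		*ec->p++ = htonl((uint32_t)fields[i]);
	}
	ec->count++;
	return 0;
}
```

With this shape the size of the layout is bounded by the server's
buffer rather than by anything the filesystem has to commit to up
front.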
> This way we also avoid the dependency on nfsd in the filesystems that the
> current version introduces.
Ugh--thanks for catching that.
--b.
On Sun, Sep 29, 2013 at 03:20:33PM +0300, Benny Halevy wrote:
> On 2013-09-29 15:13, Christoph Hellwig wrote:
> > On Sun, Sep 29, 2013 at 03:12:41PM +0300, Benny Halevy wrote:
> >>> Also why would you want a header
> >>> outside fs/nfsd/ ?
> >>
> >> This header contains the file system interface.
> >
> > Any interface for the filesystem should be part of exportfs.h, not
> > something nfs-specific.
>
> Makes sense. Thanks.
>
> Bruce - are you ok with moving the pnfs interface definitions to
> include/linux/exportfs.h along with struct export_operations?
Fine by me.--b.
>
> In fact we can actually extend struct export_operations rather
> than adding pnfs_export_operations...
>
> Benny
>
Hi,
I think saying exofs is a proof of concept/toy is missing the point.
Exofs is an implementation baseline that provides insight into the
scalability/performance values that a pnfs implementation can achieve,
and potentially how to achieve them.
Matt
----- "Christoph Hellwig" <[email protected]> wrote:
> On Thu, Oct 03, 2013 at 03:29:06PM +0300, Benny Halevy wrote:
> > I picked gfs2 as the initial use case for simplicity and ease of
> review.
> > If there is a rough consensus that it's useless and not worthy of
> inclusion
> > then the one we care about the most is exofs that has a more
> complete pnfs
> > implementation.
>
> This was in reference to file layout implementation details, so exofs
> isn't a contender there.
>
> As far as exofs is concerned a pnfs implementation based on it has
> just
> as much toy status as the current gfs2 one. While the pnfs side of
> it
> might as well be a lot better, a filesystem that lacks all the
> integrity
> and scalability features developed in the last 30 years can't be
> considered more than a proof of concept.
--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104
http://linuxbox.com
tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
On 2013-10-11 22:47, Christoph Hellwig wrote:
> I've been looking at this patch and the surrounding code a bit more and
> start to really dislike it.
>
> put_nfs4_file is very much unrelated to the state lock, so having a
> version that internally drops it seems wrong. If we look at the usage
> of your new locked version there's very few callers:
>
>
> put_nfs4_file_locked
> + destroy_layout_state
> + put_layout_state
> + destroy_layout
> + destroy_layout_list
> + nfs4_pnfs_return_layout
> + pnfs_expire_client
> + nfs4_pnfs_get_layout
> + nfs4_pnfs_return_layout
>
> Except for pnfs_expire_client all these are right near the places where
> the state lock is dropped, so simply refactoring the code sounds like
> a very valid option.
I hear you. Makes sense.
>
>
> And btw, the state locking is even more of a mess than I thought. I
> think it really needs to be split up sooner or later.
>
I'm working on patches eliminating the state lock by refactoring the code
that is currently protected by it, moving all blocking calls to either before
or after the critical section. I'll send a RFC patchset as soon as I have
the first stab at it completed.
Benny
On Thu, Oct 03, 2013 at 09:02:27AM +0300, Benny Halevy wrote:
> Just that this is dlm specific logic.
> For example, using dlm_ino_hash() in nfsd4_pnfs_dlm_layoutget().
> Or even knowing that
> layout->lg_stripe_type = STRIPE_SPARSE;
> assumes knowledge of the underlying cluster fs implementation.
Which in-tree or soon in-tree filesystem do you care about? And why
don't we see pnfs support for it submitted instead of the fairly useless
gfs2 support?
On Sun, Oct 13, 2013 at 09:23:51AM +0300, Benny Halevy wrote:
> On 2013-10-11 22:37, Christoph Hellwig wrote:
> > On Thu, Sep 26, 2013 at 02:41:16PM -0400, Benny Halevy wrote:
> >> idr_remove is about to be called before kmem_cache_free so unhashing it
> >> is redundant
> >>
> >> Signed-off-by: Benny Halevy <[email protected]>
> >
> > This is probably something that should go straight to Bruce, same for
> > the next one in the series.
> >
>
> True.
> Bruce, are you ok with merging these patches for 3.13?
Could you just resend separately anything that can be justified as a
bugfix or cleanup independent of the rest of the pnfs stuff?
--b.
On 2013-09-29 15:16, Christoph Hellwig wrote:
> The actual XDR encoding doesn't have business being under fs/exportfs
> and should be in the NFSD code itself.
Hmm, I thought of it as in the family of fh encoding and decoding...
But I'm not married to this idea.
Benny
On Tue, Oct 01, 2013 at 06:37:39AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 01, 2013 at 04:31:21PM +0300, Benny Halevy wrote:
> > So the reason not to hold it is that the nfs state lock is global to the
> > server and blocks all state modifying operations such as:
> > open, close, lock, clientid, session operations, etc.
>
> While not really related to this patch: what's the reason it's not
> split? The way nfsd works there should be almost no state that isn't
> per-export.
There's the NFSv4 client, but yes, the global state lock is an
embarrassment....
--b.
On 2013-10-01 23:30, J. Bruce Fields wrote:
> On Sun, Sep 29, 2013 at 05:35:53AM -0700, Christoph Hellwig wrote:
>> On Sun, Sep 29, 2013 at 05:21:30AM -0700, Christoph Hellwig wrote:
>>>> Bruce - are you ok with moving the pnfs interface definitions to
>>>> include/linux/exportfs.h along with struct export_operations?
>>>>
>>>> In fact we can actually extend struct export_operations rather
>>>> than adding pnfs_export_operations...
>>>
>>> Yes, it probably should go into the export ops, although the actual
>>> method signatures might need to be made a little less nfs-specific for
>>> that.
>>
>> I just took a brief look over the diff for the whole series in the git
>> tree and the old tree that still had block and exofs servers and have
>> revised my opinion a little bit:
>>
>>
>> - there should be a layout_type field in struct export_operations,
>> indicating that a filesystem supports some sort of pnfs-like export.
>> - there should be a struct pnfs_operations, but it should be confined
>> to fs/nfsd: each layout can be a separate loadable module and gets
>> registered there. For the initial file layout that module is
>> self-contained, but for e.g. block or objects it would have to
>> call into the filesystem through export_ops, although way lower level
>> than the NFS XDR level, e.g. for block there would be one op to get
>> the extent map, and one to allocate an extent.
>
> That sounds OK to me.
>
> My (possibly faulty) memory of how the xdr-encoding-in-the-fs came
> about: I think people weren't willing to commit to any reasonable upper
> limit on the size of the layout. So they originally considered
> something more like readdir that would loop over extents and pass them
> to a callback for encoding. But xdr isn't complicated so you could
> instead give the filesystem a simple library of xdr encoders to call.
>
> But it sounds like that's not exactly what got implemented. And maybe
> nobody actually has such big layouts.
>
>> This way we also avoid the dependency on nfsd in the filesystems that the
>> current version introduces.
>
> Ugh--thanks for catching that.
I suggest moving the code in fs/nfsd/nfs4pnfsdlm.c to
fs/dlm.
Benny
>
> --b.
On Wed, Oct 02, 2013 at 02:36:54PM +0300, Benny Halevy wrote:
> I suggest moving the code in fs/nfsd/nfs4pnfsdlm.c to
> fs/dlm.
I don't really care where in the tree it lives, but it needs to be
neither part of nfsd nor the filesystem (respectively dlm in this case).
On Mon, Sep 30, 2013 at 06:23:42PM +0300, Benny Halevy wrote:
> This makes sense for blocks for its use of the generic block allocation and mapping
> calls (and it needs a new call for committing uninitialized extents)
> But for objects there are no such calls and the integration with exofs
> is pretty intimate.
That's just because there is no proper split between exofs and the
pnfsd-objects layer. The split between the two doesn't seem too hard
and would dramatically improve the interface.
With that and moving the recall handling on truncate to generic code
where it belongs almost nothing pnfs-specific will be left in exofs.
On 2013-10-03 17:24, Ric Wheeler wrote:
> On 10/03/2013 10:21 AM, Christoph Hellwig wrote:
>> On Thu, Oct 03, 2013 at 05:19:31PM +0300, Benny Halevy wrote:
>>> The question is what is the minimum value for submitting upstream...
>>>
>>> The thing pnfs over dlm/gfs2 is missing mostly is supporting read/write layout.
>>> One could use them for load balancing, e.g. by either redirecting to a node
>>> holding an exclusive lock on the file, if there is one, or dlm_ino_hash in its absence.
>> One could do a lot of things, especially given infinite time. But what
>> does the current code buy us? We need data on that to justify merging
>> such a large chunk of code.
>>
>
> I think that we can help figure out how to get something reasonably useful ready
> for upstream, just need to get some people spun up to help test out some of
> these ideas...
That would be very helpful. Much appreciated!
Benny
>
> ric
On Thu, Oct 03, 2013 at 08:58:08AM -0400, Matt W. Benjamin wrote:
> Hi,
>
> I think saying exofs is a proof of concept/toy is missing the point.
> Exofs is an implementation baseline that provides insight into the
> scalability/performance values that a pnfs implementation can achieve,
> and potentially how to achieve them.
Speaking like a true diplomat..
What amount of that data are we going to get by merging an exofs based
pnfs server that we haven't been able to gather with it out of tree for
the last 6 years? How is merging it and complicating the nfs servers
for it going to provide a benefit outside of the small group of about a
dozen people that actively care about the T10 OSD support in Linux?
On 10/03/2013 09:17 AM, Christoph Hellwig wrote:
> On Thu, Oct 03, 2013 at 09:12:24AM -0400, Ric Wheeler wrote:
>>>>> Which in-tree or soon in-tree filesystem do you care about? And why
>>>>> don't we see pnfs support for it submitted instead of the fairly useless
>>>>> gfs2 support?
>>> I picked gfs2 as the initial use case for simplicity and ease of review.
>>> If there is a rough consensus that it's useless and not worthy of inclusion
>>> then the one we care about the most is exofs that has a more complete pnfs
>>> implementation.
>>>
>>> Benny
>>>
>> I don't see having GFS2 supported as a base for pNFS as useless.
>> Christoph, is this a concern about GFS2 being too complicated for
>> normal deployment or a lack in the pNFS support on top of it?
> Fairly useless was specific to the particular implementation:
>
> - which in the stripped down version here only supports DS access for
> reads
> - which in the previous version showed worse performance than always
> going through the MDS
>
> I don't have a problem with using GFS2 by itself, but any implementation
> proposed should actually show significant real life benefits before it
> gets merged.
>
Makes sense, thanks!
Ric
On 2013-10-01 16:33, Christoph Hellwig wrote:
> On Mon, Sep 30, 2013 at 06:05:18PM -0700, Boaz Harrosh wrote:
>> The pnfs protocol and people have plans to, allow a multi typed
>> layouts from the same super-block. It is a per file attribute.
>> It even allows a multi protocol access to the same file.
>> The only flag should be the presence of the layout_get vector
>> that should indicate support or lack of it.
>
> The current method doesn't help with that as it can return a single type
> only anyway. So in principle I agree with you, but the way to fix it is
> not to keep the method, but to make sure it returns a bitmap of
> supported layouts.
>
>>> - there should be a struct pnfs_operations, but it should be confined
>>> to fs/nfsd: each layout can be a separate loadable module and gets
>>> registered there. For the initial file layout that module is
>>> self-contained, but for e.g. block or objects it would have to
>>> call into the filesystem through export_ops, although way lower level
>>> than the NFS XDR level, e.g. for block there would be one op to get
>>> the extent map, and one to allocate an extent.
>>>
>>
>> No! This does not make any sense. What you say does not fit any model of any
>> cluster filesystem today.
>>
>> - Again the FS can support any protocol.
>> - Only the FS understands the structure and layout of the file access. Any
>> other model is a specific implementation and breaks abstraction. The only true
>> abstraction is the LO_GET LO_RETURN LO_COMMIT DEVICE_INFO and LO_CB_RECALL. anything
>> else is making assumptions.
>>
>> There is a pnfs vector and it is at this abstraction level exactly.
>
> No, the problem is that the pnfs_export_operations are entirely at the
> wrong level, as I tried to explain. The right level is very different
> for the different layouts:
>
> - for files it needs to boil down to a:
>
> - get a list of devices
> - given an inode/offset return the layout
>
For loosely clustered files-layout file systems the MDS would also
need to handle LAYOUTCOMMIT, and also generate and handle layout recalls,
depending on how much of that we can do in a generic way and how much is
file system specific.
> - for block it's get a block map for a file / create an unwritten
> extent / convert it to written
Besides converting to written there is also de-allocation of provisionally
allocated extents and more advanced stuff for client assisted copy-on-write
where after allocation of uninitialized extents the client can commit
extents to the block map by replacing existing initialized extents with
new ones.
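
The extent life cycle listed above can be sketched as a small state
machine. This is a userspace illustration under assumptions, not a
proposed interface. The state names and the ext_* helpers are
hypothetical, and a real implementation would work on ranges within
the block map rather than on whole extents.

```c
#include <assert.h>
#include <stdint.h>

enum ext_state {
	EXT_HOLE,	/* nothing allocated */
	EXT_UNWRITTEN,	/* provisionally allocated, not yet initialized */
	EXT_WRITTEN,	/* initialized data, visible in the block map */
};

struct blk_extent {
	uint64_t offset;
	uint64_t length;
	enum ext_state state;
};

/* LAYOUTGET for write: create an unwritten extent over a hole. */
static int ext_allocate(struct blk_extent *e)
{
	if (e->state != EXT_HOLE)
		return -1;
	e->state = EXT_UNWRITTEN;
	return 0;
}

/* LAYOUTCOMMIT: the client wrote the data, convert to written. */
static int ext_commit(struct blk_extent *e)
{
	if (e->state != EXT_UNWRITTEN)
		return -1;
	e->state = EXT_WRITTEN;
	return 0;
}

/* Layout returned unused: de-allocate the provisional extent. */
static int ext_release(struct blk_extent *e)
{
	if (e->state != EXT_UNWRITTEN)
		return -1;
	e->state = EXT_HOLE;
	return 0;
}

/*
 * Client-assisted copy-on-write: a newly written replacement takes
 * the place of an existing initialized extent covering the same range.
 */
static int ext_cow_commit(struct blk_extent *old, struct blk_extent *repl)
{
	if (old->state != EXT_WRITTEN || repl->state != EXT_UNWRITTEN)
		return -1;
	if (old->offset != repl->offset || old->length != repl->length)
		return -1;
	repl->state = EXT_WRITTEN;
	old->state = EXT_HOLE;	/* old blocks can now be freed */
	return 0;
}
```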
>
> - for object it seems (not too familiar):
>
> - get a list of devices for this fs
> - given an inode/offset return the layout
> - tell the fs that I/O has finished
>
> As all the layouts operate on different data structures it makes sense
> to make the methods operate on those, and keep the boilerplate code
> including the XDR encoding/decoding in one single place.
>
> Now how these pnfsd object layout drivers communicate with the fs I
> don't have an opinion on until we see the actual code, maybe we need a
> pnfs_<layout>_ops if it's complicated enough.
The original design this patchset builds on called for implementing the
layout type specifics as library code and let file systems supporting pnfs
implement the high level pnfs methods and use the library for encoding
and decoding layout-type specific XDR and use other helpers if needed.
You're calling for a "back-end layout driver" kind of layer that will serve
the layout type and will call into the file system using lower level methods.
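
As a rough illustration of the "back-end layout driver" arrangement, a
userspace sketch of a per-layout-type registry follows. The numeric
layout type values are the ones assigned in RFC 5661; everything else
(struct layout_driver, register_layout_driver, dispatch_layoutget, the
toy driver) is hypothetical and only meant to show the shape of the
dispatch, not an actual nfsd interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout type numbers as assigned in RFC 5661. */
enum {
	LAYOUT_FILES   = 1,
	LAYOUT_OBJECTS = 2,
	LAYOUT_BLOCK   = 3,
};
#define LAYOUT_TYPE_MAX 3u

/* One driver per layout type; it would live in its own module. */
struct layout_driver {
	uint32_t type;
	int (*layoutget)(uint64_t ino, uint64_t offset, uint64_t *length);
};

static const struct layout_driver *layout_drivers[LAYOUT_TYPE_MAX + 1];

/* Returns 0 on success, -1 for a bad type or an occupied slot. */
static int register_layout_driver(const struct layout_driver *drv)
{
	assert(drv && drv->layoutget);
	if (drv->type == 0 || drv->type > LAYOUT_TYPE_MAX)
		return -1;
	if (layout_drivers[drv->type])
		return -1;
	layout_drivers[drv->type] = drv;
	return 0;
}

static void unregister_layout_driver(const struct layout_driver *drv)
{
	if (drv->type <= LAYOUT_TYPE_MAX &&
	    layout_drivers[drv->type] == drv)
		layout_drivers[drv->type] = NULL;
}

/* Common entry point: route the request to the registered driver. */
static int dispatch_layoutget(uint32_t type, uint64_t ino,
			      uint64_t offset, uint64_t *length)
{
	if (type == 0 || type > LAYOUT_TYPE_MAX || !layout_drivers[type])
		return -1;
	return layout_drivers[type]->layoutget(ino, offset, length);
}

/* Toy files-layout driver that always grants a whole-file layout. */
static int toy_files_layoutget(uint64_t ino, uint64_t offset,
			       uint64_t *length)
{
	(void)ino;
	(void)offset;
	*length = UINT64_MAX;
	return 0;
}

static const struct layout_driver toy_files_driver = {
	.type = LAYOUT_FILES,
	.layoutget = toy_files_layoutget,
};
```

In the library design the file system would implement layoutget itself
and call helpers for encoding. Here the driver owns the operation and
would call down into the file system through lower-level methods.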
> For the files case that
> can just call into dlm directly; currently, as we have no other
> interesting cluster fs in tree, it's a moot point. For block it's simple
> enough that I'd just add it to export_ops, if not by that time we redo
> the current get_blocks mess in a way that we can simply piggyback it on
> that which would be even easier.
>
Along the same lines, I think that for now calling directly into exofs makes
the most sense as we have no other object based pnfs implementation.
>>> This way we also avoid the dependency on nfsd in the filesystems that the
>>> current version introduces.
Yeah, I agree that this dependency can and should be taken care of.
>>
>> There is no "dependency on nfsd in the filesystems"
>
> The patchset as pulled will create a dependency on nfsd.ko from gfs2.ko.
>
yup, fs/nfsd/nfs4pnfsdlm.c exports pnfs_dlm_export_ops and implements
the respective methods. This does not really belong to nfsd but rather
to the fs/dlm module.
Benny
>> The only dependency the FS has is an import of some library routines
>> at exportfs that take an abstract layout and device descriptions and encode
>> them into an XDR buffer. But the FS knows nothing of the XDR and the
>> NFSD is free to unload at any moment without forcing the FS to unload
>> first or at all.
>> This is actually tested, in fact I do this all the time when I want to
>> start fresh and have NFSD close all resources on the FS.
>
> That's not what happens with the file layout as posted, and it's not
> something I want to see happen ever, btw. In Linux we're all about
> proper abstractions, and letting a fs control all pnfs aspects directly
> instead of having common code is a recipe for tons of copy & pasted
> code full of different bugs if we ever get additional implementations of
> a layout. Not that I really expect any in tree as all the other
> "interested parties" have shown to be leechers that just want to keep
> their filesystems out of tree and most of the time as illegal proprietary
> modules anyway.
On Thu, Sep 26, 2013 at 02:41:16PM -0400, Benny Halevy wrote:
> idr_remove is about to be called before kmem_cache_free so unhashing it
> is redundant
>
> Signed-off-by: Benny Halevy <[email protected]>
This is probably something that should go straight to Bruce, same for
the next one in the series.
On Thu, Oct 03, 2013 at 05:19:31PM +0300, Benny Halevy wrote:
> The question is what is the minimum value for submitting upstream...
>
> The thing pnfs over dlm/gfs2 is missing mostly is supporting read/write layout.
> One could use them for load balancing, e.g. by either redirecting to a node
> holding an exclusive lock on the file, if there is one, or dlm_ino_hash in its absence.
One could do a lot of things, especially given infinite time. But what
does the current code buy us? We need data on that to justify merging
such a large chunk of code.
On 2013-09-29 15:19, Christoph Hellwig wrote:
> Should go straight to Al independent of this series.
OK.
On 2013-10-02 19:07, Christoph Hellwig wrote:
> On Wed, Oct 02, 2013 at 02:36:54PM +0300, Benny Halevy wrote:
>> I suggest moving the code in fs/nfsd/nfs4pnfsdlm.c to
>> fs/dlm.
>
> I don't really care where in the tree it lives, but it needs to be
> neither part of nfsd nor the filesystem (respectively dlm in this case).
Just that this is dlm specific logic.
For example, using dlm_ino_hash() in nfsd4_pnfs_dlm_layoutget().
Or even knowing that
layout->lg_stripe_type = STRIPE_SPARSE;
assumes knowledge of the underlying cluster fs implementation.
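
To show why this logic is tied to the cluster fs, here is a minimal
userspace sketch of the data server selection discussed in this thread:
prefer the node holding an exclusive lock on the file, otherwise fall
back to a hash of the inode number. The hash below is a hypothetical
stand-in and is not the real dlm_ino_hash(); NO_HOLDER and
pick_data_server are likewise invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NO_HOLDER UINT32_MAX	/* no node holds an exclusive lock */

/* Stand-in hash only; the real dlm_ino_hash() may differ entirely. */
static uint32_t toy_ino_hash(uint64_t ino, uint32_t nr_nodes)
{
	ino ^= ino >> 33;
	ino *= 0xff51afd7ed558ccdULL;
	ino ^= ino >> 33;
	return (uint32_t)(ino % nr_nodes);
}

/*
 * Pick the node to use as data server for an inode: the exclusive
 * lock holder if there is a valid one, else the hashed node.
 */
static uint32_t pick_data_server(uint64_t ino, uint32_t excl_holder,
				 uint32_t nr_nodes)
{
	assert(nr_nodes > 0);
	if (excl_holder != NO_HOLDER && excl_holder < nr_nodes)
		return excl_holder;
	return toy_ino_hash(ino, nr_nodes);
}
```

Both inputs, the lock holder and the node count, come from the cluster
layer, which is the sense in which the choice assumes knowledge of the
underlying implementation.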
Benny
On Mon, Sep 30, 2013 at 06:15:17PM -0700, Boaz Harrosh wrote:
> Though I would like to keep it here, because I have a patchset
> which implements pNFS without NFSD at all. It enables a set
> of IOCTLs or syscalls and uses the same exact FS interface
> introduced here but sends the info to a user mode server.
If you want your userlevel fs, use ganesha. The kernel is a
self-contained project and not a random library.
On Thu, Oct 03, 2013 at 09:12:24AM -0400, Ric Wheeler wrote:
> >>>Which in-tree or soon in-tree filesystem do you care about? And why
> >>>don't we see pnfs support for it submitted instead of the fairly useless
> >>>gfs2 support?
> >I picked gfs2 as the initial use case for simplicity and ease of review.
> >If there is a rough consensus that it's useless and not worthy of inclusion
> >then the one we care about the most is exofs that has a more complete pnfs
> >implementation.
> >
> >Benny
> >
>
> I don't see having GFS2 supported as a base for pNFS as useless.
> Christoph, is this a concern about GFS2 being too complicated for
> normal deployment or a lack in the pNFS support on top of it?
Fairly useless was specific to the particular implementation:
- which in the stripped down version here only supports DS access for
reads
- which in the previous version showed worse performance than always
going through the MDS
I don't have a problem with using GFS2 by itself, but any implementation
proposed should actually show significant real life benefits before it
gets merged.