2011-05-09 19:36:13

by Chuck Lever

Subject: [PATCH 00/16] Client-side migration support for 2.6.40 [take 3]

Hi-

Here is the latest pass at client-side support for NFSv4.0 migration.
Some NFSv4.1 support is thrown in, but it is untested (no extant 4.1
servers in the wild support migration). Patches are against
2.6.39-rc6.

This series adds support for both NFS4ERR_MOVED and
NFS4ERR_LEASE_MOVED. It can also re-establish a callback channel
with the destination server, post-migration. I've attempted to
address all comments and re-organization requests from pub night.

This series is what I am testing this week. I'm sure there are still
some bugs, but let's get the review process rolling.

---

Chuck Lever (12):
NFS: Implement support for NFS4ERR_LEASE_MOVED
NFS: Add migration recovery callouts in nfs4proc.c
NFS: Remove "const" from "struct nfs_server *" fields
NFS: Add basic migration support to state manager thread
NFS: Add functions to swap transports during migration recovery
NFS: Add an API for cloning an nfs_client
NFS: Add infrastructure for updating callback data
NFS: Introduce nfs4_proc_get_mig_status()
NFS: Introduce NFS_ATTR_FATTR_V4_LOCATIONS
NFS: Save root file handle in nfs_server
NFS: Add a client-side function to display file handles
SUNRPC: Add API to acquire source address

Trond Myklebust (4):
SUNRPC: Add a helper to switch the transport of the rpc_client
SUNRPC: Move clnt->cl_server into struct rpc_xprt
SUNRPC: Use RCU to dereference the rpc_clnt.cl_xprt field
SUNRPC: Allow temporary blocking of an rpc client


fs/nfs/callback.c | 3
fs/nfs/callback_proc.c | 9 -
fs/nfs/client.c | 88 +++++++-
fs/nfs/getroot.c | 5
fs/nfs/inode.c | 45 ++++
fs/nfs/internal.h | 6 +
fs/nfs/nfs4_fs.h | 8 +
fs/nfs/nfs4namespace.c | 202 ++++++++++++++++++
fs/nfs/nfs4proc.c | 218 +++++++++++++++++---
fs/nfs/nfs4state.c | 227 ++++++++++++++++++++-
fs/nfs/nfs4xdr.c | 50 +++--
fs/nfs/super.c | 5
include/linux/nfs_fs.h | 14 +
include/linux/nfs_fs_sb.h | 7 +
include/linux/nfs_xdr.h | 42 ++--
include/linux/sunrpc/clnt.h | 22 ++
include/linux/sunrpc/debug.h | 11 +
include/linux/sunrpc/xprt.h | 2
net/sunrpc/clnt.c | 461 +++++++++++++++++++++++++++++++++++++-----
net/sunrpc/rpc_pipe.c | 5
net/sunrpc/rpcb_clnt.c | 24 +-
net/sunrpc/stats.c | 6 -
net/sunrpc/xprt.c | 14 +
23 files changed, 1320 insertions(+), 154 deletions(-)

--
Chuck Lever


2011-05-12 17:30:19

by Chuck Lever

Subject: Re: [PATCH 11/16] NFS: Add an API for cloning an nfs_client


On May 9, 2011, at 3:38 PM, Chuck Lever wrote:

> After a migration event, we have to preserve the client ID the client
> used with the source server, and introduce it to the destination
> server, in case the migration transparently migrated state for the
> migrating FSID.
>
> Note that our RENEW and SETCLIENTID procs both take an nfs_client as
> an argument. Thus, after a successful migration recovery, we want to
> have a nfs_client with the correct long-form and short-form client ID
> for the destination server to pass these procs.
>
> To preserve the client IDs, we clone the source server's nfs_client.
> The migrated FSID is moved from the original nfs_client to the cloned
> one.
>
> This patch introduces an API for cloning an nfs_client and moving an
> FSID to it.
>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
>
> fs/nfs/client.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> fs/nfs/internal.h | 4 +++
> 2 files changed, 71 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 536b0ba..2f5e29f 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -135,6 +135,7 @@ struct nfs_client_initdata {
> const struct nfs_rpc_ops *rpc_ops;
> int proto;
> u32 minorversion;
> + const char *long_clientid;
> };
>
> /*
> @@ -184,6 +185,9 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
> clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
> clp->cl_minorversion = cl_init->minorversion;
> clp->cl_mvops = nfs_v4_minor_ops[cl_init->minorversion];
> + if (cl_init->long_clientid != NULL)
> + clp->cl_cached_clientid = kstrdup(cl_init->long_clientid,
> + GFP_KERNEL);
> #endif
> cred = rpc_lookup_machine_cred();
> if (!IS_ERR(cred))
> @@ -476,6 +480,10 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
> /* Match the full socket address */
> if (!nfs_sockaddr_cmp(sap, clap))
> continue;
> + /* Match on long-form client ID */
> + if (data->long_clientid && clp->cl_cached_clientid &&
> + strcmp(data->long_clientid, clp->cl_cached_clientid))
> + continue;
>
> atomic_inc(&clp->cl_count);
> return clp;
> @@ -1426,8 +1434,65 @@ error:
> return error;
> }
>
> -/*
> - * Set up a pNFS Data Server client.
> +/**
> + * nfs4_clone_client - Clone a client after a migration event
> + * clp: nfs_client to clone
> + * sap: address of destination server
> + * salen: size of "sap" in bytes
> + * ip_addr: NUL-terminated string containing local presentation address
> + * server: nfs_server to move from "clp" to the new one
> + *
> + * Returns negative errno or zero. nfs_client field of "server" is
> + * updated to refer to a new or existing nfs_client that matches
> + * [server address, port, version, minorversion, client ID]. The
> + * nfs_server is moved from the old nfs_client's cl_superblocks list
> + * to the new nfs_client's list.
> + */
> +int nfs4_clone_client(struct nfs_client *clp, const struct sockaddr *sap,
> + size_t salen, const char *ip_addr,
> + struct nfs_server *server)
> +{
> + struct rpc_clnt *rpcclnt = clp->cl_rpcclient;
> + struct nfs_client_initdata cl_init = {
> + .addr = sap,
> + .addrlen = salen,
> + .rpc_ops = &nfs_v4_clientops,
> + .proto = rpc_protocol(rpcclnt),
> + .minorversion = clp->cl_minorversion,
> + .long_clientid = clp->cl_cached_clientid,
> + };
> + struct nfs_client *new;
> + int status = 0;
> +
> + dprintk("--> %s cloning \"%s\" (client ID %llx)\n",
> + __func__, clp->cl_hostname, clp->cl_clientid);
> +
> + new = nfs_get_client(&cl_init, rpcclnt->cl_timeout, ip_addr,
> + rpcclnt->cl_auth->au_flavor, 0);
> + if (IS_ERR(new)) {
> + dprintk("<-- %s nfs_get_client failed\n", __func__);
> + status = PTR_ERR(new);
> + goto out;
> + }
> +
> + nfs_server_remove_lists(server);
> + server->nfs_client = new;
> + nfs_server_insert_lists(server);
> +
> + dprintk("<-- %s moved (%llx:%llx) to nfs_client %p\n", __func__,
> + (unsigned long long)server->fsid.major,
> + (unsigned long long)server->fsid.minor, new);

We may be in trouble here.

Solaris servers use the cb_ident field to recognize a callback update rather than a full SETCLIENTID. This is because a migrate-reboot-migrate sequence can leave a destination server with a group of short-form client IDs associated with the same long-form client ID.

Cloning an nfs_client creates a new nfs_client in many cases, which bumps cb_ident. On Linux, a callback with the original cb_ident would get us the old nfs_client anyway (via idr_find()).

They are proposing that we use the callback RPC program number instead to find the right state information.
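
To make the concern concrete, here is a rough user-space sketch of the lookup problem. Nothing in it is kernel code: the registry, the client struct and all function names are made up for illustration, and a plain array stands in for the idr. It only shows that cloning hands out a fresh cb_ident, while a lookup with the original cb_ident still resolves to the old client.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLIENTS 8

/* Hypothetical stand-in for the parts of nfs_client discussed above. */
struct client {
    unsigned long long clientid;   /* short-form client ID */
    const char *server;            /* server this client talks to */
    int cb_ident;                  /* callback ident handed to the server */
};

/* Hypothetical registry; an array index plays the role of the idr slot. */
struct registry {
    struct client *slot[MAX_CLIENTS];
    int next_ident;
};

/* Register a client under the next free ident; returns it, or -1 if full. */
static int registry_add(struct registry *r, struct client *c)
{
    if (r->next_ident >= MAX_CLIENTS)
        return -1;
    c->cb_ident = r->next_ident;
    r->slot[r->next_ident] = c;
    return r->next_ident++;
}

/* Look a client up by the ident carried in a callback. */
static struct client *registry_find(const struct registry *r, int ident)
{
    if (ident < 0 || ident >= MAX_CLIENTS)
        return NULL;
    return r->slot[ident];
}

/* Clone: keep the client ID, point at the destination, get a NEW ident. */
static int client_clone(struct registry *r, const struct client *src,
                        const char *dest_server, struct client *out)
{
    out->clientid = src->clientid;
    out->server = dest_server;
    return registry_add(r, out);
}
```

In this model a server that keys its callback update on cb_ident sees two different values for what is logically one client, which is the mismatch described above.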

> +
> +out:
> + return status;
> +}
> +
> +/**
> + * nfs4_set_ds_client - Set up a pNFS Data Server client
> + * mds_clp: nfs_client representing the MDS
> + * ds_addr: IP address of DS
> + * ds_addrlen: size of "ds_addr" in bytes
> + * ds_proto: transport protocol to use to contact DS
> *
> * Return any existing nfs_client that matches server address,port,version
> * and minorversion.
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index f6baf5b..0bf4e67 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -154,6 +154,10 @@ extern struct nfs_server *nfs_clone_server(struct nfs_server *,
> struct nfs_fattr *);
> extern void nfs_mark_client_ready(struct nfs_client *clp, int state);
> extern int nfs4_check_client_ready(struct nfs_client *clp);
> +extern int nfs4_clone_client(struct nfs_client *clp,
> + const struct sockaddr *sap, size_t salen,
> + const char *ip_addr,
> + struct nfs_server *server);
> extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
> const struct sockaddr *ds_addr,
> int ds_addrlen, int ds_proto);
>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2011-05-09 19:36:22

by Chuck Lever

Subject: [PATCH 01/16] SUNRPC: Allow temporary blocking of an rpc client

From: Trond Myklebust <[email protected]>

Add a mechanism to allow us to temporarily block an rpc client while
we do surgery on its transport and authentication code.

The new function rpc_lock_client() will block all new rpc calls from
starting, and then wait for existing rpc calls to complete. If the
wait times out before the rpc calls have completed, then the function
returns -ETIMEDOUT, otherwise it returns 0.

In the event of a non-zero return value, it is up to the caller either
to cancel the lock (by calling rpc_unlock_client), or to take the
appropriate action to ensure the existing rpc calls complete (e.g.
by calling rpc_killall_tasks()).
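
The block-then-drain idea can be sketched in user space roughly as follows. This is only a model under stated assumptions: struct client_gate and the gate_*() names are invented for illustration, C11 atomics replace the kernel's atomic_t and bit operations, and a bounded poll stands in for wait_for_completion_timeout(). Like the patch, a task bumps the active count first and backs out if it finds the client locked.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the fields added to rpc_clnt. */
struct client_gate {
    atomic_int  active_tasks;   /* tasks currently running */
    atomic_bool locked;         /* set while the client is being reworked */
};

static void gate_init(struct client_gate *g)
{
    atomic_init(&g->active_tasks, 0);
    atomic_init(&g->locked, false);
}

/* Returns true if the task may run, false if it must wait for unlock. */
static bool gate_task_begin(struct client_gate *g)
{
    atomic_fetch_add(&g->active_tasks, 1);
    if (atomic_load(&g->locked)) {
        /* Back out so the locker does not wait on a task that never ran. */
        atomic_fetch_sub(&g->active_tasks, 1);
        return false;
    }
    return true;
}

static void gate_task_end(struct client_gate *g)
{
    atomic_fetch_sub(&g->active_tasks, 1);
}

/* Block new tasks, then poll up to max_polls times for running ones. */
static int gate_lock(struct client_gate *g, unsigned int max_polls)
{
    atomic_store(&g->locked, true);
    for (unsigned int i = 0; i < max_polls; i++)
        if (atomic_load(&g->active_tasks) == 0)
            return 0;
    return atomic_load(&g->active_tasks) == 0 ? 0 : -ETIMEDOUT;
}

/* Let new tasks start again; waking sleepers is not modeled here. */
static void gate_unlock(struct client_gate *g)
{
    atomic_store(&g->locked, false);
}
```

A caller that gets a non-zero result from gate_lock() still holds the lock, so it has to either call gate_unlock() or arrange for the running tasks to finish and try again.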

Signed-off-by: Trond Myklebust <[email protected]>
---

include/linux/sunrpc/clnt.h | 11 +++++++
net/sunrpc/clnt.c | 72 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 83 insertions(+), 0 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index db7bcaf..1cab257 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -23,6 +23,7 @@
#include <asm/signal.h>
#include <linux/path.h>
#include <net/ipv6.h>
+#include <linux/completion.h>

struct rpc_inode;

@@ -31,6 +32,7 @@ struct rpc_inode;
*/
struct rpc_clnt {
atomic_t cl_count; /* Number of references */
+ atomic_t cl_active_tasks;/* Number of active tasks */
struct list_head cl_clients; /* Global list of clients */
struct list_head cl_tasks; /* List of tasks */
spinlock_t cl_lock; /* spinlock */
@@ -46,6 +48,10 @@ struct rpc_clnt {
struct rpc_stat * cl_stats; /* per-program statistics */
struct rpc_iostats * cl_metrics; /* per-client statistics */

+ unsigned long cl_flags; /* Bit flags */
+ struct rpc_wait_queue cl_waitqueue;
+ struct completion cl_completion;
+
unsigned int cl_softrtry : 1,/* soft timeouts */
cl_discrtry : 1,/* disconnect before retry */
cl_autobind : 1,/* use getport() */
@@ -65,6 +71,8 @@ struct rpc_clnt {
char *cl_principal; /* target to authenticate to */
};

+#define RPC_CLIENT_LOCKED 0
+
/*
* General RPC program info
*/
@@ -135,6 +143,9 @@ void rpc_shutdown_client(struct rpc_clnt *);
void rpc_release_client(struct rpc_clnt *);
void rpc_task_release_client(struct rpc_task *);

+int rpc_lock_client(struct rpc_clnt *clnt, unsigned long timeout);
+void rpc_unlock_client(struct rpc_clnt *clnt);
+
int rpcb_register(u32, u32, int, unsigned short);
int rpcb_v4_register(const u32 program, const u32 version,
const struct sockaddr *address,
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index b84d739..3d6b1a9 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -226,6 +226,8 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru

atomic_set(&clnt->cl_count, 1);

+ rpc_init_wait_queue(&clnt->cl_waitqueue, "client waitqueue");
+
err = rpc_setup_pipedir(clnt, program->pipe_dir_name);
if (err < 0)
goto out_no_path;
@@ -395,6 +397,8 @@ rpc_clone_client(struct rpc_clnt *clnt)
goto out_no_principal;
}
atomic_set(&new->cl_count, 1);
+ atomic_set(&new->cl_active_tasks, 0);
+ rpc_init_wait_queue(&new->cl_waitqueue, "client waitqueue");
err = rpc_setup_pipedir(new, clnt->cl_program->pipe_dir_name);
if (err != 0)
goto out_no_path;
@@ -571,11 +575,76 @@ out:
}
EXPORT_SYMBOL_GPL(rpc_bind_new_program);

+/**
+ * rpc_lock_client - lock the RPC client
+ * @clnt: pointer to a struct rpc_clnt
+ * @timeout: timeout parameter to pass to wait_for_completion_timeout()
+ *
+ * This function sets the RPC_CLIENT_LOCKED flag, which causes
+ * all new rpc_tasks to wait instead of executing. It then waits for
+ * any existing active tasks to complete.
+ */
+int rpc_lock_client(struct rpc_clnt *clnt, unsigned long timeout)
+{
+ if (!test_and_set_bit(RPC_CLIENT_LOCKED, &clnt->cl_flags))
+ init_completion(&clnt->cl_completion);
+
+ if (atomic_read(&clnt->cl_active_tasks) &&
+ !wait_for_completion_timeout(&clnt->cl_completion, timeout))
+ return -ETIMEDOUT;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(rpc_lock_client);
+
+/**
+ * rpc_unlock_client
+ * @clnt: pointer to a struct rpc_clnt
+ *
+ * Clears the RPC_CLIENT_LOCKED flag, and starts any rpc_tasks that
+ * were waiting on it.
+ */
+void rpc_unlock_client(struct rpc_clnt *clnt)
+{
+ spin_lock(&clnt->cl_lock);
+ clear_bit(RPC_CLIENT_LOCKED, &clnt->cl_flags);
+ spin_unlock(&clnt->cl_lock);
+ rpc_wake_up(&clnt->cl_waitqueue);
+}
+EXPORT_SYMBOL_GPL(rpc_unlock_client);
+
+static void rpc_task_clear_active(struct rpc_task *task)
+{
+ struct rpc_clnt *clnt = task->tk_client;
+
+ if (atomic_dec_and_test(&clnt->cl_active_tasks) &&
+ test_bit(RPC_CLIENT_LOCKED, &clnt->cl_flags))
+ complete(&clnt->cl_completion);
+}
+
+static void rpc_task_set_active(struct rpc_task *task)
+{
+ struct rpc_clnt *clnt = task->tk_client;
+
+ atomic_inc(&clnt->cl_active_tasks);
+ if (unlikely(test_bit(RPC_CLIENT_LOCKED, &clnt->cl_flags))) {
+ spin_lock(&clnt->cl_lock);
+ if (test_bit(RPC_CLIENT_LOCKED, &clnt->cl_flags) &&
+ !RPC_ASSASSINATED(task)) {
+ rpc_sleep_on(&clnt->cl_waitqueue, task,
+ rpc_task_set_active);
+ rpc_task_clear_active(task);
+ }
+ spin_unlock(&clnt->cl_lock);
+ }
+}
+
void rpc_task_release_client(struct rpc_task *task)
{
struct rpc_clnt *clnt = task->tk_client;

if (clnt != NULL) {
+ rpc_task_clear_active(task);
/* Remove from client task list */
spin_lock(&clnt->cl_lock);
list_del(&task->tk_task);
@@ -599,6 +668,9 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
spin_lock(&clnt->cl_lock);
list_add_tail(&task->tk_task, &clnt->cl_tasks);
spin_unlock(&clnt->cl_lock);
+
+ /* Notify the client when this task is activated */
+ task->tk_callback = rpc_task_set_active;
}
}



2011-05-09 19:38:26

by Chuck Lever

Subject: [PATCH 13/16] NFS: Add basic migration support to state manager thread

Migration recovery will be handled separately from the normal
synchronous and asynchronous NFS processes, much like the existing
state manager thread. In fact, state and migration recovery will
have to be serialized.

Therefore add migration recovery support to the existing state manager
infrastructure, reusing its rendezvous mechanism and finite state
machine.

Additional debugging is added so that, while we continue to shape our
migration recovery implementation, the operation of the state manager
is visible. If the extra clutter is objectionable, it can be removed
once we are confident of the migration recovery implementation.
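
As a rough illustration of how a flag bit can serialize the two kinds of recovery, here is a user-space sketch. It is not the kernel code: struct state_mgr, the ST_* bits and every function name are hypothetical, C11 atomics stand in for test_and_set_bit() and test_and_clear_bit(), and the recovery work itself is reduced to counters. Only the first migration request gets recorded while the bit is pending, and the manager loop handles one kind of work per pass.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state bits, loosely modeled on NFS4CLNT_* flags. */
enum { ST_LEASE_EXPIRED = 0, ST_MOVED = 1 };

struct state_mgr {
    atomic_ulong state;       /* pending-work bits */
    int moved_fsid;           /* valid only while ST_MOVED is set */
    int lease_recoveries;     /* counters standing in for real work */
    int migrations_run;
    int last_migrated_fsid;
};

static void mgr_init(struct state_mgr *m)
{
    atomic_init(&m->state, 0UL);
    m->moved_fsid = -1;
    m->lease_recoveries = 0;
    m->migrations_run = 0;
    m->last_migrated_fsid = -1;
}

/* Set a bit and report whether it was already set. */
static bool test_and_set(atomic_ulong *word, int bit)
{
    unsigned long mask = 1UL << bit;
    return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Clear a bit and report whether it was set. */
static bool test_and_clear(atomic_ulong *word, int bit)
{
    unsigned long mask = 1UL << bit;
    return (atomic_fetch_and(word, ~mask) & mask) != 0;
}

/* Record a migration request unless one is already pending. */
static bool schedule_migration(struct state_mgr *m, int fsid)
{
    if (test_and_set(&m->state, ST_MOVED))
        return false;
    m->moved_fsid = fsid;
    return true;
}

/* One manager loop: lease recovery first, then migration, one at a time. */
static void run_manager(struct state_mgr *m)
{
    while (atomic_load(&m->state) != 0) {
        if (test_and_clear(&m->state, ST_LEASE_EXPIRED)) {
            m->lease_recoveries++;
            continue;
        }
        if (test_and_clear(&m->state, ST_MOVED)) {
            m->last_migrated_fsid = m->moved_fsid;
            m->migrations_run++;
            continue;
        }
        break;   /* unknown bits: nothing this sketch can do */
    }
}
```

Because a single slot holds the pending FSID, a second request that arrives before the manager runs is dropped in this sketch; the caller would have to retry once the bit clears.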

Signed-off-by: Chuck Lever <[email protected]>
---

fs/nfs/nfs4_fs.h | 3 +
fs/nfs/nfs4state.c | 149 ++++++++++++++++++++++++++++++++++++++++++++-
include/linux/nfs_fs_sb.h | 3 +
3 files changed, 153 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 1832fd6..c3e8641 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -49,6 +49,8 @@ enum nfs4_client_state {
NFS4CLNT_RECALL_SLOT,
NFS4CLNT_LEASE_CONFIRM,
NFS4CLNT_UPDATE_CALLBACK,
+ NFS4CLNT_CLONED_CLIENT,
+ NFS4CLNT_MOVED,
};

enum nfs4_session_state {
@@ -348,6 +350,7 @@ extern void nfs4_close_state(struct path *, struct nfs4_state *, fmode_t);
extern void nfs4_close_sync(struct path *, struct nfs4_state *, fmode_t);
extern void nfs4_state_set_mode_locked(struct nfs4_state *, fmode_t);
extern void nfs4_schedule_lease_recovery(struct nfs_client *);
+extern void nfs4_schedule_migration_recovery(struct nfs_server *);
extern void nfs4_schedule_state_manager(struct nfs_client *);
extern void nfs4_schedule_stateid_recovery(const struct nfs_server *, struct nfs4_state *);
extern void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 3285e40..c7b414a 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -56,6 +56,8 @@
#include "internal.h"
#include "pnfs.h"

+#define NFSDBG_FACILITY NFSDBG_CLIENT
+
#define OPENOWNER_POOL_SIZE 8

const nfs4_stateid zero_stateid;
@@ -1041,9 +1043,32 @@ void nfs4_schedule_lease_recovery(struct nfs_client *clp)
{
if (!clp)
return;
+ dprintk("--> %s: \"%s\" (client ID %llx)\n",
+ __func__, clp->cl_hostname, clp->cl_clientid);
if (!test_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state))
set_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
nfs4_schedule_state_manager(clp);
+ dprintk("<-- %s\n", __func__);
+}
+
+/**
+ * nfs4_schedule_migration_recovery - start background migration recovery
+ *
+ * @server: nfs_server representing remote file system that is migrating
+ *
+ */
+void nfs4_schedule_migration_recovery(struct nfs_server *server)
+{
+ struct nfs_client *clp = server->nfs_client;
+
+ dprintk("--> %s(%llx:%llx)\n", __func__,
+ (unsigned long long)server->fsid.major,
+ (unsigned long long)server->fsid.minor);
+ if (test_and_set_bit(NFS4CLNT_MOVED, &clp->cl_state) == 0) {
+ clp->cl_moved_server = server;
+ nfs4_schedule_state_manager(clp);
+ }
+ dprintk("<-- %s\n", __func__);
}

static int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_state *state)
@@ -1393,6 +1418,9 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov
struct rb_node *pos;
int status = 0;

+ dprintk("--> %s: \"%s\" (client ID %llx)\n",
+ __func__, clp->cl_hostname, clp->cl_clientid);
+
restart:
rcu_read_lock();
list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) {
@@ -1422,6 +1450,7 @@ restart:
spin_unlock(&clp->cl_lock);
}
rcu_read_unlock();
+ dprintk("<-- %s: %d\n", __func__, status);
return status;
}

@@ -1432,6 +1461,9 @@ static int nfs4_check_lease(struct nfs_client *clp)
clp->cl_mvops->state_renewal_ops;
int status = -NFS4ERR_EXPIRED;

+ dprintk("--> %s: \"%s\" (client ID %llx)\n",
+ __func__, clp->cl_hostname, clp->cl_clientid);
+
/* Is the client already known to have an expired lease? */
if (test_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state))
return 0;
@@ -1446,7 +1478,9 @@ static int nfs4_check_lease(struct nfs_client *clp)
status = ops->renew_lease(clp, cred);
put_rpccred(cred);
out:
- return nfs4_recovery_handle_error(clp, status);
+ status = nfs4_recovery_handle_error(clp, status);
+ dprintk("<-- %s: %d\n", __func__, status);
+ return status;
}

static int nfs4_reclaim_lease(struct nfs_client *clp)
@@ -1456,6 +1490,9 @@ static int nfs4_reclaim_lease(struct nfs_client *clp)
clp->cl_mvops->reboot_recovery_ops;
int status = -ENOENT;

+ dprintk("--> %s: \"%s\" (client ID %llx)\n",
+ __func__, clp->cl_hostname, clp->cl_clientid);
+
cred = ops->get_clid_cred(clp);
if (cred != NULL) {
status = ops->establish_clid(clp, cred);
@@ -1468,9 +1505,98 @@ static int nfs4_reclaim_lease(struct nfs_client *clp)
if (status == -NFS4ERR_MINOR_VERS_MISMATCH)
status = -EPROTONOSUPPORT;
}
+ dprintk("<-- %s: %d\n", __func__, status);
+ return status;
+}
+
+/*
+ * If cloning got us a shiny new nfs_client, a RENEW/SETCLIENTID sequence
+ * is needed. Kick off a state manager thread for the new nfs_client to
+ * handle this, and wait for it to finish.
+ */
+static int nfs4_init_cloned_client(struct nfs_client *clp)
+{
+ int status = 0;
+
+ dprintk("--> %s: \"%s\" (client ID %llx)\n",
+ __func__, clp->cl_hostname, clp->cl_clientid);
+
+ if (test_and_set_bit(NFS4CLNT_CLONED_CLIENT, &clp->cl_state) == 0) {
+ clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
+ nfs4_schedule_state_manager(clp);
+ status = wait_on_bit(&clp->cl_state, NFS4CLNT_MANAGER_RUNNING,
+ nfs_wait_bit_killable, TASK_KILLABLE);
+ }
+
+ dprintk("<-- %s: %d\n", __func__, status);
return status;
}

+/*
+ * Try remote migration of one FSID from a source server to a
+ * destination server. The source server provides a list of
+ * potential destinations.
+ */
+static void nfs4_try_migration(struct nfs_server *server)
+{
+ struct nfs_client *clp = server->nfs_client;
+ struct nfs4_fs_locations *locations = NULL;
+ struct page *page;
+ int status;
+
+ dprintk("--> %s: FSID %llx:%llx on \"%s\"\n", __func__,
+ (unsigned long long)server->fsid.major,
+ (unsigned long long)server->fsid.minor,
+ clp->cl_hostname);
+
+ status = -ENOMEM;
+ page = alloc_page(GFP_KERNEL);
+ locations = kmalloc(sizeof(struct nfs4_fs_locations), GFP_KERNEL);
+ if (page == NULL || locations == NULL) {
+ dprintk("<-- %s: no memory\n", __func__);
+ goto out_err;
+ }
+
+ status = nfs4_proc_get_mig_status(server, locations, page);
+ if (status != 0) {
+ dprintk("<-- %s: get migration status: %d\n",
+ __func__, status);
+ goto out_err;
+ }
+ if (!(locations->fattr.valid & NFS_ATTR_FATTR_V4_LOCATIONS)) {
+ dprintk("<-- %s: No fs_locations data available, "
+ "migration skipped\n", __func__);
+ goto out_err;
+ }
+
+ /* NB: if successful, nfs4_replace_transport() replaces
+ * server->nfs_client with the cloned nfs_client */
+ status = nfs4_replace_transport(server, locations);
+ if (status != 0) {
+ dprintk("<-- %s: failed to replace transport: %d\n",
+ __func__, status);
+ goto out_err;
+ }
+
+ if (server->nfs_client->cl_clientid == 0) {
+ server->nfs_client->cl_clientid = clp->cl_clientid;
+
+ status = nfs4_init_cloned_client(server->nfs_client);
+ if (status != 0) {
+ dprintk("<-- %s: failed to init nfs_client: %d\n",
+ __func__, status);
+ goto out_err;
+ }
+ }
+
+ dprintk("<-- %s: migration succeeded\n", __func__);
+
+out_err:
+ if (page != NULL)
+ __free_page(page);
+ kfree(locations);
+}
+
#ifdef CONFIG_NFS_V4_1
void nfs4_schedule_session_recovery(struct nfs4_session *session)
{
@@ -1631,8 +1757,22 @@ static void nfs4_state_manager(struct nfs_client *clp)
{
int status = 0;

+ dprintk("--> %s: \"%s\" (client ID %llx) state: %08lx\n",
+ __func__, clp->cl_hostname, clp->cl_clientid, clp->cl_state);
+
/* Ensure exclusive access to NFSv4 state */
do {
+ if (test_and_clear_bit(NFS4CLNT_CLONED_CLIENT,
+ &clp->cl_state)) {
+ /* If the server still recognizes the short-form
+ * client ID, ensure that the next SETCLIENTID doesn't
+ * cause the server to drop all that state */
+ if (nfs4_check_lease(clp) == 0)
+ set_bit(NFS4CLNT_UPDATE_CALLBACK,
+ &clp->cl_state);
+ set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
+ }
+
if (test_and_clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state)) {
/* We're going to have to re-establish a clientid */
status = nfs4_reclaim_lease(clp);
@@ -1670,6 +1810,11 @@ static void nfs4_state_manager(struct nfs_client *clp)
goto out_error;
}

+ if (test_and_clear_bit(NFS4CLNT_MOVED, &clp->cl_state)) {
+ nfs4_try_migration(clp->cl_moved_server);
+ continue;
+ }
+
/* First recover reboot state... */
if (test_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state)) {
status = nfs4_do_reclaim(clp,
@@ -1710,7 +1855,6 @@ static void nfs4_state_manager(struct nfs_client *clp)
continue;
}

-
nfs4_clear_state_manager_bit(clp);
/* Did we race with an attempt to give us more work? */
if (clp->cl_state == 0)
@@ -1718,6 +1862,7 @@ static void nfs4_state_manager(struct nfs_client *clp)
if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0)
break;
} while (atomic_read(&clp->cl_count) > 1);
+ dprintk("<-- %s\n", __func__);
return;
out_error:
printk(KERN_WARNING "Error: state manager failed on NFSv4 server %s"
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index d0554c4..091abf0 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -58,6 +58,9 @@ struct nfs_client {

struct rpc_wait_queue cl_rpcwaitq;

+ /* accessed only when NFS4CLNT_MOVED bit is set */
+ struct nfs_server * cl_moved_server;
+
/* used for the setclientid verifier */
struct timespec cl_boot_time;



2011-05-09 19:36:53

by Chuck Lever

Subject: [PATCH 04/16] SUNRPC: Add a helper to switch the transport of the rpc_client

From: Trond Myklebust <[email protected]>

Signed-off-by: Trond Myklebust <[email protected]>
[ cel: fix whitespace ]
Signed-off-by: Chuck Lever <[email protected]>
---

include/linux/sunrpc/clnt.h | 3 ++
net/sunrpc/clnt.c | 76 ++++++++++++++++++++++++++++++++++++++++---
2 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 2eea0d7..d18d952 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -144,6 +144,9 @@ void rpc_task_release_client(struct rpc_task *);

int rpc_lock_client(struct rpc_clnt *clnt, unsigned long timeout);
void rpc_unlock_client(struct rpc_clnt *clnt);
+int rpc_switch_client_transport(struct rpc_clnt *,
+ struct xprt_create *,
+ const struct rpc_timeout *);

int rpcb_register(u32, u32, int, unsigned short);
int rpcb_v4_register(const u32 program, const u32 version,
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 6479e1d..ac2d29e 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -26,6 +26,7 @@
#include <linux/namei.h>
#include <linux/mount.h>
#include <linux/slab.h>
+#include <linux/rcupdate.h>
#include <linux/utsname.h>
#include <linux/workqueue.h>
#include <linux/in.h>
@@ -143,12 +144,35 @@ err:
return error;
}

+static void rpc_set_client_transport(struct rpc_clnt *clnt,
+ struct rpc_xprt *xprt,
+ const struct rpc_timeout *timeout)
+{
+ struct rpc_xprt *old;
+
+ spin_lock(&clnt->cl_lock);
+ old = clnt->cl_xprt;
+
+ if (!xprt_bound(xprt))
+ clnt->cl_autobind = 1;
+
+ clnt->cl_timeout = timeout;
+ rcu_assign_pointer(clnt->cl_xprt, xprt);
+ spin_unlock(&clnt->cl_lock);
+
+ if (old != NULL) {
+ synchronize_rcu();
+ xprt_put(old);
+ }
+}
+
static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, struct rpc_xprt *xprt)
{
struct rpc_program *program = args->program;
struct rpc_version *version;
struct rpc_clnt *clnt = NULL;
struct rpc_auth *auth;
+ const struct rpc_timeout *timeout;
int err;

/* sanity check the name before trying to print it */
@@ -174,7 +198,6 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
goto out_err;
clnt->cl_parent = clnt;

- rcu_assign_pointer(clnt->cl_xprt, xprt);
clnt->cl_procinfo = version->procs;
clnt->cl_maxproc = version->nrprocs;
clnt->cl_protname = program->name;
@@ -189,16 +212,15 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
INIT_LIST_HEAD(&clnt->cl_tasks);
spin_lock_init(&clnt->cl_lock);

- if (!xprt_bound(xprt))
- clnt->cl_autobind = 1;
-
- clnt->cl_timeout = xprt->timeout;
+ timeout = xprt->timeout;
if (args->timeout != NULL) {
memcpy(&clnt->cl_timeout_default, args->timeout,
sizeof(clnt->cl_timeout_default));
- clnt->cl_timeout = &clnt->cl_timeout_default;
+ timeout = &clnt->cl_timeout_default;
}

+ rpc_set_client_transport(clnt, xprt, timeout);
+
clnt->cl_rtt = &clnt->cl_rtt_default;
rpc_init_rtt(&clnt->cl_rtt_default, clnt->cl_timeout->to_initval);
clnt->cl_principal = NULL;
@@ -412,6 +434,48 @@ out_no_clnt:
}
EXPORT_SYMBOL_GPL(rpc_clone_client);

+/**
+ * rpc_switch_client_transport: switch the RPC transport on the fly
+ * @clnt: pointer to a struct rpc_clnt
+ * @args: pointer to the new transport arguments
+ * @timeout: pointer to the new timeout parameters
+ *
+ * This function allows the caller to switch the RPC transport for the
+ * rpc_clnt structure 'clnt' to allow it to connect to a mirrored NFS server,
+ * for instance. It assumes that the caller has ensured that there are no
+ * active tasks by using some form of locking.
+ */
+int rpc_switch_client_transport(struct rpc_clnt *clnt,
+ struct xprt_create *args,
+ const struct rpc_timeout *timeout)
+{
+ struct rpc_xprt *xprt;
+ struct rpc_auth *auth;
+ rpc_authflavor_t pseudoflavor;
+
+ xprt = xprt_create_transport(args);
+ if (IS_ERR(xprt))
+ return PTR_ERR(xprt);
+
+ pseudoflavor = clnt->cl_auth->au_flavor;
+
+ rpc_set_client_transport(clnt, xprt, timeout);
+
+ /*
+ * Note: we must always create a new rpc_auth cache
+ * when switching to a different server! RPCSEC_GSS sessions
+ * in particular are between a single client and server,
+ * so we cannot reuse the sessions in the cache when we switch
+ * servers.
+ */
+ auth = rpcauth_create(pseudoflavor, clnt);
+ if (IS_ERR(auth))
+ return PTR_ERR(auth);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(rpc_switch_client_transport);
+
/*
* Kill all tasks for the given client.
* XXX: kill their descendants as well?


2011-05-09 19:38:36

by Chuck Lever

Subject: [PATCH 14/16] NFS: Remove "const" from "struct nfs_server *" fields

We're about to pass the nfs_server pointer in some NFSv4 argument and
result structures to functions that may change it, so it's no longer
"const".

The preference here is to maintain existing whitespace style rather
than answer all the nits called out by checkpatch.pl.

Signed-off-by: Chuck Lever <[email protected]>
---

fs/nfs/nfs4proc.c | 2 +-
include/linux/nfs_xdr.h | 28 ++++++++++++++--------------
2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index bb6b128..1168bb2 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4027,7 +4027,7 @@ struct nfs4_unlockdata {
struct nfs4_lock_state *lsp;
struct nfs_open_context *ctx;
struct file_lock fl;
- const struct nfs_server *server;
+ struct nfs_server *server;
unsigned long timestamp;
};

diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 22e34d3..ab17345 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -288,7 +288,7 @@ struct nfs_openargs {
fmode_t delegation_type; /* CLAIM_PREVIOUS */
} u;
const struct qstr * name;
- const struct nfs_server *server; /* Needed for ID mapping */
+ struct nfs_server * server; /* Needed for ID mapping */
const u32 * bitmask;
__u32 claim;
struct nfs4_sequence_args seq_args;
@@ -302,7 +302,7 @@ struct nfs_openres {
struct nfs_fattr * f_attr;
struct nfs_fattr * dir_attr;
struct nfs_seqid * seqid;
- const struct nfs_server *server;
+ struct nfs_server * server;
fmode_t delegation_type;
nfs4_stateid delegation;
__u32 do_recall;
@@ -341,7 +341,7 @@ struct nfs_closeres {
nfs4_stateid stateid;
struct nfs_fattr * fattr;
struct nfs_seqid * seqid;
- const struct nfs_server *server;
+ struct nfs_server * server;
struct nfs4_sequence_res seq_res;
};
/*
@@ -413,7 +413,7 @@ struct nfs4_delegreturnargs {

struct nfs4_delegreturnres {
struct nfs_fattr * fattr;
- const struct nfs_server *server;
+ struct nfs_server * server;
struct nfs4_sequence_res seq_res;
};

@@ -463,7 +463,7 @@ struct nfs_writeres {
struct nfs_fattr * fattr;
struct nfs_writeverf * verf;
__u32 count;
- const struct nfs_server *server;
+ struct nfs_server * server;
struct nfs4_sequence_res seq_res;
};

@@ -478,7 +478,7 @@ struct nfs_removeargs {
};

struct nfs_removeres {
- const struct nfs_server *server;
+ struct nfs_server *server;
struct nfs_fattr *dir_attr;
struct nfs4_change_info cinfo;
struct nfs4_sequence_res seq_res;
@@ -497,7 +497,7 @@ struct nfs_renameargs {
};

struct nfs_renameres {
- const struct nfs_server *server;
+ struct nfs_server *server;
struct nfs4_change_info old_cinfo;
struct nfs_fattr *old_fattr;
struct nfs4_change_info new_cinfo;
@@ -578,7 +578,7 @@ struct nfs_getaclres {

struct nfs_setattrres {
struct nfs_fattr * fattr;
- const struct nfs_server * server;
+ struct nfs_server * server;
struct nfs4_sequence_res seq_res;
};

@@ -751,7 +751,7 @@ struct nfs4_accessargs {
};

struct nfs4_accessres {
- const struct nfs_server * server;
+ struct nfs_server * server;
struct nfs_fattr * fattr;
u32 supported;
u32 access;
@@ -779,7 +779,7 @@ struct nfs4_create_arg {
};

struct nfs4_create_res {
- const struct nfs_server * server;
+ struct nfs_server * server;
struct nfs_fh * fh;
struct nfs_fattr * fattr;
struct nfs4_change_info dir_cinfo;
@@ -805,7 +805,7 @@ struct nfs4_getattr_arg {
};

struct nfs4_getattr_res {
- const struct nfs_server * server;
+ struct nfs_server * server;
struct nfs_fattr * fattr;
struct nfs4_sequence_res seq_res;
};
@@ -819,7 +819,7 @@ struct nfs4_link_arg {
};

struct nfs4_link_res {
- const struct nfs_server * server;
+ struct nfs_server * server;
struct nfs_fattr * fattr;
struct nfs4_change_info cinfo;
struct nfs_fattr * dir_attr;
@@ -835,7 +835,7 @@ struct nfs4_lookup_arg {
};

struct nfs4_lookup_res {
- const struct nfs_server * server;
+ struct nfs_server * server;
struct nfs_fattr * fattr;
struct nfs_fh * fh;
struct nfs4_sequence_res seq_res;
@@ -950,7 +950,7 @@ struct nfs4_fs_location {
#define NFS4_FS_LOCATIONS_MAXENTRIES 10
struct nfs4_fs_locations {
struct nfs_fattr fattr;
- const struct nfs_server *server;
+ struct nfs_server *server;
struct nfs4_pathname fs_path;
int nlocations;
struct nfs4_fs_location locations[NFS4_FS_LOCATIONS_MAXENTRIES];


2011-05-09 19:36:32

by Chuck Lever

Subject: [PATCH 02/16] SUNRPC: Use RCU to dereference the rpc_clnt.cl_xprt field

From: Trond Myklebust <[email protected]>

A migration event will replace the rpc_xprt used by an rpc_clnt. To
ensure this can be done safely, all references to cl_xprt must now use
a form of rcu_dereference().

Special care is taken with rpc_peeraddr2str(), which returns a pointer
to memory whose lifetime is the same as the rpc_xprt.
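
The lifetime rule above can be illustrated with a small userspace
sketch. Everything here is an assumption made for illustration: the
rcu_* macros are no-op stand-ins (not the kernel's RCU), and the
demo_* types and functions are hypothetical. The point is only that a
caller copies the string out before leaving the read-side section
rather than holding on to the returned pointer.

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins so the sketch builds in userspace; NOT the kernel's RCU. */
#define rcu_read_lock()		do { } while (0)
#define rcu_read_unlock()	do { } while (0)
#define rcu_dereference(p)	(p)

/* Hypothetical, cut-down versions of rpc_xprt and rpc_clnt. */
struct demo_xprt { const char *addr; };
struct demo_clnt { struct demo_xprt *cl_xprt; };

/* The returned pointer is only valid inside the read-side section. */
static const char *demo_peeraddr2str(struct demo_clnt *clnt)
{
	return rcu_dereference(clnt->cl_xprt)->addr;
}

/* Copy the address into a caller-owned buffer before unlocking. */
static int demo_manager_name(struct demo_clnt *clnt, char *buf, size_t len)
{
	int n;

	rcu_read_lock();
	n = snprintf(buf, len, "%s-manager", demo_peeraddr2str(clnt));
	rcu_read_unlock();
	return n;
}
```

After demo_manager_name() returns, the caller uses only buf, so a
later transport swap cannot invalidate the name it built.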

Signed-off-by: Trond Myklebust <[email protected]>
[ cel: fix lockdep splats and layering violations ]
Signed-off-by: Chuck Lever <[email protected]>
---

fs/nfs/callback_proc.c | 9 ++--
fs/nfs/client.c | 16 +++++--
fs/nfs/nfs4proc.c | 15 +++++--
fs/nfs/nfs4state.c | 14 +++++-
fs/nfs/super.c | 5 ++
include/linux/sunrpc/clnt.h | 4 +-
include/linux/sunrpc/debug.h | 11 +++++
net/sunrpc/clnt.c | 94 +++++++++++++++++++++++++++++++++++-------
net/sunrpc/rpc_pipe.c | 3 +
net/sunrpc/rpcb_clnt.c | 15 +++++--
net/sunrpc/stats.c | 6 ++-
11 files changed, 155 insertions(+), 37 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 2f41dcce..d360b80 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -8,6 +8,7 @@
#include <linux/nfs4.h>
#include <linux/nfs_fs.h>
#include <linux/slab.h>
+#include <linux/rcupdate.h>
#include "nfs4_fs.h"
#include "callback.h"
#include "delegation.h"
@@ -33,7 +34,7 @@ __be32 nfs4_callback_getattr(struct cb_getattrargs *args,
res->bitmap[0] = res->bitmap[1] = 0;
res->status = htonl(NFS4ERR_BADHANDLE);

- dprintk("NFS: GETATTR callback request from %s\n",
+ dprintk_rcu("NFS: GETATTR callback request from %s\n",
rpc_peeraddr2str(cps->clp->cl_rpcclient, RPC_DISPLAY_ADDR));

inode = nfs_delegation_find_inode(cps->clp, &args->fh);
@@ -73,7 +74,7 @@ __be32 nfs4_callback_recall(struct cb_recallargs *args, void *dummy,
if (!cps->clp) /* Always set for v4.0. Set in cb_sequence for v4.1 */
goto out;

- dprintk("NFS: RECALL callback request from %s\n",
+ dprintk_rcu("NFS: RECALL callback request from %s\n",
rpc_peeraddr2str(cps->clp->cl_rpcclient, RPC_DISPLAY_ADDR));

res = htonl(NFS4ERR_BADHANDLE);
@@ -442,7 +443,7 @@ __be32 nfs4_callback_recallany(struct cb_recallanyargs *args, void *dummy,
if (!cps->clp) /* set in cb_sequence */
goto out;

- dprintk("NFS: RECALL_ANY callback request from %s\n",
+ dprintk_rcu("NFS: RECALL_ANY callback request from %s\n",
rpc_peeraddr2str(cps->clp->cl_rpcclient, RPC_DISPLAY_ADDR));

status = cpu_to_be32(NFS4ERR_INVAL);
@@ -477,7 +478,7 @@ __be32 nfs4_callback_recallslot(struct cb_recallslotargs *args, void *dummy,
if (!cps->clp) /* set in cb_sequence */
goto out;

- dprintk("NFS: CB_RECALL_SLOT request from %s target max slots %d\n",
+ dprintk_rcu("NFS: CB_RECALL_SLOT request from %s target max slots %d\n",
rpc_peeraddr2str(cps->clp->cl_rpcclient, RPC_DISPLAY_ADDR),
args->crsa_target_max_slots);

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 139be96..b55ef58 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1273,16 +1273,18 @@ static int nfs4_init_callback(struct nfs_client *clp)
int error;

if (clp->rpc_ops->version == 4) {
+ struct rpc_xprt *xprt;
+
+ xprt = rcu_dereference_raw(clp->cl_rpcclient->cl_xprt);
+
if (nfs4_has_session(clp)) {
- error = xprt_setup_backchannel(
- clp->cl_rpcclient->cl_xprt,
+ error = xprt_setup_backchannel(xprt,
NFS41_BC_MIN_CALLBACKS);
if (error < 0)
return error;
}

- error = nfs_callback_up(clp->cl_mvops->minor_version,
- clp->cl_rpcclient->cl_xprt);
+ error = nfs_callback_up(clp->cl_mvops->minor_version, xprt);
if (error < 0) {
dprintk("%s: failed to start callback. Error = %d\n",
__func__, error);
@@ -1663,7 +1665,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data,
data->addrlen,
parent_client->cl_ipaddr,
data->authflavor,
- parent_server->client->cl_xprt->prot,
+ rpc_protocol(parent_server->client),
parent_server->client->cl_timeout,
parent_client->cl_mvops->minor_version);
if (error < 0)
@@ -1863,12 +1865,14 @@ static int nfs_server_list_show(struct seq_file *m, void *v)
/* display one transport per line on subsequent lines */
clp = list_entry(v, struct nfs_client, cl_share_link);

+ rcu_read_lock();
seq_printf(m, "v%u %s %s %3d %s\n",
clp->rpc_ops->version,
rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_ADDR),
rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_PORT),
atomic_read(&clp->cl_count),
clp->cl_hostname);
+ rcu_read_unlock();

return 0;
}
@@ -1942,6 +1946,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
(unsigned long long) server->fsid.major,
(unsigned long long) server->fsid.minor);

+ rcu_read_lock();
seq_printf(m, "v%u %s %s %-7s %-17s %s\n",
clp->rpc_ops->version,
rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_ADDR),
@@ -1949,6 +1954,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
dev,
fsid,
nfs_server_fscache_state(server));
+ rcu_read_unlock();

return 0;
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 69c0f3c..5a87686 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3734,6 +3734,7 @@ int nfs4_proc_setclientid(struct nfs_client *clp, u32 program,
*p = htonl((u32)clp->cl_boot_time.tv_nsec);

for(;;) {
+ rcu_read_lock();
setclientid.sc_name_len = scnprintf(setclientid.sc_name,
sizeof(setclientid.sc_name), "%s/%s %s %s %u",
clp->cl_ipaddr,
@@ -3750,6 +3751,7 @@ int nfs4_proc_setclientid(struct nfs_client *clp, u32 program,
setclientid.sc_uaddr_len = scnprintf(setclientid.sc_uaddr,
sizeof(setclientid.sc_uaddr), "%s.%u.%u",
clp->cl_ipaddr, port >> 8, port & 255);
+ rcu_read_unlock();

status = rpc_call_sync(clp->cl_rpcclient, &msg, RPC_TASK_TIMEOUT);
if (status != -NFS4ERR_CLID_INUSE)
@@ -5050,11 +5052,16 @@ struct nfs4_session *nfs4_alloc_session(struct nfs_client *clp)

void nfs4_destroy_session(struct nfs4_session *session)
{
+ struct rpc_xprt *xprt;
+
nfs4_proc_destroy_session(session);
+
+ rcu_read_lock();
+ xprt = rcu_dereference(session->clp->cl_rpcclient->cl_xprt);
+ rcu_read_unlock();
dprintk("%s Destroy backchannel for xprt %p\n",
- __func__, session->clp->cl_rpcclient->cl_xprt);
- xprt_destroy_backchannel(session->clp->cl_rpcclient->cl_xprt,
- NFS41_BC_MIN_CALLBACKS);
+ __func__, xprt);
+ xprt_destroy_backchannel(xprt, NFS41_BC_MIN_CALLBACKS);
nfs4_destroy_slot_tables(session);
kfree(session);
}
@@ -5083,7 +5090,7 @@ static void nfs4_init_channel_attrs(struct nfs41_create_session_args *args)
args->fc_attrs.max_rqst_sz = mxrqst_sz;
args->fc_attrs.max_resp_sz = mxresp_sz;
args->fc_attrs.max_ops = NFS4_MAX_OPS;
- args->fc_attrs.max_reqs = session->clp->cl_rpcclient->cl_xprt->max_reqs;
+ args->fc_attrs.max_reqs = rpc_max_reqs(session->clp->cl_rpcclient);

dprintk("%s: Fore Channel : max_rqst_sz=%u max_resp_sz=%u "
"max_ops=%u max_reqs=%u\n",
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 036f5ad..f6b268f 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1011,16 +1011,24 @@ static void nfs4_clear_state_manager_bit(struct nfs_client *clp)
void nfs4_schedule_state_manager(struct nfs_client *clp)
{
struct task_struct *task;
+ char buf[INET6_ADDRSTRLEN + sizeof("-manager") + 1];

if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0)
return;
__module_get(THIS_MODULE);
atomic_inc(&clp->cl_count);
- task = kthread_run(nfs4_run_state_manager, clp, "%s-manager",
- rpc_peeraddr2str(clp->cl_rpcclient,
- RPC_DISPLAY_ADDR));
+
+ /* The rcu_read_lock() is not strictly necessary, as the state
+ * manager is the only thread that ever changes the rpc_xprt
+ * after it's initialized. At this point, we're single threaded. */
+ rcu_read_lock();
+ snprintf(buf, sizeof(buf), "%s-manager",
+ rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR));
+ rcu_read_unlock();
+ task = kthread_run(nfs4_run_state_manager, clp, "%s", buf);
if (!IS_ERR(task))
return;
+
nfs4_clear_state_manager_bit(clp);
nfs_put_client(clp);
module_put(THIS_MODULE);
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index e288f06..50c0482 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -53,6 +53,7 @@
#include <linux/nfs_xdr.h>
#include <linux/magic.h>
#include <linux/parser.h>
+#include <linux/rcupdate.h>

#include <asm/system.h>
#include <asm/uaccess.h>
@@ -676,8 +677,10 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
else
seq_puts(m, nfs_infop->nostr);
}
+ rcu_read_lock();
seq_printf(m, ",proto=%s",
rpc_peeraddr2str(nfss->client, RPC_DISPLAY_NETID));
+ rcu_read_unlock();
if (version == 4) {
if (nfss->port != NFS_PORT)
seq_printf(m, ",port=%u", nfss->port);
@@ -726,9 +729,11 @@ static int nfs_show_options(struct seq_file *m, struct vfsmount *mnt)

nfs_show_mount_options(m, nfss, 0);

+ rcu_read_lock();
seq_printf(m, ",addr=%s",
rpc_peeraddr2str(nfss->nfs_client->cl_rpcclient,
RPC_DISPLAY_ADDR));
+ rcu_read_unlock();

return 0;
}
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 1cab257..7a1d124 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -36,7 +36,7 @@ struct rpc_clnt {
struct list_head cl_clients; /* Global list of clients */
struct list_head cl_tasks; /* List of tasks */
spinlock_t cl_lock; /* spinlock */
- struct rpc_xprt * cl_xprt; /* transport */
+ struct rpc_xprt __rcu * cl_xprt; /* transport */
struct rpc_procinfo * cl_procinfo; /* procedure info */
u32 cl_prog, /* RPC program number */
cl_vers, /* RPC version number */
@@ -164,6 +164,8 @@ struct rpc_task *rpc_call_null(struct rpc_clnt *clnt, struct rpc_cred *cred,
int rpc_restart_call_prepare(struct rpc_task *);
int rpc_restart_call(struct rpc_task *);
void rpc_setbufsize(struct rpc_clnt *, unsigned int, unsigned int);
+int rpc_protocol(struct rpc_clnt *);
+unsigned int rpc_max_reqs(struct rpc_clnt *);
size_t rpc_max_payload(struct rpc_clnt *);
void rpc_force_rebind(struct rpc_clnt *);
size_t rpc_peeraddr(struct rpc_clnt *, struct sockaddr *, size_t);
diff --git a/include/linux/sunrpc/debug.h b/include/linux/sunrpc/debug.h
index c2786f2..28136fd 100644
--- a/include/linux/sunrpc/debug.h
+++ b/include/linux/sunrpc/debug.h
@@ -47,15 +47,26 @@ extern unsigned int nlm_debug;
#endif

#define dprintk(args...) dfprintk(FACILITY, ## args)
+#define dprintk_rcu(args...) dfprintk_rcu(FACILITY, ## args)

#undef ifdebug
#ifdef RPC_DEBUG
# define ifdebug(fac) if (unlikely(rpc_debug & RPCDBG_##fac))
# define dfprintk(fac, args...) do { ifdebug(fac) printk(args); } while(0)
+
+# define dfprintk_rcu(fac, args...) \
+ do { \
+ ifdebug(fac) { \
+ rcu_read_lock(); \
+ printk(args); \
+ rcu_read_unlock(); \
+ } \
+ } while (0)
# define RPC_IFDEBUG(x) x
#else
# define ifdebug(fac) if (0)
# define dfprintk(fac, args...) do ; while (0)
+# define dfprintk_rcu(fac, args...) do ; while (0)
# define RPC_IFDEBUG(x)
#endif

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 3d6b1a9..31ee4db 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -31,6 +31,7 @@
#include <linux/in.h>
#include <linux/in6.h>
#include <linux/un.h>
+#include <linux/rcupdate.h>

#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/rpc_pipe_fs.h>
@@ -190,7 +191,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
}
strlcpy(clnt->cl_server, args->servername, len);

- clnt->cl_xprt = xprt;
+ rcu_assign_pointer(clnt->cl_xprt, xprt);
clnt->cl_procinfo = version->procs;
clnt->cl_maxproc = version->nrprocs;
clnt->cl_protname = program->name;
@@ -205,7 +206,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
INIT_LIST_HEAD(&clnt->cl_tasks);
spin_lock_init(&clnt->cl_lock);

- if (!xprt_bound(clnt->cl_xprt))
+ if (!xprt_bound(xprt))
clnt->cl_autobind = 1;

clnt->cl_timeout = xprt->timeout;
@@ -377,6 +378,7 @@ struct rpc_clnt *
rpc_clone_client(struct rpc_clnt *clnt)
{
struct rpc_clnt *new;
+ struct rpc_xprt *xprt;
int err = -ENOMEM;

new = kmemdup(clnt, sizeof(*new), GFP_KERNEL);
@@ -396,6 +398,12 @@ rpc_clone_client(struct rpc_clnt *clnt)
if (new->cl_principal == NULL)
goto out_no_principal;
}
+ rcu_read_lock();
+ xprt = xprt_get(rcu_dereference(clnt->cl_xprt));
+ rcu_read_unlock();
+ if (xprt == NULL)
+ goto out_no_transport;
+ rcu_assign_pointer(new->cl_xprt, xprt);
atomic_set(&new->cl_count, 1);
atomic_set(&new->cl_active_tasks, 0);
rpc_init_wait_queue(&new->cl_waitqueue, "client waitqueue");
@@ -404,12 +412,13 @@ rpc_clone_client(struct rpc_clnt *clnt)
goto out_no_path;
if (new->cl_auth)
atomic_inc(&new->cl_auth->au_count);
- xprt_get(clnt->cl_xprt);
atomic_inc(&clnt->cl_count);
rpc_register_client(new);
rpciod_up();
return new;
out_no_path:
+ xprt_put(xprt);
+out_no_transport:
kfree(new->cl_principal);
out_no_principal:
rpc_free_iostats(new->cl_metrics);
@@ -494,7 +503,7 @@ out_free:
rpc_free_iostats(clnt->cl_metrics);
kfree(clnt->cl_principal);
clnt->cl_metrics = NULL;
- xprt_put(clnt->cl_xprt);
+ xprt_put(rcu_dereference_raw(clnt->cl_xprt));
rpciod_down();
kfree(clnt);
}
@@ -851,13 +860,18 @@ EXPORT_SYMBOL_GPL(rpc_call_start);
size_t rpc_peeraddr(struct rpc_clnt *clnt, struct sockaddr *buf, size_t bufsize)
{
size_t bytes;
- struct rpc_xprt *xprt = clnt->cl_xprt;
+ struct rpc_xprt *xprt;
+
+ rcu_read_lock();
+ xprt = rcu_dereference(clnt->cl_xprt);

- bytes = sizeof(xprt->addr);
+ bytes = xprt->addrlen;
if (bytes > bufsize)
bytes = bufsize;
- memcpy(buf, &clnt->cl_xprt->addr, bytes);
- return xprt->addrlen;
+ memcpy(buf, &xprt->addr, bytes);
+ rcu_read_unlock();
+
+ return bytes;
}
EXPORT_SYMBOL_GPL(rpc_peeraddr);

@@ -866,11 +880,16 @@ EXPORT_SYMBOL_GPL(rpc_peeraddr);
* @clnt: RPC client structure
* @format: address format
*
+ * NB: the lifetime of the memory referenced by the returned pointer is
+ * the same as the rpc_xprt itself. As long as the caller uses this
+ * pointer, it must hold the RCU read lock.
*/
const char *rpc_peeraddr2str(struct rpc_clnt *clnt,
enum rpc_display_format_t format)
{
- struct rpc_xprt *xprt = clnt->cl_xprt;
+ struct rpc_xprt *xprt;
+
+ xprt = rcu_dereference(clnt->cl_xprt);

if (xprt->address_strings[format] != NULL)
return xprt->address_strings[format];
@@ -882,14 +901,51 @@ EXPORT_SYMBOL_GPL(rpc_peeraddr2str);
void
rpc_setbufsize(struct rpc_clnt *clnt, unsigned int sndsize, unsigned int rcvsize)
{
- struct rpc_xprt *xprt = clnt->cl_xprt;
+ struct rpc_xprt *xprt;
+
+ rcu_read_lock();
+ xprt = rcu_dereference(clnt->cl_xprt);
if (xprt->ops->set_buffer_size)
xprt->ops->set_buffer_size(xprt, sndsize, rcvsize);
+ rcu_read_unlock();
}
EXPORT_SYMBOL_GPL(rpc_setbufsize);

-/*
- * Return size of largest payload RPC client can support, in bytes
+/**
+ * rpc_protocol - Get transport protocol number for an RPC client
+ * @clnt: RPC client to query
+ *
+ */
+int rpc_protocol(struct rpc_clnt *clnt)
+{
+ int protocol;
+
+ rcu_read_lock();
+ protocol = rcu_dereference(clnt->cl_xprt)->prot;
+ rcu_read_unlock();
+ return protocol;
+}
+EXPORT_SYMBOL_GPL(rpc_protocol);
+
+/**
+ * rpc_max_reqs - Get the maximum number of outstanding requests
+ * @clnt: RPC client to query
+ *
+ */
+unsigned int rpc_max_reqs(struct rpc_clnt *clnt)
+{
+ unsigned int max_reqs;
+
+ rcu_read_lock();
+ max_reqs = rcu_dereference(clnt->cl_xprt)->max_reqs;
+ rcu_read_unlock();
+ return max_reqs;
+}
+EXPORT_SYMBOL_GPL(rpc_max_reqs);
+
+/**
+ * rpc_max_payload - Get maximum payload size for a transport, in bytes
+ * @clnt: RPC client to query
*
* For stream transports, this is one RPC record fragment (see RFC
* 1831), as we don't support multi-record requests yet. For datagram
@@ -898,7 +954,12 @@ EXPORT_SYMBOL_GPL(rpc_setbufsize);
*/
size_t rpc_max_payload(struct rpc_clnt *clnt)
{
- return clnt->cl_xprt->max_payload;
+ size_t ret;
+
+ rcu_read_lock();
+ ret = rcu_dereference(clnt->cl_xprt)->max_payload;
+ rcu_read_unlock();
+ return ret;
}
EXPORT_SYMBOL_GPL(rpc_max_payload);

@@ -909,8 +970,11 @@ EXPORT_SYMBOL_GPL(rpc_max_payload);
*/
void rpc_force_rebind(struct rpc_clnt *clnt)
{
- if (clnt->cl_autobind)
- xprt_clear_bound(clnt->cl_xprt);
+ if (clnt->cl_autobind) {
+ rcu_read_lock();
+ xprt_clear_bound(rcu_dereference(clnt->cl_xprt));
+ rcu_read_unlock();
+ }
}
EXPORT_SYMBOL_GPL(rpc_force_rebind);

diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index 72bc536..47053e5 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -16,6 +16,7 @@
#include <linux/namei.h>
#include <linux/fsnotify.h>
#include <linux/kernel.h>
+#include <linux/rcupdate.h>

#include <asm/ioctls.h>
#include <linux/fs.h>
@@ -359,12 +360,14 @@ rpc_show_info(struct seq_file *m, void *v)
{
struct rpc_clnt *clnt = m->private;

+ rcu_read_lock();
seq_printf(m, "RPC server: %s\n", clnt->cl_server);
seq_printf(m, "service: %s (%d) version %d\n", clnt->cl_protname,
clnt->cl_prog, clnt->cl_vers);
seq_printf(m, "address: %s\n", rpc_peeraddr2str(clnt, RPC_DISPLAY_ADDR));
seq_printf(m, "protocol: %s\n", rpc_peeraddr2str(clnt, RPC_DISPLAY_PROTO));
seq_printf(m, "port: %s\n", rpc_peeraddr2str(clnt, RPC_DISPLAY_PORT));
+ rcu_read_unlock();
return 0;
}

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 9a80a92..a861e19 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -571,9 +571,10 @@ static struct rpc_task *rpcb_call_async(struct rpc_clnt *rpcb_clnt, struct rpcbi
static struct rpc_clnt *rpcb_find_transport_owner(struct rpc_clnt *clnt)
{
struct rpc_clnt *parent = clnt->cl_parent;
+ struct rpc_xprt *xprt = rcu_dereference(clnt->cl_xprt);

while (parent != clnt) {
- if (parent->cl_xprt != clnt->cl_xprt)
+ if (rcu_dereference(parent->cl_xprt) != xprt)
break;
if (clnt->cl_autobind)
break;
@@ -604,8 +605,12 @@ void rpcb_getport_async(struct rpc_task *task)
size_t salen;
int status;

- clnt = rpcb_find_transport_owner(task->tk_client);
- xprt = clnt->cl_xprt;
+ rcu_read_lock();
+ do {
+ clnt = rpcb_find_transport_owner(task->tk_client);
+ xprt = xprt_get(rcu_dereference(clnt->cl_xprt));
+ } while (xprt == NULL);
+ rcu_read_unlock();

dprintk("RPC: %5u %s(%s, %u, %u, %d)\n",
task->tk_pid, __func__,
@@ -618,6 +623,7 @@ void rpcb_getport_async(struct rpc_task *task)
if (xprt_test_and_set_binding(xprt)) {
dprintk("RPC: %5u %s: waiting for another binder\n",
task->tk_pid, __func__);
+ xprt_put(xprt);
return;
}

@@ -685,7 +691,7 @@ void rpcb_getport_async(struct rpc_task *task)
switch (bind_version) {
case RPCBVERS_4:
case RPCBVERS_3:
- map->r_netid = rpc_peeraddr2str(clnt, RPC_DISPLAY_NETID);
+ map->r_netid = xprt->address_strings[RPC_DISPLAY_NETID];
map->r_addr = rpc_sockaddr2uaddr(sap);
map->r_owner = "";
break;
@@ -714,6 +720,7 @@ bailout_release_client:
bailout_nofree:
rpcb_wake_rpcbind_waiters(xprt, status);
task->tk_status = status;
+ xprt_put(xprt);
}
EXPORT_SYMBOL_GPL(rpcb_getport_async);

diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
index 80df89d..4084255 100644
--- a/net/sunrpc/stats.c
+++ b/net/sunrpc/stats.c
@@ -22,6 +22,7 @@
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/svcsock.h>
#include <linux/sunrpc/metrics.h>
+#include <linux/rcupdate.h>

#include "netns.h"

@@ -179,7 +180,7 @@ static void _print_name(struct seq_file *seq, unsigned int op,
void rpc_print_iostats(struct seq_file *seq, struct rpc_clnt *clnt)
{
struct rpc_iostats *stats = clnt->cl_metrics;
- struct rpc_xprt *xprt = clnt->cl_xprt;
+ struct rpc_xprt *xprt;
unsigned int op, maxproc = clnt->cl_maxproc;

if (!stats)
@@ -189,8 +190,11 @@ void rpc_print_iostats(struct seq_file *seq, struct rpc_clnt *clnt)
seq_printf(seq, "p/v: %u/%u (%s)\n",
clnt->cl_prog, clnt->cl_vers, clnt->cl_protname);

+ rcu_read_lock();
+ xprt = rcu_dereference(clnt->cl_xprt);
if (xprt)
xprt->ops->print_stats(xprt, seq);
+ rcu_read_unlock();

seq_printf(seq, "\tper-op statistics\n");
for (op = 0; op < maxproc; op++) {


2011-05-09 19:36:43

by Chuck Lever

Subject: [PATCH 03/16] SUNRPC: Move clnt->cl_server into struct rpc_xprt

From: Trond Myklebust <[email protected]>

When the cl_xprt field is updated, the cl_server field will also have
to change. Since the contents of cl_server follow the remote endpoint
of cl_xprt, just move that field to the rpc_xprt.
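
A minimal userspace sketch of that ownership change follows. The
sketch_* names and SKETCH_MAXNAMELEN are hypothetical and stand in for
the kernel structures; the idea shown is that the transport owns a
private copy of the server name, so a client that is pointed at a new
transport reports the new name with no separate update.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAXNAMELEN 256	/* hypothetical limit on the name length */

struct sketch_xprt { char *servername; };
struct sketch_clnt { struct sketch_xprt *xprt; };

/* The transport takes its own copy of the name at creation time. */
static struct sketch_xprt *sketch_xprt_create(const char *servername)
{
	size_t len = strlen(servername);
	struct sketch_xprt *xprt;

	if (len > SKETCH_MAXNAMELEN)
		return NULL;
	xprt = malloc(sizeof(*xprt));
	if (xprt == NULL)
		return NULL;
	xprt->servername = malloc(len + 1);
	if (xprt->servername == NULL) {
		free(xprt);
		return NULL;
	}
	memcpy(xprt->servername, servername, len + 1);
	return xprt;
}

/* The name is released together with the transport. */
static void sketch_xprt_destroy(struct sketch_xprt *xprt)
{
	free(xprt->servername);
	free(xprt);
}

/* The client has no name of its own; it asks its current transport. */
static const char *sketch_clnt_server(const struct sketch_clnt *clnt)
{
	return clnt->xprt->servername;
}
```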

Signed-off-by: Trond Myklebust <[email protected]>
[ cel: simplify check_gss_callback_principal(), whitespace changes ]
Signed-off-by: Chuck Lever <[email protected]>
---

fs/nfs/callback.c | 3 +-
include/linux/sunrpc/clnt.h | 3 +-
include/linux/sunrpc/xprt.h | 2 +
net/sunrpc/clnt.c | 74 ++++++++++++++++++++-----------------------
net/sunrpc/rpc_pipe.c | 2 +
net/sunrpc/rpcb_clnt.c | 9 +++--
net/sunrpc/xprt.c | 14 ++++++++
7 files changed, 58 insertions(+), 49 deletions(-)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index e3d2942..1253540 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -332,7 +332,6 @@ void nfs_callback_down(int minorversion)
int
check_gss_callback_principal(struct nfs_client *clp, struct svc_rqst *rqstp)
{
- struct rpc_clnt *r = clp->cl_rpcclient;
char *p = svc_gss_principal(rqstp);

if (rqstp->rq_authop->flavour != RPC_AUTH_GSS)
@@ -353,7 +352,7 @@ check_gss_callback_principal(struct nfs_client *clp, struct svc_rqst *rqstp)
if (memcmp(p, "nfs@", 4) != 0)
return 0;
p += 4;
- if (strcmp(p, r->cl_server) != 0)
+ if (strcmp(p, clp->cl_hostname) != 0)
return 0;
return 1;
}
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 7a1d124..2eea0d7 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -42,7 +42,6 @@ struct rpc_clnt {
cl_vers, /* RPC version number */
cl_maxproc; /* max procedure number */

- char * cl_server; /* server machine name */
char * cl_protname; /* protocol name */
struct rpc_auth * cl_auth; /* authenticator */
struct rpc_stat * cl_stats; /* per-program statistics */
@@ -116,7 +115,7 @@ struct rpc_create_args {
size_t addrsize;
struct sockaddr *saddress;
const struct rpc_timeout *timeout;
- char *servername;
+ const char *servername;
struct rpc_program *program;
u32 prognumber; /* overrides program->number */
u32 version;
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 81cce3b..5a75b4b 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -225,6 +225,7 @@ struct rpc_xprt {
} stat;

struct net *xprt_net;
+ const char *servername;
const char *address_strings[RPC_DISPLAY_MAX];
};

@@ -254,6 +255,7 @@ struct xprt_create {
struct sockaddr * srcaddr; /* optional local address */
struct sockaddr * dstaddr; /* remote peer address */
size_t addrlen;
+ const char * servername;
struct svc_xprt *bc_xprt; /* NFSv4.1 backchannel */
};

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 31ee4db..6479e1d 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -150,15 +150,8 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
struct rpc_clnt *clnt = NULL;
struct rpc_auth *auth;
int err;
- size_t len;

/* sanity check the name before trying to print it */
- err = -EINVAL;
- len = strlen(args->servername);
- if (len > RPC_MAXNETNAMELEN)
- goto out_no_rpciod;
- len++;
-
dprintk("RPC: creating %s client for %s (xprt %p)\n",
program->name, args->servername, xprt);

@@ -181,16 +174,6 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
goto out_err;
clnt->cl_parent = clnt;

- clnt->cl_server = clnt->cl_inline_name;
- if (len > sizeof(clnt->cl_inline_name)) {
- char *buf = kmalloc(len, GFP_KERNEL);
- if (buf != NULL)
- clnt->cl_server = buf;
- else
- len = sizeof(clnt->cl_inline_name);
- }
- strlcpy(clnt->cl_server, args->servername, len);
-
rcu_assign_pointer(clnt->cl_xprt, xprt);
clnt->cl_procinfo = version->procs;
clnt->cl_maxproc = version->nrprocs;
@@ -259,8 +242,6 @@ out_no_path:
out_no_principal:
rpc_free_iostats(clnt->cl_metrics);
out_no_stats:
- if (clnt->cl_server != clnt->cl_inline_name)
- kfree(clnt->cl_server);
kfree(clnt);
out_err:
xprt_put(xprt);
@@ -290,6 +271,7 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
.srcaddr = args->saddress,
.dstaddr = args->address,
.addrlen = args->addrsize,
+ .servername = args->servername,
.bc_xprt = args->bc_xprt,
};
char servername[48];
@@ -298,7 +280,7 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
* If the caller chooses not to specify a hostname, whip
* up a string representation of the passed-in address.
*/
- if (args->servername == NULL) {
+ if (xprtargs.servername == NULL) {
struct sockaddr_un *sun =
(struct sockaddr_un *)args->address;
struct sockaddr_in *sin =
@@ -325,7 +307,7 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
* address family isn't recognized. */
return ERR_PTR(-EINVAL);
}
- args->servername = servername;
+ xprtargs.servername = servername;
}

xprt = xprt_create_transport(&xprtargs);
@@ -467,8 +449,9 @@ EXPORT_SYMBOL_GPL(rpc_killall_tasks);
*/
void rpc_shutdown_client(struct rpc_clnt *clnt)
{
- dprintk("RPC: shutting down %s client for %s\n",
- clnt->cl_protname, clnt->cl_server);
+ dprintk_rcu("RPC: shutting down %s client for %s\n",
+ clnt->cl_protname,
+ rcu_dereference(clnt->cl_xprt)->servername);

while (!list_empty(&clnt->cl_tasks)) {
rpc_killall_tasks(clnt);
@@ -486,8 +469,9 @@ EXPORT_SYMBOL_GPL(rpc_shutdown_client);
static void
rpc_free_client(struct rpc_clnt *clnt)
{
- dprintk("RPC: destroying %s client for %s\n",
- clnt->cl_protname, clnt->cl_server);
+ dprintk_rcu("RPC: destroying %s client for %s\n",
+ clnt->cl_protname,
+ rcu_dereference(clnt->cl_xprt)->servername);
if (!IS_ERR(clnt->cl_path.dentry)) {
rpc_remove_client_dir(clnt->cl_path.dentry);
rpc_put_mount();
@@ -496,8 +480,6 @@ rpc_free_client(struct rpc_clnt *clnt)
rpc_release_client(clnt->cl_parent);
goto out_free;
}
- if (clnt->cl_server != clnt->cl_inline_name)
- kfree(clnt->cl_server);
out_free:
rpc_unregister_client(clnt);
rpc_free_iostats(clnt->cl_metrics);
@@ -1645,8 +1627,12 @@ call_timeout(struct rpc_task *task)
}
if (RPC_IS_SOFT(task)) {
- if (clnt->cl_chatty)
+ if (clnt->cl_chatty) {
+ rcu_read_lock();
printk(KERN_NOTICE "%s: server %s not responding, timed out\n",
- clnt->cl_protname, clnt->cl_server);
+ clnt->cl_protname,
+ rcu_dereference(clnt->cl_xprt)->servername);
+ rcu_read_unlock();
+ }
if (task->tk_flags & RPC_TASK_TIMEOUT)
rpc_exit(task, -ETIMEDOUT);
else
@@ -1656,9 +1641,13 @@ call_timeout(struct rpc_task *task)

if (!(task->tk_flags & RPC_CALL_MAJORSEEN)) {
task->tk_flags |= RPC_CALL_MAJORSEEN;
- if (clnt->cl_chatty)
+ if (clnt->cl_chatty) {
+ rcu_read_lock();
printk(KERN_NOTICE "%s: server %s not responding, still trying\n",
- clnt->cl_protname, clnt->cl_server);
+ clnt->cl_protname,
+ rcu_dereference(clnt->cl_xprt)->servername);
+ rcu_read_unlock();
+ }
}
rpc_force_rebind(clnt);
/*
@@ -1688,9 +1677,13 @@ call_decode(struct rpc_task *task)
task->tk_pid, task->tk_status);

if (task->tk_flags & RPC_CALL_MAJORSEEN) {
- if (clnt->cl_chatty)
+ if (clnt->cl_chatty) {
+ rcu_read_lock();
printk(KERN_NOTICE "%s: server %s OK\n",
- clnt->cl_protname, clnt->cl_server);
+ clnt->cl_protname,
+ rcu_dereference(clnt->cl_xprt)->servername);
+ rcu_read_unlock();
+ }
task->tk_flags &= ~RPC_CALL_MAJORSEEN;
}

@@ -1841,8 +1834,11 @@ rpc_verify_header(struct rpc_task *task)
task->tk_action = call_bind;
goto out_retry;
case RPC_AUTH_TOOWEAK:
+ rcu_read_lock();
printk(KERN_NOTICE "RPC: server %s requires stronger "
- "authentication.\n", task->tk_client->cl_server);
+ "authentication.\n",
+ rcu_dereference(task->tk_client->cl_xprt)->servername);
+ rcu_read_unlock();
break;
default:
dprintk("RPC: %5u %s: unknown auth error: %x\n",
@@ -1865,28 +1861,28 @@ rpc_verify_header(struct rpc_task *task)
case RPC_SUCCESS:
return p;
case RPC_PROG_UNAVAIL:
- dprintk("RPC: %5u %s: program %u is unsupported by server %s\n",
+ dprintk_rcu("RPC: %5u %s: program %u is unsupported by server %s\n",
task->tk_pid, __func__,
(unsigned int)task->tk_client->cl_prog,
- task->tk_client->cl_server);
+ rcu_dereference(task->tk_client->cl_xprt)->servername);
error = -EPFNOSUPPORT;
goto out_err;
case RPC_PROG_MISMATCH:
- dprintk("RPC: %5u %s: program %u, version %u unsupported by "
+ dprintk_rcu("RPC: %5u %s: program %u, version %u unsupported by "
"server %s\n", task->tk_pid, __func__,
(unsigned int)task->tk_client->cl_prog,
(unsigned int)task->tk_client->cl_vers,
- task->tk_client->cl_server);
+ rcu_dereference(task->tk_client->cl_xprt)->servername);
error = -EPROTONOSUPPORT;
goto out_err;
case RPC_PROC_UNAVAIL:
- dprintk("RPC: %5u %s: proc %s unsupported by program %u, "
+ dprintk_rcu("RPC: %5u %s: proc %s unsupported by program %u, "
"version %u on server %s\n",
task->tk_pid, __func__,
rpc_proc_name(task),
task->tk_client->cl_prog,
task->tk_client->cl_vers,
- task->tk_client->cl_server);
+ rcu_dereference(task->tk_client->cl_xprt)->servername);
error = -EOPNOTSUPP;
goto out_err;
case RPC_GARBAGE_ARGS:
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index 47053e5..e7b12b5 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -361,7 +361,7 @@ rpc_show_info(struct seq_file *m, void *v)
struct rpc_clnt *clnt = m->private;

rcu_read_lock();
- seq_printf(m, "RPC server: %s\n", clnt->cl_server);
+ seq_printf(m, "RPC server: %s\n", rcu_dereference(clnt->cl_xprt)->servername);
seq_printf(m, "service: %s (%d) version %d\n", clnt->cl_protname,
clnt->cl_prog, clnt->cl_vers);
seq_printf(m, "address: %s\n", rpc_peeraddr2str(clnt, RPC_DISPLAY_ADDR));
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index a861e19..9fa4cb4 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -291,8 +291,9 @@ out:
return result;
}

-static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
- size_t salen, int proto, u32 version)
+static struct rpc_clnt *rpcb_create(const char *hostname,
+ struct sockaddr *srvaddr, size_t salen,
+ int proto, u32 version)
{
struct rpc_create_args args = {
.net = &init_net,
@@ -614,7 +615,7 @@ void rpcb_getport_async(struct rpc_task *task)

dprintk("RPC: %5u %s(%s, %u, %u, %d)\n",
task->tk_pid, __func__,
- clnt->cl_server, clnt->cl_prog, clnt->cl_vers, xprt->prot);
+ xprt->servername, clnt->cl_prog, clnt->cl_vers, xprt->prot);

/* Put self on the wait queue to ensure we get notified if
* some other task is already attempting to bind the port */
@@ -665,7 +666,7 @@ void rpcb_getport_async(struct rpc_task *task)
dprintk("RPC: %5u %s: trying rpcbind version %u\n",
task->tk_pid, __func__, bind_version);

- rpcb_clnt = rpcb_create(clnt->cl_server, sap, salen, xprt->prot,
+ rpcb_clnt = rpcb_create(xprt->servername, sap, salen, xprt->prot,
bind_version);
if (IS_ERR(rpcb_clnt)) {
status = PTR_ERR(rpcb_clnt);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index ce5eb68..76879bc 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -65,6 +65,7 @@
static void xprt_request_init(struct rpc_task *, struct rpc_xprt *);
static void xprt_connect_status(struct rpc_task *task);
static int __xprt_get_cong(struct rpc_xprt *, struct rpc_task *);
+static void xprt_destroy(struct rpc_xprt *xprt);

static DEFINE_SPINLOCK(xprt_list_lock);
static LIST_HEAD(xprt_list);
@@ -740,7 +741,7 @@ static void xprt_connect_status(struct rpc_task *task)
default:
dprintk("RPC: %5u xprt_connect_status: error %d connecting to "
"server %s\n", task->tk_pid, -task->tk_status,
- task->tk_client->cl_server);
+ xprt->servername);
xprt_release_write(xprt, task);
task->tk_status = -EIO;
}
@@ -1138,6 +1139,16 @@ found:

xprt_init_xid(xprt);

+ if (strlen(args->servername) > RPC_MAXNETNAMELEN) {
+ xprt_destroy(xprt);
+ return ERR_PTR(-EINVAL);
+ }
+ xprt->servername = kstrdup(args->servername, GFP_KERNEL);
+ if (xprt->servername == NULL) {
+ xprt_destroy(xprt);
+ return ERR_PTR(-ENOMEM);
+ }
+
dprintk("RPC: created transport %p with %u slots\n", xprt,
xprt->max_reqs);
return xprt;
@@ -1160,6 +1171,7 @@ static void xprt_destroy(struct rpc_xprt *xprt)
rpc_destroy_wait_queue(&xprt->resend);
rpc_destroy_wait_queue(&xprt->backlog);
cancel_work_sync(&xprt->task_cleanup);
+ kfree(xprt->servername);
/*
* Tear down transport state and free the rpc_xprt
*/


2011-05-09 19:37:03

by Chuck Lever

Subject: [PATCH 05/16] SUNRPC: Add API to acquire source address

NFSv4.0 clients must send endpoint information for their callback
service to NFSv4.0 servers during their first contact with a server.
Traditionally, user space provides the callback endpoint IP address
via the "clientaddr=" mount option.

During an NFSv4 migration event, it is possible that an FSID may be
migrated to a destination server that is accessible via a different
client-side NIC than the source server was. The client must update
callback endpoint information on the destination server so that it can
maintain leases and allow delegation.

Without a new "clientaddr=" option from user space, however, the
kernel itself must construct an appropriate IP address for the
callback service. Provide an API in the RPC client for upper layer
RPC consumers to acquire a source address for a remote.

The mechanism used by the mount command is copied: set up a connected
UDP socket to the remote, then scrape the source address off the
socket. We are careful to select the correct network namespace when
setting up the temporary UDP socket.

Signed-off-by: Chuck Lever <[email protected]>
---
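
Illustration only, not part of this patch: a user-space sketch of the
connected-datagram-socket technique described above, assuming a
Linux/POSIX sockets environment. The helper names local_addr_for() and
local_addr_for_ipv4() are hypothetical; the in-kernel code uses
__sock_create(), kernel_connect() and kernel_getsockname() rather than
the system calls shown here.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical user-space helper, not the kernel API: discover which
 * local address the routing table would pick to reach "remote".
 * Returns 0 and fills in "local" on success, or a negative errno.
 */
int local_addr_for(const struct sockaddr *remote, socklen_t remote_len,
		   struct sockaddr_storage *local, socklen_t *local_len)
{
	int fd, err = 0;

	assert(remote != NULL && local != NULL && local_len != NULL);

	fd = socket(remote->sa_family, SOCK_DGRAM, IPPROTO_UDP);
	if (fd < 0)
		return -errno;

	/* connect() on a datagram socket sends nothing; it only fixes
	 * the peer and makes the stack choose a source address. */
	if (connect(fd, remote, remote_len) < 0) {
		err = -errno;
		goto out_close;
	}

	memset(local, 0, sizeof(*local));
	*local_len = sizeof(*local);
	if (getsockname(fd, (struct sockaddr *)local, local_len) < 0)
		err = -errno;

out_close:
	close(fd);	/* released on every path once the socket exists */
	return err;
}

/* Convenience wrapper for a dotted-quad IPv4 remote; the port is arbitrary. */
int local_addr_for_ipv4(const char *remote_ip, struct in_addr *local_ip)
{
	struct sockaddr_in sin = {
		.sin_family = AF_INET,
		.sin_port = htons(2049),
	};
	struct sockaddr_storage ss;
	socklen_t len;
	int err;

	if (inet_pton(AF_INET, remote_ip, &sin.sin_addr) != 1)
		return -EINVAL;
	err = local_addr_for((struct sockaddr *)&sin, sizeof(sin), &ss, &len);
	if (err == 0)
		*local_ip = ((struct sockaddr_in *)&ss)->sin_addr;
	return err;
}
```

Calling local_addr_for_ipv4() with a server's address yields the
source address the stack would use to reach it. No datagram is sent and
no TIME_WAIT state is left behind.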

include/linux/sunrpc/clnt.h | 1
net/sunrpc/clnt.c | 149 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 150 insertions(+), 0 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index d18d952..c2d60c3 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -172,6 +172,7 @@ size_t rpc_max_payload(struct rpc_clnt *);
void rpc_force_rebind(struct rpc_clnt *);
size_t rpc_peeraddr(struct rpc_clnt *, struct sockaddr *, size_t);
const char *rpc_peeraddr2str(struct rpc_clnt *, enum rpc_display_format_t);
+int rpc_localaddr(struct rpc_clnt *, struct sockaddr *, size_t);

size_t rpc_ntop(const struct sockaddr *, char *, const size_t);
size_t rpc_pton(const char *, const size_t,
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index ac2d29e..efb5959 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -944,6 +944,155 @@ const char *rpc_peeraddr2str(struct rpc_clnt *clnt,
}
EXPORT_SYMBOL_GPL(rpc_peeraddr2str);

+static const struct sockaddr_in rpc_inaddr_loopback = {
+ .sin_family = AF_INET,
+ .sin_addr.s_addr = htonl(INADDR_ANY),
+};
+
+static const struct sockaddr_in6 rpc_in6addr_loopback = {
+ .sin6_family = AF_INET6,
+ .sin6_addr = IN6ADDR_ANY_INIT,
+};
+
+/*
+ * Try a getsockname() on a connected datagram socket. Using a
+ * connected datagram socket prevents leaving a socket in TIME_WAIT.
+ * This conserves the ephemeral port number space.
+ *
+ * Returns zero and fills in "buf" if successful; otherwise, a
+ * negative errno is returned.
+ */
+static int rpc_sockname(struct net *net, struct sockaddr *sap, size_t salen,
+ struct sockaddr *buf, int buflen)
+{
+ struct socket *sock;
+ int err;
+
+ err = __sock_create(net, sap->sa_family,
+ SOCK_DGRAM, IPPROTO_UDP, &sock, 1);
+ if (err < 0) {
+ dprintk("RPC: can't create UDP socket (%d)\n", err);
+ goto out;
+ }
+
+ switch (sap->sa_family) {
+ case AF_INET:
+ err = kernel_bind(sock,
+ (struct sockaddr *)&rpc_inaddr_loopback,
+ sizeof(rpc_inaddr_loopback));
+ break;
+ case AF_INET6:
+ err = kernel_bind(sock,
+ (struct sockaddr *)&rpc_in6addr_loopback,
+ sizeof(rpc_in6addr_loopback));
+ break;
+ default:
+ err = -EAFNOSUPPORT;
+ goto out_release;
+ }
+ if (err < 0) {
+ dprintk("RPC: can't bind UDP socket (%d)\n", err);
+ goto out_release;
+ }
+
+ err = kernel_connect(sock, sap, salen, 0);
+ if (err < 0) {
+ dprintk("RPC: can't connect UDP socket (%d)\n", err);
+ goto out_release;
+ }
+
+ err = kernel_getsockname(sock, buf, &buflen);
+ if (err < 0) {
+ dprintk("RPC: getsockname failed (%d)\n", err);
+ goto out_release;
+ }
+
+ err = 0;
+ if (buf->sa_family == AF_INET6) {
+ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)buf;
+ sin6->sin6_scope_id = 0;
+ }
+ dprintk("RPC: %s succeeded\n", __func__);
+
+out_release:
+ sock_release(sock);
+out:
+ return err;
+}
+
+/*
+ * Scraping a connected socket failed, so we don't have a useable
+ * local address. Fallback: generate an address that will prevent
+ * the server from calling us back.
+ *
+ * Returns zero and fills in "buf" if successful; otherwise, a
+ * negative errno is returned.
+ */
+static int rpc_anyaddr(int family, struct sockaddr *buf, size_t buflen)
+{
+ switch (family) {
+ case AF_INET:
+ if (buflen < sizeof(rpc_inaddr_loopback))
+ return -EINVAL;
+ memcpy(buf, &rpc_inaddr_loopback,
+ sizeof(rpc_inaddr_loopback));
+ break;
+ case AF_INET6:
+ if (buflen < sizeof(rpc_in6addr_loopback))
+ return -EINVAL;
+ memcpy(buf, &rpc_in6addr_loopback,
+ sizeof(rpc_in6addr_loopback));
+ break;
+ default:
+ dprintk("RPC: %s: address family not supported\n",
+ __func__);
+ return -EAFNOSUPPORT;
+ }
+ dprintk("RPC: %s: succeeded\n", __func__);
+ return 0;
+}
+
+/**
+ * rpc_localaddr - discover local endpoint address for an RPC client
+ * @clnt: RPC client structure
+ * @buf: target buffer
+ * @buflen: size of target buffer, in bytes
+ *
+ * Returns zero and fills in "buf" if successful;
+ * otherwise, a negative errno is returned.
+ *
+ * This works even if the underlying transport is not currently connected,
+ * or if the upper layer never previously provided a source address.
+ *
+ * The results of this function call are transient: multiple calls in
+ * succession may give different results, depending on how local
+ * networking configuration changes over time.
+ */
+int rpc_localaddr(struct rpc_clnt *clnt, struct sockaddr *buf, size_t buflen)
+{
+ struct sockaddr_storage address;
+ struct sockaddr *sap = (struct sockaddr *)&address;
+ struct rpc_xprt *xprt;
+ struct net *net;
+ size_t salen;
+ int err;
+
+ rcu_read_lock();
+ xprt = rcu_dereference(clnt->cl_xprt);
+ salen = xprt->addrlen;
+ memcpy(sap, &xprt->addr, salen);
+ net = get_net(xprt->xprt_net);
+ rcu_read_unlock();
+
+ rpc_set_port(sap, 0);
+ err = rpc_sockname(net, sap, salen, buf, buflen);
+ put_net(net);
+ if (err != 0)
+ /* Couldn't discover local address, return ANYADDR */
+ return rpc_anyaddr(sap->sa_family, buf, buflen);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(rpc_localaddr);
+
void
rpc_setbufsize(struct rpc_clnt *clnt, unsigned int sndsize, unsigned int rcvsize)
{


2011-05-09 19:38:46

by Chuck Lever

Subject: [PATCH 15/16] NFS: Add migration recovery callouts in nfs4proc.c

Finally, to enable support for migration, insert migration recovery
callouts in the synchronous and asynchronous error handling paths for
NFSv4 procedures.

Signed-off-by: Chuck Lever <[email protected]>
---

fs/nfs/nfs4proc.c | 67 +++++++++++++++++++++++++++++++++++++++--------------
1 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 1168bb2..a545d46 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -72,10 +72,11 @@ struct nfs4_opendata;
static int _nfs4_proc_open(struct nfs4_opendata *data);
static int _nfs4_recover_proc_open(struct nfs4_opendata *data);
static int nfs4_do_fsinfo(struct nfs_server *, struct nfs_fh *, struct nfs_fsinfo *);
-static int nfs4_async_handle_error(struct rpc_task *, const struct nfs_server *, struct nfs4_state *);
-static int _nfs4_proc_lookup(struct rpc_clnt *client, struct inode *dir,
- const struct qstr *name, struct nfs_fh *fhandle,
- struct nfs_fattr *fattr);
+static int nfs4_async_handle_error(struct rpc_task *, struct nfs_server *,
+ struct nfs4_state *);
+static int _nfs4_proc_lookup(struct rpc_clnt *clnt, struct inode *dir,
+ const struct qstr *name, struct nfs_fh *fhandle,
+ struct nfs_fattr *fattr);
static int _nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle, struct nfs_fattr *fattr);
static int nfs4_do_setattr(struct inode *inode, struct rpc_cred *cred,
struct nfs_fattr *fattr, struct iattr *sattr,
@@ -247,10 +248,17 @@ static int nfs4_delay(struct rpc_clnt *clnt, long *timeout)
return res;
}

-/* This is the error handling routine for processes that are allowed
- * to sleep.
+/**
+ * nfs4_handle_exception - Common error handling for callers allowed to sleep
+ *
+ * @server: local state context for the server
+ * @errorcode: NFS4ERR value returned from the server
+ * @exception: exception handling state
+ *
+ * Returns zero on success, or a negative errno value.
*/
-static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struct nfs4_exception *exception)
+static int nfs4_handle_exception(struct nfs_server *server, int errorcode,
+ struct nfs4_exception *exception)
{
struct nfs_client *clp = server->nfs_client;
struct nfs4_state *state = exception->state;
@@ -286,6 +294,9 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struc
exception->retry = 1;
break;
#endif /* defined(CONFIG_NFS_V4_1) */
+ case -NFS4ERR_MOVED:
+ nfs4_schedule_migration_recovery(server);
+ goto wait_on_recovery;
case -NFS4ERR_FILE_OPEN:
if (exception->timeout > HZ) {
/* We have retried a decent amount, time to
@@ -2283,10 +2294,14 @@ static int nfs4_get_referral(struct inode *dir, const struct qstr *name, struct
status = nfs4_proc_fs_locations(dir, name, locations, page);
if (status != 0)
goto out;
- /* Make sure server returned a different fsid for the referral */
+
+ /*
+ * If the fsid didn't change, this is a migration event, not a
+ * referral. Cause us to drop into the exception handler, which
+ * will kick off migration recovery.
+ */
if (nfs_fsid_equal(&NFS_SERVER(dir)->fsid, &locations->fattr.fsid)) {
- dprintk("%s: server did not return a different fsid for a referral at %s\n", __func__, name->name);
- status = -EIO;
+ status = -NFS4ERR_MOVED;
goto out;
}

@@ -3655,8 +3670,18 @@ static int nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t buflen
return err;
}

-static int
-nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server, struct nfs4_state *state)
+/**
+ * nfs4_async_handle_error - Common error handling for callers who cannot sleep
+ *
+ * @task: active RPC task
+ * @server: local state for the server
+ * @state: NFSv4 state for active operation
+ *
+ * Returns zero on success, or a negative errno value.
+ */
+static int nfs4_async_handle_error(struct rpc_task *task,
+ struct nfs_server *server,
+ struct nfs4_state *state)
{
struct nfs_client *clp = server->nfs_client;

@@ -3686,19 +3711,24 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
dprintk("%s ERROR %d, Reset session\n", __func__,
task->tk_status);
nfs4_schedule_session_recovery(clp->cl_session);
- task->tk_status = 0;
- return -EAGAIN;
+ goto restart_call;
#endif /* CONFIG_NFS_V4_1 */
+ case -NFS4ERR_MOVED:
+ rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
+ nfs4_schedule_migration_recovery(server);
+ if (test_bit(NFS4CLNT_MANAGER_RUNNING,
+ &clp->cl_state) == 0)
+ rpc_wake_up_queued_task(&clp->cl_rpcwaitq,
+ task);
+ goto restart_call;
case -NFS4ERR_DELAY:
nfs_inc_server_stats(server, NFSIOS_DELAY);
case -NFS4ERR_GRACE:
case -EKEYEXPIRED:
rpc_delay(task, NFS4_POLL_RETRY_MAX);
- task->tk_status = 0;
- return -EAGAIN;
+ goto restart_call;
case -NFS4ERR_OLD_STATEID:
- task->tk_status = 0;
- return -EAGAIN;
+ goto restart_call;
}
task->tk_status = nfs4_map_errors(task->tk_status);
return 0;
@@ -3706,6 +3736,7 @@ wait_on_recovery:
rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
if (test_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) == 0)
rpc_wake_up_queued_task(&clp->cl_rpcwaitq, task);
+restart_call:
task->tk_status = 0;
return -EAGAIN;
}


2011-05-09 19:37:12

by Chuck Lever

Subject: [PATCH 06/16] NFS: Add a client-side function to display file handles

For debugging, introduce a simplistic function to print file handles
on the system console. It's hooked into the dprintk debugging
facility, but you can call _nfs_display_fhandle() directly if you
want to print a handle unconditionally.

Signed-off-by: Chuck Lever <[email protected]>
---
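
Illustration only, not part of this patch: a user-space sketch of the
output layout, rows of up to four big-endian 32-bit words. The function
fh_format() is hypothetical and writes into a caller-supplied buffer
instead of calling printk(). Zero-padding a trailing partial word is an
assumption of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space formatter: render "size" bytes of an opaque
 * handle as rows of up to four big-endian 32-bit words.  A trailing
 * partial word is zero-padded (an assumption of this sketch).
 * Returns the number of characters written, excluding the NUL.
 */
size_t fh_format(const unsigned char *data, size_t size,
		 char *out, size_t outlen)
{
	size_t used = 0;

	assert(out != NULL && outlen > 0);
	out[0] = '\0';
	if (data == NULL || size == 0)	/* check the pointer first */
		return 0;

	for (size_t i = 0; i < size; i += 4) {
		unsigned long word = 0;
		int n;

		/* Assemble one word, most significant byte first. */
		for (size_t j = 0; j < 4; j++) {
			word <<= 8;
			if (i + j < size)
				word |= data[i + j];
		}

		/* Break the line after every fourth word and at the end. */
		n = snprintf(out + used, outlen - used, " %08lx%s", word,
			     (i % 16 == 12 || i + 4 >= size) ? "\n" : "");
		if (n < 0 || (size_t)n >= outlen - used) {
			out[used] = '\0';	/* out of room: drop partial word */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

For an eight-byte handle 01 00 00 01 de ad be ef this produces the
single row " 01000001 deadbeef" followed by a newline.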

fs/nfs/inode.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/nfs_fs.h | 14 ++++++++++++++
2 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 57bb31a..9a444d0 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1039,6 +1039,51 @@ struct nfs_fh *nfs_alloc_fhandle(void)
}

/**
+ * _nfs_display_fhandle - display an NFS file handle on the console
+ *
+ * @fh: file handle to display
+ * @caption: display caption
+ *
+ * For debugging only.
+ */
+#ifdef RPC_DEBUG
+void _nfs_display_fhandle(const struct nfs_fh *fh, const char *caption)
+{
+ unsigned short i;
+
+ if (fh == NULL || fh->size == 0) {
+ printk(KERN_NOTICE "%s at %p is empty\n", caption, fh);
+ return;
+ }
+
+ printk(KERN_NOTICE "%s at %p is %u bytes:\n", caption, fh, fh->size);
+ for (i = 0; i < fh->size; i += 16) {
+ __be32 *pos = (__be32 *)&fh->data[i];
+
+ switch ((fh->size - i - 1) >> 2) {
+ case 0:
+ printk(KERN_NOTICE " %08x\n",
+ be32_to_cpup(pos));
+ break;
+ case 1:
+ printk(KERN_NOTICE " %08x %08x\n",
+ be32_to_cpup(pos), be32_to_cpup(pos + 1));
+ break;
+ case 2:
+ printk(KERN_NOTICE " %08x %08x %08x\n",
+ be32_to_cpup(pos), be32_to_cpup(pos + 1),
+ be32_to_cpup(pos + 2));
+ break;
+ default:
+ printk(KERN_NOTICE " %08x %08x %08x %08x\n",
+ be32_to_cpup(pos), be32_to_cpup(pos + 1),
+ be32_to_cpup(pos + 2), be32_to_cpup(pos + 3));
+ }
+ }
+}
+#endif
+
+/**
* nfs_inode_attrs_need_update - check if the inode attributes need updating
* @inode - pointer to inode
* @fattr - attributes
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 1b93b9c..0e3eb70 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -394,6 +394,20 @@ static inline void nfs_free_fhandle(const struct nfs_fh *fh)
kfree(fh);
}

+#ifdef RPC_DEBUG
+extern void _nfs_display_fhandle(const struct nfs_fh *fh, const char *caption);
+#define nfs_display_fhandle(fh, caption) \
+ do { \
+ if (unlikely(nfs_debug & NFSDBG_FACILITY)) \
+ _nfs_display_fhandle(fh, caption); \
+ } while (0)
+#else
+static inline void nfs_display_fhandle(const struct nfs_fh *fh,
+ const char *caption)
+{
+}
+#endif
+
/*
* linux/fs/nfs/nfsroot.c
*/


2011-05-09 19:37:57

by Chuck Lever

Subject: [PATCH 10/16] NFS: Add infrastructure for updating callback data

Currently the NFS client creates the long-form client ID in the XDR
encoder for SETCLIENTID, just before it is sent to the server.

With transparent state migration, the long-form client ID used with
the source server must be sent to the destination server when the
client uses SETCLIENTID to update the destination server with fresh
callback information. If a new long-form client ID is used here, the
destination server will drop all the NFSv4 state the servers so
carefully migrated for us.

So, the client must preserve the short-form and long-form client ID
that are generated at SETCLIENTID/EXCHANGE_ID time. When it comes
time to update the callback information, the preserved client IDs
are used so the server doesn't drop state for this client.

Signed-off-by: Chuck Lever <[email protected]>
---
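
Illustration only, not part of this patch: a user-space sketch of the
"generate once, replay after migration" rule described above. The
struct id_client and both functions are hypothetical stand-ins, and the
ID format is shortened for the example. The update_callback field models
the NFS4CLNT_UPDATE_CALLBACK bit.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the relevant fields of struct nfs_client. */
struct id_client {
	const char *local_addr;		/* callback address we advertise */
	const char *server_addr;	/* server we are talking to */
	unsigned int uniquifier;
	int update_callback;		/* models NFS4CLNT_UPDATE_CALLBACK */
	char *cached_id;		/* long-form ID from first contact */
};

/*
 * Fill "buf" with the long-form client ID to present to the server.
 * Returns the ID length, or -1 if it does not fit in "buf".
 */
int mock_build_client_id(struct id_client *clp, char *buf, size_t buflen)
{
	int len;

	assert(clp != NULL && buf != NULL);

	if (clp->update_callback && clp->cached_id != NULL)
		/* Post-migration: replay the ID the source server knew. */
		len = snprintf(buf, buflen, "%s", clp->cached_id);
	else
		len = snprintf(buf, buflen, "%s/%s %u", clp->local_addr,
			       clp->server_addr, clp->uniquifier);
	if (len < 0 || (size_t)len >= buflen)
		return -1;	/* bounded copy: never overrun "buf" */
	return len;
}

/* Remember the ID after a successful, non-update SETCLIENTID. */
int mock_cache_client_id(struct id_client *clp, const char *id)
{
	size_t n = strlen(id) + 1;
	char *copy = malloc(n);

	if (copy == NULL)
		return -1;
	memcpy(copy, id, n);
	free(clp->cached_id);
	clp->cached_id = copy;
	return 0;
}
```

Once the ID is cached, a later call with update_callback set returns
the same string even if the local or server address has changed, which
is what keeps the destination server from discarding migrated state.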

fs/nfs/client.c | 1 +
fs/nfs/nfs4_fs.h | 1 +
fs/nfs/nfs4proc.c | 29 ++++++++++++++++++++++++++---
fs/nfs/nfs4state.c | 1 +
include/linux/nfs_fs_sb.h | 1 +
5 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index bf40649..536b0ba 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -231,6 +231,7 @@ static void nfs4_shutdown_client(struct nfs_client *clp)
nfs4_destroy_callback(clp);
if (__test_and_clear_bit(NFS_CS_IDMAP, &clp->cl_res_state))
nfs_idmap_delete(clp);
+ kfree(clp->cl_cached_clientid);

rpc_destroy_wait_queue(&clp->cl_rpcwaitq);
}
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 4038c5b..1832fd6 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -48,6 +48,7 @@ enum nfs4_client_state {
NFS4CLNT_SESSION_RESET,
NFS4CLNT_RECALL_SLOT,
NFS4CLNT_LEASE_CONFIRM,
+ NFS4CLNT_UPDATE_CALLBACK,
};

enum nfs4_session_state {
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 57b7279..bb6b128 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3736,7 +3736,11 @@ int nfs4_proc_setclientid(struct nfs_client *clp, u32 program,

for(;;) {
rcu_read_lock();
- setclientid.sc_name_len = scnprintf(setclientid.sc_name,
+ if (test_bit(NFS4CLNT_UPDATE_CALLBACK, &clp->cl_state)) {
+ strcpy(setclientid.sc_name, clp->cl_cached_clientid);
+ setclientid.sc_name_len = strlen(setclientid.sc_name);
+ } else {
+ setclientid.sc_name_len = scnprintf(setclientid.sc_name,
sizeof(setclientid.sc_name), "%s/%s %s %s %u",
clp->cl_ipaddr,
rpc_peeraddr2str(clp->cl_rpcclient,
@@ -3745,6 +3749,7 @@ int nfs4_proc_setclientid(struct nfs_client *clp, u32 program,
RPC_DISPLAY_PROTO),
clp->cl_rpcclient->cl_auth->au_ops->au_name,
clp->cl_id_uniquifier);
+ }
setclientid.sc_netid_len = scnprintf(setclientid.sc_netid,
sizeof(setclientid.sc_netid),
rpc_peeraddr2str(clp->cl_rpcclient,
@@ -3755,7 +3760,8 @@ int nfs4_proc_setclientid(struct nfs_client *clp, u32 program,
rcu_read_unlock();

status = rpc_call_sync(clp->cl_rpcclient, &msg, RPC_TASK_TIMEOUT);
- if (status != -NFS4ERR_CLID_INUSE)
+ if (clp->cl_cached_clientid != NULL ||
+ status != -NFS4ERR_CLID_INUSE)
break;
if (loop != 0) {
++clp->cl_id_uniquifier;
@@ -3764,6 +3770,13 @@ int nfs4_proc_setclientid(struct nfs_client *clp, u32 program,
++loop;
ssleep(clp->cl_lease_time / HZ + 1);
}
+
+ if (status == 0 &&
+ !test_bit(NFS4CLNT_UPDATE_CALLBACK, &clp->cl_state)) {
+ kfree(clp->cl_cached_clientid);
+ clp->cl_cached_clientid = kstrdup(setclientid.sc_name,
+ GFP_KERNEL);
+ }
return status;
}

@@ -4874,16 +4887,26 @@ int nfs4_proc_exchange_id(struct nfs_client *clp, struct rpc_cred *cred)
*p = htonl((u32)clp->cl_boot_time.tv_nsec);
args.verifier = &verifier;

- args.id_len = scnprintf(args.id, sizeof(args.id),
+ if (test_bit(NFS4CLNT_UPDATE_CALLBACK, &clp->cl_state)) {
+ strcpy(args.id, clp->cl_cached_clientid);
+ args.id_len = strlen(args.id);
+ } else {
+ args.id_len = scnprintf(args.id, sizeof(args.id),
"%s/%s.%s/%u",
clp->cl_ipaddr,
init_utsname()->nodename,
init_utsname()->domainname,
clp->cl_rpcclient->cl_auth->au_flavor);
+ }

status = rpc_call_sync(clp->cl_rpcclient, &msg, RPC_TASK_TIMEOUT);
if (!status)
status = nfs4_check_cl_exchange_flags(clp->cl_exchange_flags);
+ if (!status &&
+ !test_bit(NFS4CLNT_UPDATE_CALLBACK, &clp->cl_state)) {
+ kfree(clp->cl_cached_clientid);
+ clp->cl_cached_clientid = kstrdup(args.id, GFP_KERNEL);
+ }
dprintk("<-- %s status= %d\n", __func__, status);
return status;
}
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index f6b268f..3285e40 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1646,6 +1646,7 @@ static void nfs4_state_manager(struct nfs_client *clp)
nfs_mark_client_ready(clp, status);
goto out_error;
}
+ clear_bit(NFS4CLNT_UPDATE_CALLBACK, &clp->cl_state);
clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
pnfs_destroy_all_layouts(clp);
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index aa3a912..d0554c4 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -70,6 +70,7 @@ struct nfs_client {
char cl_ipaddr[48];
unsigned char cl_id_uniquifier;
u32 cl_cb_ident; /* v4.0 callback identifier */
+ char *cl_cached_clientid;
const struct nfs4_minor_version_ops *cl_mvops;

/* The sequence id to use for the next CREATE_SESSION */


2011-05-09 19:37:47

by Chuck Lever

Subject: [PATCH 09/16] NFS: Introduce nfs4_proc_get_mig_status()

The nfs4_proc_fs_locations() function is invoked during referral
processing to perform a GETATTR(fs_locations) on an object's parent
directory in order to discover the target of the referral. It
performs a LOOKUP, so the client needs to know the parent's file
handle a priori.

o During migration recovery, we need to probe fs_locations
information on an FSID's root directory. The "parent" of a root
directory is not available for a LOOKUP operation.

o Recovering from NFS4ERR_LEASE_MOVED is a process of walking over a
list of known FSIDs that reside on the server, and probing whether
they have migrated. Once the server has detected that the client
has probed all migrated file systems, it stops returning
NFS4ERR_LEASE_MOVED.

A minor version zero server needs to know what client ID is
requesting fs_locations information so it can clear the flag that
forces it to continue returning NFS4ERR_LEASE_MOVED. This flag is
set per client ID and per FSID. However, the client ID is not an
argument of either the PUTFH or GETATTR operations. Later minor
versions have client ID information embedded in the underlying
session.

Therefore, by convention, minor version zero clients send a RENEW
operation in the same compound as the GETATTR(fs_locations), since
RENEW has one argument: the short-form client ID. This allows a
minor version zero server to identify correctly the client that is
probing for a migration.

o For various subtle reasons, servers can't return NFS4ERR_DELAY to
state-changing operations while they are actually doing the
migration. Instead, they put off clients during the brief window
when the data is actually unavailable by returning DELAY to the
GETATTR(fs_locations), since GETATTR doesn't mutate NFSv4 state.
So our client must be able to deal properly with an NFS4ERR_DELAY
reply during a migration status probe.

To handle all this random wackiness, we need a variant of
nfs4_proc_fs_locations() that can operate directly on a target file
handle, rather than taking a name and doing a LOOKUP as part of
retrieving fs_locations from the server. It also must properly append
a RENEW operation as needed, and it should retry NFS4ERR_DELAY as
appropriate.

Introduce nfs4_proc_get_mig_status() to fill this role, and add the
requisite XDR encoding and decoding paraphernalia. Under the covers,
both nfs4_proc_foo functions use the FS_LOCATIONS XDR routines. This
is a little awkward, but is necessary because it's currently not
straightforward to add new NFSv4 procedures due to a bug in nfsstat.

Signed-off-by: Chuck Lever <[email protected]>
---
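
Illustration only, not part of this patch: a sketch of how the two
callers end up with different operation sequences in the same compound.
The enum mock_op, struct mock_args and both functions are hypothetical;
the real encoder emits XDR and also prepends a SEQUENCE operation where
the minor version requires one, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation tags; the real encoder emits XDR, not enums. */
enum mock_op { OP_PUTFH, OP_LOOKUP, OP_GETATTR_FS_LOCATIONS, OP_RENEW };

struct mock_args {
	unsigned int mig_status:1;	/* probe a known file handle */
	unsigned int renew:1;		/* append RENEW (minor version 0) */
};

/* Choose flags the way the migration probe does. */
struct mock_args mock_probe_args(unsigned int minor_version)
{
	struct mock_args args = { .mig_status = 1 };

	if (minor_version == 0)
		args.renew = 1;
	return args;
}

/*
 * Write the operation sequence for one compound into "ops".
 * Returns the number of operations, or 0 if "ops" is too small.
 */
size_t mock_encode_ops(const struct mock_args *args,
		       enum mock_op *ops, size_t max)
{
	size_t n = 0;

	assert(args != NULL && ops != NULL);
	if (max < 3)
		return 0;

	ops[n++] = OP_PUTFH;
	if (args->mig_status) {
		/* Migration probe: operate directly on the file handle. */
		ops[n++] = OP_GETATTR_FS_LOCATIONS;
		if (args->renew)
			ops[n++] = OP_RENEW;	/* identifies the client ID */
	} else {
		/* Referral path: look the object up under its parent. */
		ops[n++] = OP_LOOKUP;
		ops[n++] = OP_GETATTR_FS_LOCATIONS;
	}
	return n;
}
```

A minor version zero probe therefore yields PUTFH, GETATTR, RENEW, while
a later minor version drops the RENEW because the session already
identifies the client.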

fs/nfs/nfs4_fs.h | 2 +
fs/nfs/nfs4proc.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++-
fs/nfs/nfs4xdr.c | 48 ++++++++++++++++++-------
include/linux/nfs_xdr.h | 7 +++-
4 files changed, 130 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index c4a6983..4038c5b 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -242,6 +242,8 @@ extern int nfs4_do_close(struct path *path, struct nfs4_state *state, gfp_t gfp_
extern int nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle);
extern int nfs4_proc_fs_locations(struct inode *dir, const struct qstr *name,
struct nfs4_fs_locations *fs_locations, struct page *page);
+extern int nfs4_proc_get_mig_status(struct nfs_server *server,
+ struct nfs4_fs_locations *fs_locations, struct page *page);
extern void nfs4_release_lockowner(const struct nfs4_lock_state *);
extern const struct xattr_handler *nfs4_xattr_handlers[];

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 641691c..57b7279 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4666,6 +4666,16 @@ static void nfs_fixup_referral_attributes(struct nfs_fattr *fattr)
fattr->nlink = 2;
}

+/**
+ * nfs4_proc_fs_locations - retrieve locations array for a named object
+ *
+ * @dir: inode of parent directory
+ * @name: qstr containing name of object to query
+ * @locations: result of query
+ * @page: buffer
+ *
+ * Returns zero on success, or a negative errno code
+ */
int nfs4_proc_fs_locations(struct inode *dir, const struct qstr *name,
struct nfs4_fs_locations *fs_locations, struct page *page)
{
@@ -4690,16 +4700,91 @@ int nfs4_proc_fs_locations(struct inode *dir, const struct qstr *name,
};
int status;

- dprintk("%s: start\n", __func__);
+ dprintk("--> %s: FSID %llx:%llx on \"%s\"\n", __func__,
+ (unsigned long long)server->fsid.major,
+ (unsigned long long)server->fsid.minor,
+ server->nfs_client->cl_hostname);
nfs_fattr_init(&fs_locations->fattr);
fs_locations->server = server;
fs_locations->nlocations = 0;
status = nfs4_call_sync(server->client, server, &msg, &args.seq_args, &res.seq_res, 0);
nfs_fixup_referral_attributes(&fs_locations->fattr);
- dprintk("%s: returned status = %d\n", __func__, status);
+ dprintk("<-- %s status=%d\n", __func__, status);
+ return status;
+}
+
+static int _nfs4_proc_get_mig_status(struct nfs_server *server,
+ struct nfs4_fs_locations *locations,
+ struct page *page)
+{
+ u32 bitmask[2] = {
+ [0] = FATTR4_WORD0_FSID |
+ FATTR4_WORD0_FS_LOCATIONS,
+ };
+ struct nfs4_fs_locations_arg args = {
+ .client = server->nfs_client,
+ .fh = server->rootfh,
+ .page = page,
+ .bitmask = bitmask,
+ };
+ struct nfs4_fs_locations_res res = {
+ .fs_locations = locations,
+ };
+ struct rpc_message msg = {
+ .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_FS_LOCATIONS],
+ .rpc_argp = &args,
+ .rpc_resp = &res,
+ };
+ int status;
+
+ dprintk("--> %s: FSID %llx:%llx on \"%s\"\n", __func__,
+ (unsigned long long)server->fsid.major,
+ (unsigned long long)server->fsid.minor,
+ args.client->cl_hostname);
+ nfs_display_fhandle(args.fh, "Probing file handle");
+
+ args.mig_status = res.mig_status = 1;
+ if (args.client->cl_mvops->minor_version == 0)
+ args.renew = res.renew = 1;
+ nfs_fattr_init(&locations->fattr);
+ locations->server = server;
+ locations->nlocations = 0;
+ status = nfs4_call_sync(server->client, server, &msg,
+ &args.seq_args, &res.seq_res, 0);
+ dprintk("<-- %s status=%d\n", __func__, status);
return status;
}

+/**
+ * nfs4_proc_get_mig_status - probe migration status of an export
+ *
+ * @server: local state for server
+ * @locations: result of query
+ * @page: buffer
+ *
+ * Returns zero on success, or a negative errno code
+ *
+ * Servers often return NFS4ERR_DELAY to our migration probe while a
+ * migration is in progress. This allows a server to delay client
+ * activity without returning NFS4ERR_DELAY on a sequence ID mutating
+ * operation.
+ */
+int nfs4_proc_get_mig_status(struct nfs_server *server,
+ struct nfs4_fs_locations *locations,
+ struct page *page)
+{
+ struct nfs4_exception exception = { };
+ int err;
+
+ do {
+ err = _nfs4_proc_get_mig_status(server, locations, page);
+ if (err != -NFS4ERR_DELAY)
+ break;
+ nfs4_handle_exception(server, err, &exception);
+ } while (exception.retry);
+ return err;
+}
+
static int _nfs4_proc_secinfo(struct inode *dir, const struct qstr *name, struct nfs4_secinfo_flavors *flavors)
{
int status;
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index be70be9..efb6094 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -684,13 +684,15 @@ static int nfs4_stat_to_errno(int);
encode_sequence_maxsz + \
encode_putfh_maxsz + \
encode_lookup_maxsz + \
- encode_fs_locations_maxsz)
+ encode_fs_locations_maxsz + \
+ encode_renew_maxsz)
#define NFS4_dec_fs_locations_sz \
(compound_decode_hdr_maxsz + \
decode_sequence_maxsz + \
decode_putfh_maxsz + \
decode_lookup_maxsz + \
- decode_fs_locations_maxsz)
+ decode_fs_locations_maxsz + \
+ decode_renew_maxsz)
#define NFS4_enc_secinfo_sz (compound_encode_hdr_maxsz + \
encode_sequence_maxsz + \
encode_putfh_maxsz + \
@@ -2529,11 +2531,20 @@ static void nfs4_xdr_enc_fs_locations(struct rpc_rqst *req,

encode_compound_hdr(xdr, req, &hdr);
encode_sequence(xdr, &args->seq_args, &hdr);
- encode_putfh(xdr, args->dir_fh, &hdr);
- encode_lookup(xdr, args->name, &hdr);
- replen = hdr.replen; /* get the attribute into args->page */
- encode_fs_locations(xdr, args->bitmask, &hdr);
+ if (args->mig_status) {
+ encode_putfh(xdr, args->fh, &hdr);
+ replen = hdr.replen;
+ encode_fs_locations(xdr, args->bitmask, &hdr);
+ if (args->renew)
+ encode_renew(xdr, args->client, &hdr);
+ } else {
+ encode_putfh(xdr, args->dir_fh, &hdr);
+ encode_lookup(xdr, args->name, &hdr);
+ replen = hdr.replen;
+ encode_fs_locations(xdr, args->bitmask, &hdr);
+ }

+ /* Set up reply kvec to capture returned fs_locations array. */
xdr_inline_pages(&req->rq_rcv_buf, replen << 2, &args->page,
0, PAGE_SIZE);
encode_nops(&hdr);
@@ -6134,13 +6145,24 @@ static int nfs4_xdr_dec_fs_locations(struct rpc_rqst *req,
status = decode_putfh(xdr);
if (status)
goto out;
- status = decode_lookup(xdr);
- if (status)
- goto out;
- xdr_enter_page(xdr, PAGE_SIZE);
- status = decode_getfattr(xdr, &res->fs_locations->fattr,
- res->fs_locations->server,
- !RPC_IS_ASYNC(req->rq_task));
+ if (res->mig_status) {
+ xdr_enter_page(xdr, PAGE_SIZE);
+ status = decode_getfattr(xdr, &res->fs_locations->fattr,
+ res->fs_locations->server,
+ !RPC_IS_ASYNC(req->rq_task));
+ if (status)
+ goto out;
+ if (res->renew)
+ status = decode_renew(xdr);
+ } else {
+ status = decode_lookup(xdr);
+ if (status)
+ goto out;
+ xdr_enter_page(xdr, PAGE_SIZE);
+ status = decode_getfattr(xdr, &res->fs_locations->fattr,
+ res->fs_locations->server,
+ !RPC_IS_ASYNC(req->rq_task));
+ }
out:
return status;
}
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 26165a5..22e34d3 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -957,16 +957,21 @@ struct nfs4_fs_locations {
};

struct nfs4_fs_locations_arg {
+ const struct nfs_client *client;
const struct nfs_fh *dir_fh;
+ const struct nfs_fh *fh;
const struct qstr *name;
struct page *page;
const u32 *bitmask;
- struct nfs4_sequence_args seq_args;
+ struct nfs4_sequence_args seq_args;
+ unsigned char mig_status:1, renew:1;
};

struct nfs4_fs_locations_res {
struct nfs4_fs_locations *fs_locations;
struct nfs4_sequence_res seq_res;
+ unsigned char mig_status:1,
+ renew:1;
};

struct nfs4_secinfo_oid {


2011-05-09 19:38:56

by Chuck Lever

Subject: [PATCH 16/16] NFS: Implement support for NFS4ERR_LEASE_MOVED

To recover from NFS4ERR_LEASE_MOVED, walk the cl_superblocks list and
invoke nfs4_try_migration() on each server's root file handle.
nfs4_try_migration() should automatically determine whether that file
system has migrated, and then perform recovery for it.

The per-filesystem migration probe also informs minor version zero
servers that this client should no longer receive NFS4ERR_LEASE_MOVED.

Signed-off-by: Chuck Lever <[email protected]>
---
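
Illustration only, not part of this patch: a sketch of the recovery
flow described above, with a singly linked list standing in for
cl_superblocks. The types lm_server and lm_client and both functions
are hypothetical. The "migrated" field stands in for the answer a real
migration probe would obtain from the server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct nfs_server and struct nfs_client. */
struct lm_server {
	bool migrated;		/* what a migration probe would report */
	bool probed;
	bool recovered;
	struct lm_server *next;
};

struct lm_client {
	bool lease_moved;	/* models the NFS4CLNT_LEASE_MOVED bit */
	unsigned int wakeups;	/* times the state manager was kicked */
	struct lm_server *superblocks;
};

/* Kick the state manager only on the 0 -> 1 transition of the flag. */
void mock_schedule_lease_moved_recovery(struct lm_client *clp)
{
	assert(clp != NULL);
	if (!clp->lease_moved) {
		clp->lease_moved = true;
		clp->wakeups++;
	}
}

/*
 * State manager side: probe every file system on this server and
 * recover the ones that moved.  Returns how many were recovered.
 */
unsigned int mock_handle_lease_moved(struct lm_client *clp)
{
	unsigned int recovered = 0;

	assert(clp != NULL);
	if (!clp->lease_moved)
		return 0;

	for (struct lm_server *s = clp->superblocks; s != NULL; s = s->next) {
		s->probed = true;	/* every file system is probed */
		if (s->migrated) {
			s->recovered = true;
			recovered++;
		}
	}
	clp->lease_moved = false;
	return recovered;
}
```

Every file system is probed, not just the migrated ones, because the
probe is also what tells a minor version zero server to stop returning
NFS4ERR_LEASE_MOVED to this client.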

fs/nfs/client.c | 1 +
fs/nfs/nfs4_fs.h | 2 +
fs/nfs/nfs4proc.c | 11 ++++++++
fs/nfs/nfs4state.c | 63 ++++++++++++++++++++++++++++++++++++++++++++-
include/linux/nfs_fs_sb.h | 2 +
5 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 2f5e29f..b89af4d 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -188,6 +188,7 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
if (cl_init->long_clientid != NULL)
clp->cl_cached_clientid = kstrdup(cl_init->long_clientid,
GFP_KERNEL);
+ clp->cl_mig_counter = 1;
#endif
cred = rpc_lookup_machine_cred();
if (!IS_ERR(cred))
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index c3e8641..2ad6c9b 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -51,6 +51,7 @@ enum nfs4_client_state {
NFS4CLNT_UPDATE_CALLBACK,
NFS4CLNT_CLONED_CLIENT,
NFS4CLNT_MOVED,
+ NFS4CLNT_LEASE_MOVED,
};

enum nfs4_session_state {
@@ -351,6 +352,7 @@ extern void nfs4_close_sync(struct path *, struct nfs4_state *, fmode_t);
extern void nfs4_state_set_mode_locked(struct nfs4_state *, fmode_t);
extern void nfs4_schedule_lease_recovery(struct nfs_client *);
extern void nfs4_schedule_migration_recovery(struct nfs_server *);
+extern void nfs4_schedule_lease_moved_recovery(struct nfs_client *);
extern void nfs4_schedule_state_manager(struct nfs_client *);
extern void nfs4_schedule_stateid_recovery(const struct nfs_server *, struct nfs4_state *);
extern void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a545d46..f4e07ba 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -297,6 +297,9 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode,
case -NFS4ERR_MOVED:
nfs4_schedule_migration_recovery(server);
goto wait_on_recovery;
+ case -NFS4ERR_LEASE_MOVED:
+ nfs4_schedule_lease_moved_recovery(clp);
+ goto wait_on_recovery;
case -NFS4ERR_FILE_OPEN:
if (exception->timeout > HZ) {
/* We have retried a decent amount, time to
@@ -3721,6 +3724,14 @@ static int nfs4_async_handle_error(struct rpc_task *task,
rpc_wake_up_queued_task(&clp->cl_rpcwaitq,
task);
goto restart_call;
+ case -NFS4ERR_LEASE_MOVED:
+ rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
+ nfs4_schedule_lease_moved_recovery(clp);
+ if (test_bit(NFS4CLNT_MANAGER_RUNNING,
+ &clp->cl_state) == 0)
+ rpc_wake_up_queued_task(&clp->cl_rpcwaitq,
+ task);
+ goto restart_call;
case -NFS4ERR_DELAY:
nfs_inc_server_stats(server, NFSIOS_DELAY);
case -NFS4ERR_GRACE:
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index c7b414a..8bef9d8 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1071,7 +1071,32 @@ void nfs4_schedule_migration_recovery(struct nfs_server *server)
dprintk("<-- %s\n", __func__);
}

-static int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_state *state)
+/**
+ * nfs4_schedule_lease_moved_recovery - start lease moved recovery
+ *
+ * @clp: nfs_client of server that may have migrated file systems
+ *
+ */
+void nfs4_schedule_lease_moved_recovery(struct nfs_client *clp)
+{
+ dprintk("--> %s: \"%s\" (client ID %llx)\n",
+ __func__, clp->cl_hostname, clp->cl_clientid);
+
+ if (test_and_set_bit(NFS4CLNT_LEASE_MOVED, &clp->cl_state) == 0)
+ nfs4_schedule_state_manager(clp);
+
+ dprintk("<-- %s\n", __func__);
+}
+
+/**
+ * nfs4_state_mark_reclaim_reboot - Mark nfs_client for reboot recovery
+ * @clp: nfs_client of server that may have rebooted
+ * @state: state flags to test
+ *
+ * Returns 1 if reboot recovery is needed.
+ */
+int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp,
+ struct nfs4_state *state)
{

set_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
@@ -1384,7 +1409,6 @@ static int nfs4_recovery_handle_error(struct nfs_client *clp, int error)
nfs4_state_end_reclaim_reboot(clp);
return 0;
case -NFS4ERR_STALE_CLIENTID:
- case -NFS4ERR_LEASE_MOVED:
set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
nfs4_state_clear_reclaim_reboot(clp);
nfs4_state_start_reclaim_reboot(clp);
@@ -1597,6 +1621,37 @@ out_err:
kfree(locations);
}

+static void nfs4_handle_lease_moved(struct nfs_client *clp)
+{
+ struct nfs_server *server;
+
+ dprintk("--> %s: \"%s\" (client ID %llx)\n",
+ __func__, clp->cl_hostname, clp->cl_clientid);
+
+ /*
+ * rcu_read_lock() must be dropped before trying each individual
+ * migration. cl_mig_counter is used to skip servers that have
+ * already been visited for this lease_moved event when the list
+ * walk is restarted.
+ */
+ clp->cl_mig_counter++;
+
+restart:
+ rcu_read_lock();
+ list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link)
+ if (server->mig_counter != clp->cl_mig_counter) {
+ server->mig_counter = clp->cl_mig_counter;
+ rcu_read_unlock();
+ nfs4_try_migration(server);
+ /* Ask the server if there's more work to do */
+ if (nfs4_check_lease(clp) == NFS4ERR_LEASE_MOVED)
+ goto restart;
+ break;
+ }
+ rcu_read_unlock();
+ dprintk("<-- %s\n", __func__);
+}
+
#ifdef CONFIG_NFS_V4_1
void nfs4_schedule_session_recovery(struct nfs4_session *session)
{
@@ -1814,6 +1869,10 @@ static void nfs4_state_manager(struct nfs_client *clp)
nfs4_try_migration(clp->cl_moved_server);
continue;
}
+ if (test_and_clear_bit(NFS4CLNT_LEASE_MOVED, &clp->cl_state)) {
+ nfs4_handle_lease_moved(clp);
+ continue;
+ }

/* First recover reboot state... */
if (test_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state)) {
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 091abf0..58050db 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -60,6 +60,7 @@ struct nfs_client {

/* accessed only when NFS4CLNT_MOVED bit is set */
struct nfs_server * cl_moved_server;
+ unsigned long cl_mig_counter;

/* used for the setclientid verifier */
struct timespec cl_boot_time;
@@ -156,6 +157,7 @@ struct nfs_server {
struct list_head delegations;
void (*destroy)(struct nfs_server *);
struct nfs_fh *rootfh;
+ unsigned long mig_counter;

atomic_t active; /* Keep trace of any activity to this server */



2011-05-09 19:37:22

by Chuck Lever

Subject: [PATCH 07/16] NFS: Save root file handle in nfs_server

Save each FSID's root directory file handle in the export's local
nfs_server structure on the client. This file handle can later be
used for migration recovery.

NB: Saving the root FH is done only for NFSv4 mounts.

Signed-off-by: Chuck Lever <[email protected]>
---

fs/nfs/client.c | 1 +
fs/nfs/getroot.c | 5 +++++
include/linux/nfs_fs_sb.h | 1 +
3 files changed, 7 insertions(+), 0 deletions(-)
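
As an illustration only, the save-and-release pattern described above might look like this in plain user-space C. The demo_* names are hypothetical and malloc()/free() stand in for the kernel's file handle allocator. As in the patch, a failed allocation simply leaves the saved handle unset.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_FH_MAXSIZE 128	/* NFSv4 file handles are at most 128 bytes */

/* Hypothetical stand-ins for nfs_fh and nfs_server. */
struct demo_fh {
	unsigned short size;
	unsigned char data[DEMO_FH_MAXSIZE];
};

struct demo_server {
	struct demo_fh *rootfh;	/* private copy of the root FH, or NULL */
};

/* Save a private copy of the mount's root FH. Returns 0, or -1 on failure. */
static int demo_save_rootfh(struct demo_server *server, const struct demo_fh *mntfh)
{
	if (mntfh->size > DEMO_FH_MAXSIZE)
		return -1;
	server->rootfh = malloc(sizeof(*server->rootfh));
	if (server->rootfh == NULL)
		return -1;
	server->rootfh->size = mntfh->size;
	memcpy(server->rootfh->data, mntfh->data, mntfh->size);
	return 0;
}

/* Release the saved copy when the server structure goes away. */
static void demo_free_server(struct demo_server *server)
{
	free(server->rootfh);	/* free(NULL) is a no-op */
	server->rootfh = NULL;
}
```

Keeping a copy rather than a pointer to the caller's handle means the saved root FH stays valid for later recovery regardless of what the mount path does with its own buffer.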

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index b55ef58..bf40649 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1101,6 +1101,7 @@ void nfs_free_server(struct nfs_server *server)
nfs_put_client(server->nfs_client);

nfs_free_iostats(server->io_stats);
+ nfs_free_fhandle(server->rootfh);
bdi_destroy(&server->backing_dev_info);
kfree(server);
nfs_release_automount_timer();
diff --git a/fs/nfs/getroot.c b/fs/nfs/getroot.c
index dcb6154..8d4fbe1 100644
--- a/fs/nfs/getroot.c
+++ b/fs/nfs/getroot.c
@@ -232,6 +232,11 @@ struct dentry *nfs4_get_root(struct super_block *sb, struct nfs_fh *mntfh,
ret = ERR_CAST(inode);
goto out;
}
+ server->rootfh = nfs_alloc_fhandle();
+ if (server->rootfh != NULL) {
+ nfs_display_fhandle(mntfh, "nfs_get_root: new root FH");
+ nfs_copy_fh(server->rootfh, mntfh);
+ }

error = nfs_superblock_set_dummy_root(sb, inode);
if (error != 0) {
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 87694ca..aa3a912 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -151,6 +151,7 @@ struct nfs_server {
#endif
struct list_head delegations;
void (*destroy)(struct nfs_server *);
+ struct nfs_fh *rootfh;

atomic_t active; /* Keep trace of any activity to this server */



2011-05-12 15:37:40

by Chuck Lever

Subject: Re: [PATCH 16/16] NFS: Implement support for NFS4ERR_LEASE_MOVED


On May 10, 2011, at 8:20 PM, Tom Haynes wrote:

> On 5/9/11 5:48 PM, Chuck Lever wrote:
>> On May 9, 2011, at 3:38 PM, Chuck Lever wrote:
>>
>>> To recover from NFS4ERR_LEASE_MOVED, walk the cl_superblocks list and
>>> invoke nfs4_try_migration() on each server's root file handle.
>>> nfs4_try_migration() should automatically determine whether that file
>>> system has migrated, and then perform recovery for it.
>>>
>>> The per-filesystem migration probe also informs minor version zero
>>> servers that this client should no longer receive NFS4ERR_LEASE_MOVED.
>> I see one thing that may be missing here.
>>
>> RFC 3530, section 8.14.3, lists OPEN, CLOSE, READ, WRITE, RENEW, LOCK, LOCKU, and LOCKT as the only procedures that return NFS4ERR_LEASE_MOVED. A code audit suggests that handling NFS4ERR_LEASE_MOVED in the two generic error handlers in nfs4proc.c is sufficient for all of these but RENEW and OPEN.
>
> 3530bis, section 13.4:
>
> | NFS4ERR_LEASE_MOVED | CLOSE, DELEGPURGE, DELEGRETURN, LOCK, |
> | | LOCKT, LOCKU, OPEN_CONFIRM, |
> | | OPEN_DOWNGRADE, READ, |
> | | RELEASE_LOCKOWNER, RENEW, SETATTR, |
> | | WRITE |
>
> And the text equivalent to what is in 3530 is in section 9.14.3:
>
> To accomplish this, all
> operations which implicitly renew leases for a client (such as OPEN,
> CLOSE, READ, WRITE, RENEW, LOCK, and others), will return the error
> NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be
> renewed has been transferred to a new server.


DELEGPURGE doesn't appear to be implemented.

DELEGRETURN uses nfs4_handle_exception(), so it's covered.

OPEN_CONFIRM is in the same boat as OPEN. Error handling is ad hoc; both MOVED and LEASE_MOVED are ignored.

OPEN_DOWNGRADE is done only during close processing, and the returned status code is ignored.

RELEASE_LOCKOWNER ignores the returned status code. It's entirely asynchronous. This is probably OK to leave alone.

SETATTR uses nfs4_handle_exception(), so it's covered.


So we have OPEN, OPEN_CONFIRM, and OPEN_DOWNGRADE to consider.
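
The audit above can be written down as a small table, which may make it easier to keep track of as the series changes. This is only a bookkeeping sketch: the enum and struct names are hypothetical, and the classifications simply restate the findings listed in this message.

```c
#include <stddef.h>

/* Hypothetical classification of how each operation treats NFS4ERR_LEASE_MOVED. */
enum demo_coverage {
	DEMO_COVERED,		/* goes through nfs4_handle_exception() */
	DEMO_IGNORED_OK,	/* status ignored, probably fine to leave alone */
	DEMO_NOT_IMPLEMENTED,	/* operation not implemented by the client */
	DEMO_TO_CONSIDER,	/* still needs a decision */
};

struct demo_audit_entry {
	const char *op;
	enum demo_coverage coverage;
};

static const struct demo_audit_entry demo_audit[] = {
	{ "DELEGPURGE",		DEMO_NOT_IMPLEMENTED },
	{ "DELEGRETURN",	DEMO_COVERED },
	{ "OPEN",		DEMO_TO_CONSIDER },
	{ "OPEN_CONFIRM",	DEMO_TO_CONSIDER },
	{ "OPEN_DOWNGRADE",	DEMO_TO_CONSIDER },
	{ "RELEASE_LOCKOWNER",	DEMO_IGNORED_OK },
	{ "SETATTR",		DEMO_COVERED },
};

/*
 * Copy up to @max names of operations still to be considered into @out.
 * Returns the total number of such operations, even if it exceeds @max.
 */
static size_t demo_ops_to_consider(const char **out, size_t max)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(demo_audit) / sizeof(demo_audit[0]); i++) {
		if (demo_audit[i].coverage != DEMO_TO_CONSIDER)
			continue;
		if (n < max)
			out[n] = demo_audit[i].op;
		n++;
	}
	return n;
}
```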


>> RENEW is part of the lease_moved recovery logic, so I've left NFS4ERR_LEASE_MOVED handling there pretty sparse. The caller wants to deal with it.
>>
>> OPEN, as far as I can tell, won't deal with it at all, at least not directly. Should we look for it in nfs4_open_done() and invoke nfs4_schedule_lease_moved_recovery()?
>>
>>> Signed-off-by: Chuck Lever <[email protected]>
>>> ---
>>>
>>> fs/nfs/client.c | 1 +
>>> fs/nfs/nfs4_fs.h | 2 +
>>> fs/nfs/nfs4proc.c | 11 ++++++++
>>> fs/nfs/nfs4state.c | 63 ++++++++++++++++++++++++++++++++++++++++++++-
>>> include/linux/nfs_fs_sb.h | 2 +
>>> 5 files changed, 77 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
>>> index 2f5e29f..b89af4d 100644
>>> --- a/fs/nfs/client.c
>>> +++ b/fs/nfs/client.c
>>> @@ -188,6 +188,7 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
>>> if (cl_init->long_clientid != NULL)
>>> clp->cl_cached_clientid = kstrdup(cl_init->long_clientid,
>>> GFP_KERNEL);
>>> + clp->cl_mig_counter = 1;
>>> #endif
>>> cred = rpc_lookup_machine_cred();
>>> if (!IS_ERR(cred))
>>> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
>>> index c3e8641..2ad6c9b 100644
>>> --- a/fs/nfs/nfs4_fs.h
>>> +++ b/fs/nfs/nfs4_fs.h
>>> @@ -51,6 +51,7 @@ enum nfs4_client_state {
>>> NFS4CLNT_UPDATE_CALLBACK,
>>> NFS4CLNT_CLONED_CLIENT,
>>> NFS4CLNT_MOVED,
>>> + NFS4CLNT_LEASE_MOVED,
>>> };
>>>
>>> enum nfs4_session_state {
>>> @@ -351,6 +352,7 @@ extern void nfs4_close_sync(struct path *, struct nfs4_state *, fmode_t);
>>> extern void nfs4_state_set_mode_locked(struct nfs4_state *, fmode_t);
>>> extern void nfs4_schedule_lease_recovery(struct nfs_client *);
>>> extern void nfs4_schedule_migration_recovery(struct nfs_server *);
>>> +extern void nfs4_schedule_lease_moved_recovery(struct nfs_client *);
>>> extern void nfs4_schedule_state_manager(struct nfs_client *);
>>> extern void nfs4_schedule_stateid_recovery(const struct nfs_server *, struct nfs4_state *);
>>> extern void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags);
>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>> index a545d46..f4e07ba 100644
>>> --- a/fs/nfs/nfs4proc.c
>>> +++ b/fs/nfs/nfs4proc.c
>>> @@ -297,6 +297,9 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode,
>>> case -NFS4ERR_MOVED:
>>> nfs4_schedule_migration_recovery(server);
>>> goto wait_on_recovery;
>>> + case -NFS4ERR_LEASE_MOVED:
>>> + nfs4_schedule_lease_moved_recovery(clp);
>>> + goto wait_on_recovery;
>>> case -NFS4ERR_FILE_OPEN:
>>> if (exception->timeout > HZ) {
>>> /* We have retried a decent amount, time to
>>> @@ -3721,6 +3724,14 @@ static int nfs4_async_handle_error(struct rpc_task *task,
>>> rpc_wake_up_queued_task(&clp->cl_rpcwaitq,
>>> task);
>>> goto restart_call;
>>> + case -NFS4ERR_LEASE_MOVED:
>>> + rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
>>> + nfs4_schedule_lease_moved_recovery(clp);
>>> + if (test_bit(NFS4CLNT_MANAGER_RUNNING,
>>> + &clp->cl_state) == 0)
>>> + rpc_wake_up_queued_task(&clp->cl_rpcwaitq,
>>> + task);
>>> + goto restart_call;
>>> case -NFS4ERR_DELAY:
>>> nfs_inc_server_stats(server, NFSIOS_DELAY);
>>> case -NFS4ERR_GRACE:
>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>> index c7b414a..8bef9d8 100644
>>> --- a/fs/nfs/nfs4state.c
>>> +++ b/fs/nfs/nfs4state.c
>>> @@ -1071,7 +1071,32 @@ void nfs4_schedule_migration_recovery(struct nfs_server *server)
>>> dprintk("<-- %s\n", __func__);
>>> }
>>>
>>> -static int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_state *state)
>>> +/**
>>> + * nfs4_schedule_lease_moved_recovery - start lease moved recovery
>>> + *
>>> + * @clp: nfs_client of server that may have migrated file systems
>>> + *
>>> + */
>>> +void nfs4_schedule_lease_moved_recovery(struct nfs_client *clp)
>>> +{
>>> + dprintk("--> %s: \"%s\" (client ID %llx)\n",
>>> + __func__, clp->cl_hostname, clp->cl_clientid);
>>> +
>>> + if (test_and_set_bit(NFS4CLNT_LEASE_MOVED, &clp->cl_state) == 0)
>>> + nfs4_schedule_state_manager(clp);
>>> +
>>> + dprintk("<-- %s\n", __func__);
>>> +}
>>> +
>>> +/**
>>> + * nfs4_state_mark_reclaim_reboot - Mark nfs_client for reboot recovery
>>> + * @clp: nfs_client of server that may have rebooted
>>> + * @state: state flags to test
>>> + *
>>> + * Returns 1 if reboot recovery is needed.
>>> + */
>>> +int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp,
>>> + struct nfs4_state *state)
>>> {
>>>
>>> set_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
>>> @@ -1384,7 +1409,6 @@ static int nfs4_recovery_handle_error(struct nfs_client *clp, int error)
>>> nfs4_state_end_reclaim_reboot(clp);
>>> return 0;
>>> case -NFS4ERR_STALE_CLIENTID:
>>> - case -NFS4ERR_LEASE_MOVED:
>>> set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
>>> nfs4_state_clear_reclaim_reboot(clp);
>>> nfs4_state_start_reclaim_reboot(clp);
>>> @@ -1597,6 +1621,37 @@ out_err:
>>> kfree(locations);
>>> }
>>>
>>> +static void nfs4_handle_lease_moved(struct nfs_client *clp)
>>> +{
>>> + struct nfs_server *server;
>>> +
>>> + dprintk("--> %s: \"%s\" (client ID %llx)\n",
>>> + __func__, clp->cl_hostname, clp->cl_clientid);
>>> +
>>> + /*
>>> + * rcu_read_lock() must be dropped before trying each individual
>>> + * migration. cl_mig_counter is used to skip servers that have
>>> + * already been visited for this lease_moved event when the list
>>> + * walk is restarted.
>>> + */
>>> + clp->cl_mig_counter++;
>>> +
>>> +restart:
>>> + rcu_read_lock();
>>> + list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link)
>>> + if (server->mig_counter != clp->cl_mig_counter) {
>>> + server->mig_counter = clp->cl_mig_counter;
>>> + rcu_read_unlock();
>>> + nfs4_try_migration(server);
>>> + /* Ask the server if there's more work to do */
>>> + if (nfs4_check_lease(clp) == NFS4ERR_LEASE_MOVED)
>>> + goto restart;
>>> + break;
>>> + }
>>> + rcu_read_unlock();
>>> + dprintk("<-- %s\n", __func__);
>>> +}
>>> +
>>> #ifdef CONFIG_NFS_V4_1
>>> void nfs4_schedule_session_recovery(struct nfs4_session *session)
>>> {
>>> @@ -1814,6 +1869,10 @@ static void nfs4_state_manager(struct nfs_client *clp)
>>> nfs4_try_migration(clp->cl_moved_server);
>>> continue;
>>> }
>>> + if (test_and_clear_bit(NFS4CLNT_LEASE_MOVED, &clp->cl_state)) {
>>> + nfs4_handle_lease_moved(clp);
>>> + continue;
>>> + }
>>>
>>> /* First recover reboot state... */
>>> if (test_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state)) {
>>> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
>>> index 091abf0..58050db 100644
>>> --- a/include/linux/nfs_fs_sb.h
>>> +++ b/include/linux/nfs_fs_sb.h
>>> @@ -60,6 +60,7 @@ struct nfs_client {
>>>
>>> /* accessed only when NFS4CLNT_MOVED bit is set */
>>> struct nfs_server * cl_moved_server;
>>> + unsigned long cl_mig_counter;
>>>
>>> /* used for the setclientid verifier */
>>> struct timespec cl_boot_time;
>>> @@ -156,6 +157,7 @@ struct nfs_server {
>>> struct list_head delegations;
>>> void (*destroy)(struct nfs_server *);
>>> struct nfs_fh *rootfh;
>>> + unsigned long mig_counter;
>>>
>>> atomic_t active; /* Keep trace of any activity to this server */
>>>
>>>
>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2011-05-12 19:30:40

by Myklebust, Trond

Subject: Re: [PATCH 11/16] NFS: Add an API for cloning an nfs_client

On Thu, 2011-05-12 at 13:30 -0400, Chuck Lever wrote:
> On May 9, 2011, at 3:38 PM, Chuck Lever wrote:
>
> > After a migration event, we have to preserve the client ID the client
> > used with the source server, and introduce it to the destination
> > server, in case the migration transparently migrated state for the
> > migrating FSID.
> >
> > Note that our RENEW and SETCLIENTID procs both take an nfs_client as
> > an argument. Thus, after a successful migration recovery, we want to
> > have a nfs_client with the correct long-form and short-form client ID
> > for the destination server to pass these procs.
> >
> > To preserve the client IDs, we clone the source server's nfs_client.
> > The migrated FSID is moved from the original nfs_client to the cloned
> > one.
> >
> > This patch introduces an API for cloning an nfs_client and moving an
> > FSID to it.
> >
> > Signed-off-by: Chuck Lever <[email protected]>
> > ---
> >
> > fs/nfs/client.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> > fs/nfs/internal.h | 4 +++
> > 2 files changed, 71 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> > index 536b0ba..2f5e29f 100644
> > --- a/fs/nfs/client.c
> > +++ b/fs/nfs/client.c
> > @@ -135,6 +135,7 @@ struct nfs_client_initdata {
> > const struct nfs_rpc_ops *rpc_ops;
> > int proto;
> > u32 minorversion;
> > + const char *long_clientid;
> > };
> >
> > /*
> > @@ -184,6 +185,9 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
> > clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
> > clp->cl_minorversion = cl_init->minorversion;
> > clp->cl_mvops = nfs_v4_minor_ops[cl_init->minorversion];
> > + if (cl_init->long_clientid != NULL)
> > + clp->cl_cached_clientid = kstrdup(cl_init->long_clientid,
> > + GFP_KERNEL);
> > #endif
> > cred = rpc_lookup_machine_cred();
> > if (!IS_ERR(cred))
> > @@ -476,6 +480,10 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
> > /* Match the full socket address */
> > if (!nfs_sockaddr_cmp(sap, clap))
> > continue;
> > + /* Match on long-form client ID */
> > + if (data->long_clientid && clp->cl_cached_clientid &&
> > + strcmp(data->long_clientid, clp->cl_cached_clientid))
> > + continue;
> >
> > atomic_inc(&clp->cl_count);
> > return clp;
> > @@ -1426,8 +1434,65 @@ error:
> > return error;
> > }
> >
> > -/*
> > - * Set up a pNFS Data Server client.
> > +/**
> > + * nfs4_clone_client - Clone a client after a migration event
> > + * clp: nfs_client to clone
> > + * sap: address of destination server
> > + * salen: size of "sap" in bytes
> > + * ip_addr: NUL-terminated string containing local presentation address
> > + * server: nfs_server to move from "clp" to the new one
> > + *
> > + * Returns negative errno or zero. nfs_client field of "server" is
> > + * updated to refer to a new or existing nfs_client that matches
> > + * [server address, port, version, minorversion, client ID]. The
> > + * nfs_server is moved from the old nfs_client's cl_superblocks list
> > + * to the new nfs_client's list.
> > + */
> > +int nfs4_clone_client(struct nfs_client *clp, const struct sockaddr *sap,
> > + size_t salen, const char *ip_addr,
> > + struct nfs_server *server)
> > +{
> > + struct rpc_clnt *rpcclnt = clp->cl_rpcclient;
> > + struct nfs_client_initdata cl_init = {
> > + .addr = sap,
> > + .addrlen = salen,
> > + .rpc_ops = &nfs_v4_clientops,
> > + .proto = rpc_protocol(rpcclnt),
> > + .minorversion = clp->cl_minorversion,
> > + .long_clientid = clp->cl_cached_clientid,
> > + };
> > + struct nfs_client *new;
> > + int status = 0;
> > +
> > + dprintk("--> %s cloning \"%s\" (client ID %llx)\n",
> > + __func__, clp->cl_hostname, clp->cl_clientid);
> > +
> > + new = nfs_get_client(&cl_init, rpcclnt->cl_timeout, ip_addr,
> > + rpcclnt->cl_auth->au_flavor, 0);
> > + if (IS_ERR(new)) {
> > + dprintk("<-- %s nfs_get_client failed\n", __func__);
> > + status = PTR_ERR(new);
> > + goto out;
> > + }
> > +
> > + nfs_server_remove_lists(server);
> > + server->nfs_client = new;
> > + nfs_server_insert_lists(server);
> > +
> > + dprintk("<-- %s moved (%llx:%llx) to nfs_client %p\n", __func__,
> > + (unsigned long long)server->fsid.major,
> > + (unsigned long long)server->fsid.minor, new);
>
> We may be in trouble here.
>
> Solaris servers use the cb_ident field to recognize a callback update rather than a full SETCLIENTID. This is because a migrate-reboot-migrate sequence can leave a destination server with a group of short form client IDs associated with the same long-form client ID.

What part of the spec justifies that assumption?

I can't see any mention of this kind of use of callback_ident in section
14.2.33 (or anywhere else). On the contrary, that section states
explicitly that the client is free at any time to modify both the
callback and callback_ident by means of a SETCLIENTID call that
preserves the client.id and client.verifier fields.

Worse: section 14.2.34 (SETCLIENTID_CONFIRM) says that if you confirm a
given short clientid, then _all_ state associated with another short
clientid value for that same long clientid is wiped.

IOW: I'm having trouble seeing how the 'multiple short clientid' model
can work within the framework of the current spec. As far as I can see,
it would require significant spec changes.

> Cloning an nfs_client creates a new nfs_client in many cases, which bumps cb_ident. On Linux, a callback with the original cb_ident would get us the old nfs_client anyway (via idr_find()).

Right. This is intentional.

> They are proposing that we use the callback RPC program number instead to find the right state information.

I'm very sceptical of that. For one thing, it is hard to implement
within the framework of the Linux server model: we work much better with
the single callback RPC program number and multiple callback_idents.
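
To make the "one callback program, many callback_idents" model concrete, here is a toy user-space sketch of an ident table. It is an assumption-laden illustration, not the Linux code: the kernel lookup mentioned in this thread is idr_find(), while the fixed array and the demo_* names below are hypothetical.

```c
#include <stddef.h>

#define DEMO_MAX_IDENTS 8

/* Hypothetical stand-in for nfs_client. */
struct demo_client {
	const char *hostname;
	unsigned int cb_ident;	/* value the client would send as callback_ident */
};

/* One table per client machine; every entry shares the same callback program. */
static struct demo_client *demo_ident_table[DEMO_MAX_IDENTS];

/* Register a client and assign it a free ident. Returns 0, or -1 if full. */
static int demo_ident_alloc(struct demo_client *clp)
{
	unsigned int id;

	for (id = 0; id < DEMO_MAX_IDENTS; id++) {
		if (demo_ident_table[id] == NULL) {
			demo_ident_table[id] = clp;
			clp->cb_ident = id;
			return 0;
		}
	}
	return -1;
}

/* Map the ident carried by an incoming callback back to its client. */
static struct demo_client *demo_ident_find(unsigned int cb_ident)
{
	if (cb_ident >= DEMO_MAX_IDENTS)
		return NULL;
	return demo_ident_table[cb_ident];
}
```

In this model a cloned client gets its own ident, and a callback that still carries the original ident resolves to the original client, which is the behaviour described above.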

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-05-09 19:38:16

by Chuck Lever

Subject: [PATCH 12/16] NFS: Add functions to swap transports during migration recovery

Introduce functions that can walk through an array of returned
fs_locations information and connect a transport to one of the
destination servers listed therein.

Note that NFS minor version 1 introduces "fs_locations_info" which
extends the locations array sorting criteria available to clients.
This is not supported yet.

Signed-off-by: Chuck Lever <[email protected]>
---

fs/nfs/internal.h | 2
fs/nfs/nfs4namespace.c | 202 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 204 insertions(+), 0 deletions(-)
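
The selection policy described above (try every server of every location, in the order the server returned them, and stop at the first one that works) can be sketched in user-space C as follows. This is a simplified model under stated assumptions: the demo_* types are hypothetical, the transport switch is replaced by a caller-supplied callback, and the toy callback accepts only one made-up host name.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for nfs4_fs_location(s). */
struct demo_location {
	const char **servers;
	unsigned int nservers;
};

struct demo_locations {
	const struct demo_location *locations;
	int nlocations;
};

/* Callback that attempts to connect; returns 0 or a negative errno. */
typedef int (*demo_connect_fn)(const char *hostname);

/* Toy callback for experiments: only "b.example" accepts connections. */
static int demo_connect_only_b(const char *hostname)
{
	return strcmp(hostname, "b.example") == 0 ? 0 : -ECONNREFUSED;
}

/*
 * Walk the locations array in order and stop at the first server that
 * connects. Returns 0 and sets *chosen on success, -ENOENT for an empty
 * array, or the error from the last connection attempt.
 */
static int demo_replace_transport(const struct demo_locations *locs,
				  demo_connect_fn try_connect,
				  const char **chosen)
{
	int error = -ENOENT;
	unsigned int s;
	int loc;

	if (locs == NULL || locs->nlocations <= 0)
		return error;

	for (loc = 0; loc < locs->nlocations; loc++) {
		const struct demo_location *l = &locs->locations[loc];

		for (s = 0; s < l->nservers; s++) {
			error = try_connect(l->servers[s]);
			if (error == 0) {
				*chosen = l->servers[s];
				return 0;
			}
		}
	}
	return error;
}
```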

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 0bf4e67..191c5b4 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -177,6 +177,8 @@ static inline void nfs_fs_proc_exit(void)
/* nfs4namespace.c */
#ifdef CONFIG_NFS_V4
extern struct vfsmount *nfs_do_refmount(struct dentry *dentry);
+extern int nfs4_replace_transport(struct nfs_server *server,
+ const struct nfs4_fs_locations *locations);
#else
static inline
struct vfsmount *nfs_do_refmount(struct dentry *dentry)
diff --git a/fs/nfs/nfs4namespace.c b/fs/nfs/nfs4namespace.c
index bb80c49..2fa024c 100644
--- a/fs/nfs/nfs4namespace.c
+++ b/fs/nfs/nfs4namespace.c
@@ -22,6 +22,8 @@

#define NFSDBG_FACILITY NFSDBG_VFS

+#undef USE_RPC_LOCK_CLIENT
+
/*
* Convert the NFSv4 pathname components into a standard posix path.
*
@@ -263,3 +265,203 @@ out:
dprintk("%s: done\n", __func__);
return mnt;
}
+
+#undef NFSDBG_FACILITY
+#define NFSDBG_FACILITY NFSDBG_CLIENT
+
+/*
+ * Returns zero on success, or a negative errno value.
+ */
+static int nfs4_update_server(struct nfs_server *server, const char *hostname,
+ struct sockaddr *sap, size_t salen)
+{
+ struct nfs_client *clp = server->nfs_client;
+ struct rpc_clnt *clnt = server->client;
+ struct xprt_create xargs = {
+ .ident = clp->cl_proto,
+ .net = &init_net,
+ .dstaddr = sap,
+ .addrlen = salen,
+ .servername = hostname,
+ };
+ char buf[INET6_ADDRSTRLEN + 1];
+ struct sockaddr_storage address;
+ struct sockaddr *localaddr = (struct sockaddr *)&address;
+ int error;
+
+ dprintk("--> %s: move FSID %llx:%llx to \"%s\"\n", __func__,
+ (unsigned long long)server->fsid.major,
+ (unsigned long long)server->fsid.minor,
+ hostname);
+
+ /*
+ * rpc_lock_client() deadlocks here. This is because the tasks
+ * that received NFS4ERR_MOVED are waiting for us to wake them
+ * when we are done recovering. But they have bumped
+ * cl_active_tasks for this clnt, so rpc_lock_client() can't make
+ * any progress.
+ */
+#ifdef USE_RPC_LOCK_CLIENT
+ error = rpc_lock_client(clnt, clnt->cl_timeout->to_maxval);
+ if (error != 0) {
+ dprintk("<-- %s(): rpc_lock_client returned %d\n",
+ __func__, error);
+ goto out;
+ }
+#endif /* USE_RPC_LOCK_CLIENT */
+
+ error = rpc_switch_client_transport(clnt, &xargs, clnt->cl_timeout);
+ if (error != 0) {
+ dprintk("<-- %s(): rpc_switch_client_transport returned %d\n",
+ __func__, error);
+ goto out;
+ }
+
+ /*
+ * If we were able to contact the server at @sap, set up a new
+ * nfs_client and move @server to it.
+ */
+ error = rpc_localaddr(clnt, localaddr, sizeof(address));
+ if (error != 0) {
+ dprintk("<-- %s(): rpc_localaddr returned %d\n",
+ __func__, error);
+ goto out;
+ }
+ error = -EAFNOSUPPORT;
+ if (rpc_ntop(localaddr, buf, sizeof(buf)) == 0) {
+ dprintk("<-- %s(): rpc_ntop returned %d\n",
+ __func__, error);
+ goto out;
+ }
+ error = nfs4_clone_client(clp, sap, salen, buf, server);
+ if (error != 0) {
+ dprintk("<-- %s(): nfs4_clone_client returned %d\n",
+ __func__, error);
+ goto out;
+ }
+ if (server->nfs_client->cl_hostname == NULL)
+ server->nfs_client->cl_hostname = kstrdup(hostname, GFP_KERNEL);
+
+ dprintk("<-- %s() succeeded\n", __func__);
+
+out:
+#ifdef USE_RPC_LOCK_CLIENT
+ rpc_unlock_client(clnt);
+#endif /* USE_RPC_LOCK_CLIENT */
+ return error;
+}
+
+/*
+ * Try one location from the fs_locations array.
+ *
+ * Returns zero on success, or a negative errno value.
+ */
+static int nfs4_try_replacing_one_location(struct nfs_server *server,
+ char *page, char *page2,
+ const struct nfs4_fs_location *location)
+{
+ const size_t addr_bufsize = sizeof(struct sockaddr_storage);
+ struct sockaddr *sap;
+ unsigned int s;
+ size_t salen;
+ int error;
+
+ dprintk("--> %s(%llx:%llx)\n", __func__,
+ (unsigned long long)server->fsid.major,
+ (unsigned long long)server->fsid.minor);
+
+ error = -ENOMEM;
+ sap = kmalloc(addr_bufsize, GFP_KERNEL);
+ if (sap == NULL)
+ goto out;
+
+ error = -ENOENT;
+ for (s = 0; s < location->nservers; s++) {
+ const struct nfs4_string *buf = &location->servers[s];
+ char *hostname;
+
+ if (buf->len <= 0 || buf->len > PAGE_SIZE)
+ continue;
+
+ /* XXX: IPv6 not supported? */
+ if (memchr(buf->data, IPV6_SCOPE_DELIMITER, buf->len) != NULL)
+ continue;
+
+ salen = nfs_parse_server_name(buf->data, buf->len,
+ sap, addr_bufsize);
+ if (salen == 0)
+ continue;
+ rpc_set_port(sap, NFS_PORT);
+
+ error = -ENOMEM;
+ hostname = kstrndup(buf->data, buf->len, GFP_KERNEL);
+ if (hostname == NULL)
+ break;
+
+ error = nfs4_update_server(server, hostname, sap, salen);
+ kfree(hostname);
+ if (error == 0)
+ break;
+ }
+
+ kfree(sap);
+out:
+ dprintk("<-- %s() = %d\n", __func__, error);
+ return error;
+}
+
+/**
+ * nfs4_replace_transport - set up transport to destination server
+ *
+ * @server: export being migrated
+ * @locations: fs_locations array
+ *
+ * Returns zero on success, or a negative errno value.
+ *
+ * The client tries all the entries in the "locations" array, in the
+ * order returned by the server, until one works or the end of the
+ * array is reached.
+ */
+int nfs4_replace_transport(struct nfs_server *server,
+ const struct nfs4_fs_locations *locations)
+{
+ char *page = NULL, *page2 = NULL;
+ int loc, error;
+
+ dprintk("--> %s(%llx:%llx)\n", __func__,
+ (unsigned long long)server->fsid.major,
+ (unsigned long long)server->fsid.minor);
+
+ error = -ENOENT;
+ if (locations == NULL || locations->nlocations <= 0)
+ goto out;
+
+ error = -ENOMEM;
+ page = (char *) __get_free_page(GFP_USER);
+ if (!page)
+ goto out;
+ page2 = (char *) __get_free_page(GFP_USER);
+ if (!page2)
+ goto out;
+
+ for (loc = 0; loc < locations->nlocations; loc++) {
+ const struct nfs4_fs_location *location =
+ &locations->locations[loc];
+
+ if (location == NULL || location->nservers <= 0 ||
+ location->rootpath.ncomponents == 0)
+ continue;
+
+ error = nfs4_try_replacing_one_location(server, page,
+ page2, location);
+ if (error == 0)
+ break;
+ }
+
+out:
+ free_page((unsigned long)page);
+ free_page((unsigned long)page2);
+
+ dprintk("<-- %s() = %d\n", __func__, error);
+ return error;
+}


2011-05-09 19:37:36

by Chuck Lever

Subject: [PATCH 08/16] NFS: Introduce NFS_ATTR_FATTR_V4_LOCATIONS

Our NFS client must distinguish between referral events (which it
currently supports) and migration events (which it does not yet
support).

In both types of events, an fs_locations array is returned. But upper
layers should make the distinction between a referral and a migration,
not the XDR layer. There really isn't a generic way for an XDR
decoder function to tell one from the other.

Slightly adjust the FATTR flags returned by decode_fs_locations()
to set NFS_ATTR_FATTR_V4_LOCATIONS only if a non-empty locations
array was returned from the server. Then have logic in nfs4proc.c
distinguish whether the locations array is for a referral or
something else.

Signed-off-by: Chuck Lever <[email protected]>
---

fs/nfs/nfs4proc.c | 5 +++--
fs/nfs/nfs4xdr.c | 2 +-
include/linux/nfs_xdr.h | 7 ++++---
3 files changed, 8 insertions(+), 6 deletions(-)
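
The split of responsibilities described above can be shown with two tiny functions. This is a sketch, not the patch: the DEMO_* flag names are hypothetical (the bit positions merely mirror the ones this patch assigns), and the functions stand in for the XDR decoder and the referral path in the upper layer.

```c
/* Hypothetical flag names; bit positions mirror the ones in this patch. */
#define DEMO_ATTR_V4_LOCATIONS	(1U << 19)
#define DEMO_ATTR_V4_REFERRAL	(1U << 20)

/* Decoder side: report only that a non-empty locations array arrived. */
static unsigned int demo_decode_locations(unsigned int nlocations)
{
	return nlocations != 0 ? DEMO_ATTR_V4_LOCATIONS : 0;
}

/* Referral path in the upper layer: promote LOCATIONS to REFERRAL. */
static unsigned int demo_mark_referral(unsigned int valid)
{
	if (valid & DEMO_ATTR_V4_LOCATIONS)
		valid |= DEMO_ATTR_V4_REFERRAL;
	return valid;
}
```

A migration path would see the same LOCATIONS bit from the decoder but would never call the referral helper, so the two events stay distinguishable above the XDR layer.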

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 5a87686..641691c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2291,7 +2291,8 @@ static int nfs4_get_referral(struct inode *dir, const struct qstr *name, struct
}

memcpy(fattr, &locations->fattr, sizeof(struct nfs_fattr));
- fattr->valid |= NFS_ATTR_FATTR_V4_REFERRAL;
+ if (fattr->valid & NFS_ATTR_FATTR_V4_LOCATIONS)
+ fattr->valid |= NFS_ATTR_FATTR_V4_REFERRAL;
if (!fattr->mode)
fattr->mode = S_IFDIR;
memset(fhandle, 0, sizeof(struct nfs_fh));
@@ -4656,7 +4657,7 @@ static void nfs_fixup_referral_attributes(struct nfs_fattr *fattr)
{
if (!((fattr->valid & NFS_ATTR_FATTR_FILEID) &&
(fattr->valid & NFS_ATTR_FATTR_FSID) &&
- (fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL)))
+ (fattr->valid & NFS_ATTR_FATTR_V4_LOCATIONS)))
return;

fattr->valid |= NFS_ATTR_FATTR_TYPE | NFS_ATTR_FATTR_MODE |
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index c3ccd2c..be70be9 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -3326,7 +3326,7 @@ static int decode_attr_fs_locations(struct xdr_stream *xdr, uint32_t *bitmap, st
res->nlocations++;
}
if (res->nlocations != 0)
- status = NFS_ATTR_FATTR_V4_REFERRAL;
+ status = NFS_ATTR_FATTR_V4_LOCATIONS;
out:
dprintk("%s: fs_locations done, error = %d\n", __func__, status);
return status;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 890dce2..26165a5 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -82,9 +82,10 @@ struct nfs_fattr {
#define NFS_ATTR_FATTR_PRECTIME (1U << 16)
#define NFS_ATTR_FATTR_CHANGE (1U << 17)
#define NFS_ATTR_FATTR_PRECHANGE (1U << 18)
-#define NFS_ATTR_FATTR_V4_REFERRAL (1U << 19) /* NFSv4 referral */
-#define NFS_ATTR_FATTR_MOUNTPOINT (1U << 20) /* Treat as mountpoint */
-#define NFS_ATTR_FATTR_MOUNTED_ON_FILEID (1U << 21)
+#define NFS_ATTR_FATTR_V4_LOCATIONS (1U << 19)
+#define NFS_ATTR_FATTR_V4_REFERRAL (1U << 20) /* NFSv4 referral */
+#define NFS_ATTR_FATTR_MOUNTPOINT (1U << 21) /* Treat as mountpoint */
+#define NFS_ATTR_FATTR_MOUNTED_ON_FILEID (1U << 22)

#define NFS_ATTR_FATTR (NFS_ATTR_FATTR_TYPE \
| NFS_ATTR_FATTR_MODE \


2011-05-11 00:20:12

by Tom Haynes

Subject: Re: [PATCH 16/16] NFS: Implement support for NFS4ERR_LEASE_MOVED

On 5/9/11 5:48 PM, Chuck Lever wrote:
> On May 9, 2011, at 3:38 PM, Chuck Lever wrote:
>
>> To recover from NFS4ERR_LEASE_MOVED, walk the cl_superblocks list and
>> invoke nfs4_try_migration() on each server's root file handle.
>> nfs4_try_migration() should automatically determine whether that file
>> system has migrated, and then perform recovery for it.
>>
>> The per-filesystem migration probe also informs minor version zero
>> servers that this client should no longer receive NFS4ERR_LEASE_MOVED.
> I see one thing that may be missing here.
>
> RFC 3530, section 8.14.3, lists OPEN, CLOSE, READ, WRITE, RENEW, LOCK, LOCKU, and LOCKT as the only procedures that return NFS4ERR_LEASE_MOVED. A code audit suggests that handling NFS4ERR_LEASE_MOVED in the two generic error handlers in nfs4proc.c is sufficient for all of these but RENEW and OPEN.

3530bis, section 13.4:

| NFS4ERR_LEASE_MOVED | CLOSE, DELEGPURGE, DELEGRETURN, LOCK, |
| | LOCKT, LOCKU, OPEN_CONFIRM, |
| | OPEN_DOWNGRADE, READ, |
| | RELEASE_LOCKOWNER, RENEW, SETATTR, |
| | WRITE |

And the text equivalent to that in 3530 is in section 9.14.3:

To accomplish this, all
operations which implicitly renew leases for a client (such as OPEN,
CLOSE, READ, WRITE, RENEW, LOCK, and others), will return the error
NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be
renewed has been transferred to a new server.




> RENEW is part of the lease_moved recovery logic, so I've left NFS4ERR_LEASE_MOVED handling there pretty sparse. The caller wants to deal with it.
>
> OPEN, as far as I can tell, won't deal with it at all, at least not directly. Should we look for it in nfs4_open_done() and invoke nfs4_schedule_lease_moved_recovery()?
>
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>>
>> fs/nfs/client.c | 1 +
>> fs/nfs/nfs4_fs.h | 2 +
>> fs/nfs/nfs4proc.c | 11 ++++++++
>> fs/nfs/nfs4state.c | 63 ++++++++++++++++++++++++++++++++++++++++++++-
>> include/linux/nfs_fs_sb.h | 2 +
>> 5 files changed, 77 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
>> index 2f5e29f..b89af4d 100644
>> --- a/fs/nfs/client.c
>> +++ b/fs/nfs/client.c
>> @@ -188,6 +188,7 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
>> if (cl_init->long_clientid != NULL)
>> clp->cl_cached_clientid = kstrdup(cl_init->long_clientid,
>> GFP_KERNEL);
>> + clp->cl_mig_counter = 1;
>> #endif
>> cred = rpc_lookup_machine_cred();
>> if (!IS_ERR(cred))
>> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
>> index c3e8641..2ad6c9b 100644
>> --- a/fs/nfs/nfs4_fs.h
>> +++ b/fs/nfs/nfs4_fs.h
>> @@ -51,6 +51,7 @@ enum nfs4_client_state {
>> NFS4CLNT_UPDATE_CALLBACK,
>> NFS4CLNT_CLONED_CLIENT,
>> NFS4CLNT_MOVED,
>> + NFS4CLNT_LEASE_MOVED,
>> };
>>
>> enum nfs4_session_state {
>> @@ -351,6 +352,7 @@ extern void nfs4_close_sync(struct path *, struct nfs4_state *, fmode_t);
>> extern void nfs4_state_set_mode_locked(struct nfs4_state *, fmode_t);
>> extern void nfs4_schedule_lease_recovery(struct nfs_client *);
>> extern void nfs4_schedule_migration_recovery(struct nfs_server *);
>> +extern void nfs4_schedule_lease_moved_recovery(struct nfs_client *);
>> extern void nfs4_schedule_state_manager(struct nfs_client *);
>> extern void nfs4_schedule_stateid_recovery(const struct nfs_server *, struct nfs4_state *);
>> extern void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags);
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index a545d46..f4e07ba 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -297,6 +297,9 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode,
>> case -NFS4ERR_MOVED:
>> nfs4_schedule_migration_recovery(server);
>> goto wait_on_recovery;
>> + case -NFS4ERR_LEASE_MOVED:
>> + nfs4_schedule_lease_moved_recovery(clp);
>> + goto wait_on_recovery;
>> case -NFS4ERR_FILE_OPEN:
>> if (exception->timeout > HZ) {
>> /* We have retried a decent amount, time to
>> @@ -3721,6 +3724,14 @@ static int nfs4_async_handle_error(struct rpc_task *task,
>> rpc_wake_up_queued_task(&clp->cl_rpcwaitq,
>> task);
>> goto restart_call;
>> + case -NFS4ERR_LEASE_MOVED:
>> + rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
>> + nfs4_schedule_lease_moved_recovery(clp);
>> + if (test_bit(NFS4CLNT_MANAGER_RUNNING,
>> + &clp->cl_state) == 0)
>> + rpc_wake_up_queued_task(&clp->cl_rpcwaitq,
>> + task);
>> + goto restart_call;
>> case -NFS4ERR_DELAY:
>> nfs_inc_server_stats(server, NFSIOS_DELAY);
>> case -NFS4ERR_GRACE:
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index c7b414a..8bef9d8 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -1071,7 +1071,32 @@ void nfs4_schedule_migration_recovery(struct nfs_server *server)
>> dprintk("<-- %s\n", __func__);
>> }
>>
>> -static int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_state *state)
>> +/**
>> + * nfs4_schedule_lease_moved_recovery - start lease moved recovery
>> + *
>> + * @clp: nfs_client of server that may have migrated file systems
>> + *
>> + */
>> +void nfs4_schedule_lease_moved_recovery(struct nfs_client *clp)
>> +{
>> + dprintk("--> %s: \"%s\" (client ID %llx)\n",
>> + __func__, clp->cl_hostname, clp->cl_clientid);
>> +
>> + if (test_and_set_bit(NFS4CLNT_LEASE_MOVED, &clp->cl_state) == 0)
>> + nfs4_schedule_state_manager(clp);
>> +
>> + dprintk("<-- %s\n", __func__);
>> +}
>> +
>> +/**
>> + * nfs4_state_mark_reclaim_reboot - Mark nfs_client for reboot recovery
>> + * @clp: nfs_client of server that may have rebooted
>> + * @state: state flags to test
>> + *
>> + * Returns 1 if reboot recovery is needed.
>> + */
>> +int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp,
>> + struct nfs4_state *state)
>> {
>>
>> set_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
>> @@ -1384,7 +1409,6 @@ static int nfs4_recovery_handle_error(struct nfs_client *clp, int error)
>> nfs4_state_end_reclaim_reboot(clp);
>> return 0;
>> case -NFS4ERR_STALE_CLIENTID:
>> - case -NFS4ERR_LEASE_MOVED:
>> set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
>> nfs4_state_clear_reclaim_reboot(clp);
>> nfs4_state_start_reclaim_reboot(clp);
>> @@ -1597,6 +1621,37 @@ out_err:
>> kfree(locations);
>> }
>>
>> +static void nfs4_handle_lease_moved(struct nfs_client *clp)
>> +{
>> + struct nfs_server *server;
>> +
>> + dprintk("--> %s: \"%s\" (client ID %llx)\n",
>> + __func__, clp->cl_hostname, clp->cl_clientid);
>> +
>> + /*
>> + * rcu_read_lock() must be dropped before trying each individual
>> + * migration. cl_mig_counter is used to skip servers that have
>> + * already been visited for this lease_moved event when the list
>> + * walk is restarted.
>> + */
>> + clp->cl_mig_counter++;
>> +
>> +restart:
>> + rcu_read_lock();
>> + list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link)
>> + if (server->mig_counter != clp->cl_mig_counter) {
>> + server->mig_counter = clp->cl_mig_counter;
>> + rcu_read_unlock();
>> + nfs4_try_migration(server);
>> + /* Ask the server if there's more work to do */
>> + if (nfs4_check_lease(clp) == NFS4ERR_LEASE_MOVED)
>> + goto restart;
>> + break;
>> + }
>> + rcu_read_unlock();
>> + dprintk("<-- %s\n", __func__);
>> +}
>> +
>> #ifdef CONFIG_NFS_V4_1
>> void nfs4_schedule_session_recovery(struct nfs4_session *session)
>> {
>> @@ -1814,6 +1869,10 @@ static void nfs4_state_manager(struct nfs_client *clp)
>> nfs4_try_migration(clp->cl_moved_server);
>> continue;
>> }
>> + if (test_and_clear_bit(NFS4CLNT_LEASE_MOVED, &clp->cl_state)) {
>> + nfs4_handle_lease_moved(clp);
>> + continue;
>> + }
>>
>> /* First recover reboot state... */
>> if (test_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state)) {
>> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
>> index 091abf0..58050db 100644
>> --- a/include/linux/nfs_fs_sb.h
>> +++ b/include/linux/nfs_fs_sb.h
>> @@ -60,6 +60,7 @@ struct nfs_client {
>>
>> /* accessed only when NFS4CLNT_MOVED bit is set */
>> struct nfs_server * cl_moved_server;
>> + unsigned long cl_mig_counter;
>>
>> /* used for the setclientid verifier */
>> struct timespec cl_boot_time;
>> @@ -156,6 +157,7 @@ struct nfs_server {
>> struct list_head delegations;
>> void (*destroy)(struct nfs_server *);
>> struct nfs_fh *rootfh;
>> + unsigned long mig_counter;
>>
>> atomic_t active; /* Keep trace of any activity to this server */
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html


2011-05-09 19:38:06

by Chuck Lever

Subject: [PATCH 11/16] NFS: Add an API for cloning an nfs_client

After a migration event, we have to preserve the client ID the client
used with the source server, and introduce it to the destination
server, in case the migration transparently migrated state for the
migrating FSID.

Note that our RENEW and SETCLIENTID procs both take an nfs_client as
an argument. Thus, after a successful migration recovery, we want to
have an nfs_client with the correct long-form and short-form client ID
for the destination server to pass to these procs.

To preserve the client IDs, we clone the source server's nfs_client.
The migrated FSID is moved from the original nfs_client to the cloned
one.

This patch introduces an API for cloning an nfs_client and moving an
FSID to it.

Signed-off-by: Chuck Lever <[email protected]>
---

fs/nfs/client.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++--
fs/nfs/internal.h | 4 +++
2 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 536b0ba..2f5e29f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -135,6 +135,7 @@ struct nfs_client_initdata {
const struct nfs_rpc_ops *rpc_ops;
int proto;
u32 minorversion;
+ const char *long_clientid;
};

/*
@@ -184,6 +185,9 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
clp->cl_minorversion = cl_init->minorversion;
clp->cl_mvops = nfs_v4_minor_ops[cl_init->minorversion];
+ if (cl_init->long_clientid != NULL)
+ clp->cl_cached_clientid = kstrdup(cl_init->long_clientid,
+ GFP_KERNEL);
#endif
cred = rpc_lookup_machine_cred();
if (!IS_ERR(cred))
@@ -476,6 +480,10 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
/* Match the full socket address */
if (!nfs_sockaddr_cmp(sap, clap))
continue;
+ /* Match on long-form client ID */
+ if (data->long_clientid && clp->cl_cached_clientid &&
+ strcmp(data->long_clientid, clp->cl_cached_clientid))
+ continue;

atomic_inc(&clp->cl_count);
return clp;
@@ -1426,8 +1434,65 @@ error:
return error;
}

-/*
- * Set up a pNFS Data Server client.
+/**
+ * nfs4_clone_client - Clone a client after a migration event
+ * @clp: nfs_client to clone
+ * @sap: address of destination server
+ * @salen: size of "sap" in bytes
+ * @ip_addr: NUL-terminated string containing local presentation address
+ * @server: nfs_server to move from "clp" to the new one
+ *
+ * Returns negative errno or zero. nfs_client field of "server" is
+ * updated to refer to a new or existing nfs_client that matches
+ * [server address, port, version, minorversion, client ID]. The
+ * nfs_server is moved from the old nfs_client's cl_superblocks list
+ * to the new nfs_client's list.
+ */
+int nfs4_clone_client(struct nfs_client *clp, const struct sockaddr *sap,
+ size_t salen, const char *ip_addr,
+ struct nfs_server *server)
+{
+ struct rpc_clnt *rpcclnt = clp->cl_rpcclient;
+ struct nfs_client_initdata cl_init = {
+ .addr = sap,
+ .addrlen = salen,
+ .rpc_ops = &nfs_v4_clientops,
+ .proto = rpc_protocol(rpcclnt),
+ .minorversion = clp->cl_minorversion,
+ .long_clientid = clp->cl_cached_clientid,
+ };
+ struct nfs_client *new;
+ int status = 0;
+
+ dprintk("--> %s cloning \"%s\" (client ID %llx)\n",
+ __func__, clp->cl_hostname, clp->cl_clientid);
+
+ new = nfs_get_client(&cl_init, rpcclnt->cl_timeout, ip_addr,
+ rpcclnt->cl_auth->au_flavor, 0);
+ if (IS_ERR(new)) {
+ dprintk("<-- %s nfs_get_client failed\n", __func__);
+ status = PTR_ERR(new);
+ goto out;
+ }
+
+ nfs_server_remove_lists(server);
+ server->nfs_client = new;
+ nfs_server_insert_lists(server);
+
+ dprintk("<-- %s moved (%llx:%llx) to nfs_client %p\n", __func__,
+ (unsigned long long)server->fsid.major,
+ (unsigned long long)server->fsid.minor, new);
+
+out:
+ return status;
+}
+
+/**
+ * nfs4_set_ds_client - Set up a pNFS Data Server client
+ * @mds_clp: nfs_client representing the MDS
+ * @ds_addr: IP address of DS
+ * @ds_addrlen: size of "ds_addr" in bytes
+ * @ds_proto: transport protocol to use to contact DS
*
* Return any existing nfs_client that matches server address,port,version
* and minorversion.
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f6baf5b..0bf4e67 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -154,6 +154,10 @@ extern struct nfs_server *nfs_clone_server(struct nfs_server *,
struct nfs_fattr *);
extern void nfs_mark_client_ready(struct nfs_client *clp, int state);
extern int nfs4_check_client_ready(struct nfs_client *clp);
+extern int nfs4_clone_client(struct nfs_client *clp,
+ const struct sockaddr *sap, size_t salen,
+ const char *ip_addr,
+ struct nfs_server *server);
extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
const struct sockaddr *ds_addr,
int ds_addrlen, int ds_proto);
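
The extra test this patch adds to nfs_match_client() can be read in isolation as the predicate below. This is only a sketch: the socket address is reduced to a string, and client_match_sketch and client_matches are hypothetical names. The behavior it is meant to mirror is that the long-form client ID rejects a candidate only when both the request and the cached client carry one and the two differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct nfs_client. */
struct client_match_sketch {
	const char *addr;		/* stand-in for the server address */
	const char *long_clientid;	/* cached long-form ID, may be NULL */
};

/*
 * A candidate matches when the addresses agree and, if both sides
 * have a long-form client ID, those IDs agree as well.  A missing
 * ID on either side does not prevent a match.
 */
static bool client_matches(const struct client_match_sketch *clp,
			   const char *addr, const char *long_clientid)
{
	if (strcmp(clp->addr, addr) != 0)
		return false;
	if (long_clientid != NULL && clp->long_clientid != NULL &&
	    strcmp(long_clientid, clp->long_clientid) != 0)
		return false;
	return true;
}
```

With this rule, an ordinary mount (no long-form ID in the request) still shares an existing nfs_client for the same address, while a clone request only shares one whose cached ID is identical or absent.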


2011-05-09 22:48:20

by Chuck Lever

Subject: Re: [PATCH 16/16] NFS: Implement support for NFS4ERR_LEASE_MOVED


On May 9, 2011, at 3:38 PM, Chuck Lever wrote:

> To recover from NFS4ERR_LEASE_MOVED, walk the cl_superblocks list and
> invoke nfs4_try_migration() on each server's root file handle.
> nfs4_try_migration() should automatically determine whether that file
> system has migrated, and then perform recovery for it.
>
> The per-filesystem migration probe also informs minor version zero
> servers that this client should no longer receive NFS4ERR_LEASE_MOVED.

I see one thing that may be missing here.

RFC 3530, section 8.14.3, lists OPEN, CLOSE, READ, WRITE, RENEW, LOCK, LOCKU, and LOCKT as the only procedures that return NFS4ERR_LEASE_MOVED. A code audit suggests that handling NFS4ERR_LEASE_MOVED in the two generic error handlers in nfs4proc.c is sufficient for all of these but RENEW and OPEN.

RENEW is part of the lease_moved recovery logic, so I've left NFS4ERR_LEASE_MOVED handling there pretty sparse. The caller wants to deal with it.

OPEN, as far as I can tell, won't deal with it at all, at least not directly. Should we look for it in nfs4_open_done() and invoke nfs4_schedule_lease_moved_recovery()?
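
Wherever the extra call ends up, the scheduling helper in the quoted patch is safe to invoke from several error paths because it wakes the state manager only when it is the one that sets the bit. A user-space sketch of that idiom follows, with C11 atomics standing in for the kernel bitops. All names here are hypothetical, and the wakeup counter is a plain int because the sketch is single-threaded.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { SKETCH_LEASE_MOVED = 3 };	/* hypothetical bit number */

struct client_state_sketch {
	atomic_ulong state;		/* stand-in for clp->cl_state */
	unsigned int manager_wakeups;	/* times the manager was scheduled */
};

/* Set bit nr and return whether it was already set. */
static bool sketch_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Clear bit nr and return whether it was set beforehand. */
static bool sketch_test_and_clear_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(word, ~mask) & mask) != 0;
}

/* Only the caller that flips the bit from 0 to 1 schedules the manager. */
static void sketch_schedule_lease_moved(struct client_state_sketch *clp)
{
	if (!sketch_test_and_set_bit(SKETCH_LEASE_MOVED, &clp->state))
		clp->manager_wakeups++;
}
```

Repeated errors that arrive before the manager clears the bit collapse into a single recovery pass.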

> Signed-off-by: Chuck Lever <[email protected]>
> ---
>
> fs/nfs/client.c | 1 +
> fs/nfs/nfs4_fs.h | 2 +
> fs/nfs/nfs4proc.c | 11 ++++++++
> fs/nfs/nfs4state.c | 63 ++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/nfs_fs_sb.h | 2 +
> 5 files changed, 77 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 2f5e29f..b89af4d 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -188,6 +188,7 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
> if (cl_init->long_clientid != NULL)
> clp->cl_cached_clientid = kstrdup(cl_init->long_clientid,
> GFP_KERNEL);
> + clp->cl_mig_counter = 1;
> #endif
> cred = rpc_lookup_machine_cred();
> if (!IS_ERR(cred))
> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
> index c3e8641..2ad6c9b 100644
> --- a/fs/nfs/nfs4_fs.h
> +++ b/fs/nfs/nfs4_fs.h
> @@ -51,6 +51,7 @@ enum nfs4_client_state {
> NFS4CLNT_UPDATE_CALLBACK,
> NFS4CLNT_CLONED_CLIENT,
> NFS4CLNT_MOVED,
> + NFS4CLNT_LEASE_MOVED,
> };
>
> enum nfs4_session_state {
> @@ -351,6 +352,7 @@ extern void nfs4_close_sync(struct path *, struct nfs4_state *, fmode_t);
> extern void nfs4_state_set_mode_locked(struct nfs4_state *, fmode_t);
> extern void nfs4_schedule_lease_recovery(struct nfs_client *);
> extern void nfs4_schedule_migration_recovery(struct nfs_server *);
> +extern void nfs4_schedule_lease_moved_recovery(struct nfs_client *);
> extern void nfs4_schedule_state_manager(struct nfs_client *);
> extern void nfs4_schedule_stateid_recovery(const struct nfs_server *, struct nfs4_state *);
> extern void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags);
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index a545d46..f4e07ba 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -297,6 +297,9 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode,
> case -NFS4ERR_MOVED:
> nfs4_schedule_migration_recovery(server);
> goto wait_on_recovery;
> + case -NFS4ERR_LEASE_MOVED:
> + nfs4_schedule_lease_moved_recovery(clp);
> + goto wait_on_recovery;
> case -NFS4ERR_FILE_OPEN:
> if (exception->timeout > HZ) {
> /* We have retried a decent amount, time to
> @@ -3721,6 +3724,14 @@ static int nfs4_async_handle_error(struct rpc_task *task,
> rpc_wake_up_queued_task(&clp->cl_rpcwaitq,
> task);
> goto restart_call;
> + case -NFS4ERR_LEASE_MOVED:
> + rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
> + nfs4_schedule_lease_moved_recovery(clp);
> + if (test_bit(NFS4CLNT_MANAGER_RUNNING,
> + &clp->cl_state) == 0)
> + rpc_wake_up_queued_task(&clp->cl_rpcwaitq,
> + task);
> + goto restart_call;
> case -NFS4ERR_DELAY:
> nfs_inc_server_stats(server, NFSIOS_DELAY);
> case -NFS4ERR_GRACE:
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index c7b414a..8bef9d8 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1071,7 +1071,32 @@ void nfs4_schedule_migration_recovery(struct nfs_server *server)
> dprintk("<-- %s\n", __func__);
> }
>
> -static int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_state *state)
> +/**
> + * nfs4_schedule_lease_moved_recovery - start lease moved recovery
> + *
> + * @clp: nfs_client of server that may have migrated file systems
> + *
> + */
> +void nfs4_schedule_lease_moved_recovery(struct nfs_client *clp)
> +{
> + dprintk("--> %s: \"%s\" (client ID %llx)\n",
> + __func__, clp->cl_hostname, clp->cl_clientid);
> +
> + if (test_and_set_bit(NFS4CLNT_LEASE_MOVED, &clp->cl_state) == 0)
> + nfs4_schedule_state_manager(clp);
> +
> + dprintk("<-- %s\n", __func__);
> +}
> +
> +/**
> + * nfs4_state_mark_reclaim_reboot - Mark nfs_client for reboot recovery
> + * @clp: nfs_client of server that may have rebooted
> + * @state: state flags to test
> + *
> + * Returns 1 if reboot recovery is needed.
> + */
> +int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp,
> + struct nfs4_state *state)
> {
>
> set_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
> @@ -1384,7 +1409,6 @@ static int nfs4_recovery_handle_error(struct nfs_client *clp, int error)
> nfs4_state_end_reclaim_reboot(clp);
> return 0;
> case -NFS4ERR_STALE_CLIENTID:
> - case -NFS4ERR_LEASE_MOVED:
> set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
> nfs4_state_clear_reclaim_reboot(clp);
> nfs4_state_start_reclaim_reboot(clp);
> @@ -1597,6 +1621,37 @@ out_err:
> kfree(locations);
> }
>
> +static void nfs4_handle_lease_moved(struct nfs_client *clp)
> +{
> + struct nfs_server *server;
> +
> + dprintk("--> %s: \"%s\" (client ID %llx)\n",
> + __func__, clp->cl_hostname, clp->cl_clientid);
> +
> + /*
> + * rcu_read_lock() must be dropped before trying each individual
> + * migration. cl_mig_counter is used to skip servers that have
> + * already been visited for this lease_moved event when the list
> + * walk is restarted.
> + */
> + clp->cl_mig_counter++;
> +
> +restart:
> + rcu_read_lock();
> + list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link)
> + if (server->mig_counter != clp->cl_mig_counter) {
> + server->mig_counter = clp->cl_mig_counter;
> + rcu_read_unlock();
> + nfs4_try_migration(server);
> + /* Ask the server if there's more work to do */
> + if (nfs4_check_lease(clp) == NFS4ERR_LEASE_MOVED)
> + goto restart;
> + break;
> + }
> + rcu_read_unlock();
> + dprintk("<-- %s\n", __func__);
> +}
> +
> #ifdef CONFIG_NFS_V4_1
> void nfs4_schedule_session_recovery(struct nfs4_session *session)
> {
> @@ -1814,6 +1869,10 @@ static void nfs4_state_manager(struct nfs_client *clp)
> nfs4_try_migration(clp->cl_moved_server);
> continue;
> }
> + if (test_and_clear_bit(NFS4CLNT_LEASE_MOVED, &clp->cl_state)) {
> + nfs4_handle_lease_moved(clp);
> + continue;
> + }
>
> /* First recover reboot state... */
> if (test_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state)) {
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 091abf0..58050db 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -60,6 +60,7 @@ struct nfs_client {
>
> /* accessed only when NFS4CLNT_MOVED bit is set */
> struct nfs_server * cl_moved_server;
> + unsigned long cl_mig_counter;
>
> /* used for the setclientid verifier */
> struct timespec cl_boot_time;
> @@ -156,6 +157,7 @@ struct nfs_server {
> struct list_head delegations;
> void (*destroy)(struct nfs_server *);
> struct nfs_fh *rootfh;
> + unsigned long mig_counter;
>
> atomic_t active; /* Keep trace of any activity to this server */
>
>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
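
The restartable list walk in the quoted patch relies on a generation counter: bumping cl_mig_counter marks every server as unvisited, and stamping a server before probing it lets a restarted walk skip it. Below is a user-space sketch of that pattern. A depth counter stands in for the RCU read-side lock so the pairing can be checked, and every name is hypothetical. Two choices are assumptions rather than the posted code: the lock is released exactly once on every exit path, and the "still moved" check is reduced to a boolean helper.

```c
#include <stddef.h>

struct mig_server {
	unsigned long mig_counter;	/* generation of the last probe */
	struct mig_server *next;
	int probes;			/* times this server was probed */
};

struct mig_client {
	unsigned long cl_mig_counter;	/* current lease_moved generation */
	struct mig_server *servers;
	int lock_depth;			/* stand-in for the read-side lock */
	int moved_reports_left;		/* scripted "lease still moved" replies */
};

static void sketch_lock(struct mig_client *clp)   { clp->lock_depth++; }
static void sketch_unlock(struct mig_client *clp) { clp->lock_depth--; }

/* Pretend to probe one file system for migration. */
static void sketch_try_migration(struct mig_server *server)
{
	server->probes++;
}

/* Nonzero while the server would still report a moved lease. */
static int sketch_lease_still_moved(struct mig_client *clp)
{
	if (clp->moved_reports_left == 0)
		return 0;
	clp->moved_reports_left--;
	return 1;
}

static void sketch_handle_lease_moved(struct mig_client *clp)
{
	struct mig_server *server;

	clp->cl_mig_counter++;		/* new event: nothing visited yet */
restart:
	sketch_lock(clp);
	for (server = clp->servers; server != NULL; server = server->next) {
		if (server->mig_counter == clp->cl_mig_counter)
			continue;	/* already probed for this event */
		server->mig_counter = clp->cl_mig_counter;
		sketch_unlock(clp);	/* the probe may sleep */
		sketch_try_migration(server);
		if (sketch_lease_still_moved(clp))
			goto restart;	/* list may have changed; rescan */
		return;			/* lock was already released */
	}
	sketch_unlock(clp);		/* whole list walked under the lock */
}
```

Each server is probed at most once per event, and the walk stops as soon as the server stops reporting a moved lease.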





2011-05-11 18:35:44

by Chuck Lever

Subject: Re: [PATCH 16/16] NFS: Implement support for NFS4ERR_LEASE_MOVED


On May 10, 2011, at 8:20 PM, Tom Haynes wrote:

> On 5/9/11 5:48 PM, Chuck Lever wrote:
>> On May 9, 2011, at 3:38 PM, Chuck Lever wrote:
>>
>>> To recover from NFS4ERR_LEASE_MOVED, walk the cl_superblocks list and
>>> invoke nfs4_try_migration() on each server's root file handle.
>>> nfs4_try_migration() should automatically determine whether that file
>>> system has migrated, and then perform recovery for it.
>>>
>>> The per-filesystem migration probe also informs minor version zero
>>> servers that this client should no longer receive NFS4ERR_LEASE_MOVED.
>> I see one thing that may be missing here.
>>
>> RFC 3530, section 8.14.3, lists OPEN, CLOSE, READ, WRITE, RENEW, LOCK, LOCKU, and LOCKT as the only procedures that return NFS4ERR_LEASE_MOVED. A code audit suggests that handling NFS4ERR_LEASE_MOVED in the two generic error handlers in nfs4proc.c is sufficient for all of these but RENEW and OPEN.
>
> 3530bis, section 13.4:
>
> | NFS4ERR_LEASE_MOVED | CLOSE, DELEGPURGE, DELEGRETURN, LOCK, |
> | | LOCKT, LOCKU, OPEN_CONFIRM, |
> | | OPEN_DOWNGRADE, READ, |
> | | RELEASE_LOCKOWNER, RENEW, SETATTR, |
> | | WRITE |
>
> And the text equivalent to that in 3530 is in section 9.14.3:
>
> To accomplish this, all
> operations which implicitly renew leases for a client (such as OPEN,
> CLOSE, READ, WRITE, RENEW, LOCK, and others), will return the error
> NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be
> renewed has been transferred to a new server.

Thanks, will take a look (and refresh my copy of 3530... oops).

>
>
>
>
>> RENEW is part of the lease_moved recovery logic, so I've left NFS4ERR_LEASE_MOVED handling there pretty sparse. The caller wants to deal with it.
>>
>> OPEN, as far as I can tell, won't deal with it at all, at least not directly. Should we look for it in nfs4_open_done() and invoke nfs4_schedule_lease_moved_recovery()?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com