2012-01-30 19:30:15

by Malahal Naineni

Subject: [RFC] [PATCH 00/13] NFS4 replication support

The following patch set supports NFS4 replication for read-only file
systems. The first patches in the set are derived from Chuck Lever's
migration tree; the later patches implement replication on top of the
interface those migration patches provide. Simple testing shows that it
works!

Here are some issues I already know about. Hopefully all of them,
including any new ones, will be resolved with the help of this list.

1. Replacing a transport (rpc_xprt) requires waiting for all pending
RPC tasks to complete. This does not work yet.
2. In order to use the replicated server, in addition to replacing the
transport, we also clone a new nfs_client. Currently the old
nfs_client structure is not released. We could wait for all active
commands to finish and then release the nfs_client (needs more work!),
or simply keep all old nfs_clients until nfs_free_server() is called
and release them then.
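The second option in issue 2 (keep every superseded nfs_client until the nfs_server is torn down) can be sketched in userspace C as follows. This is a hypothetical model, not kernel code: "struct client" stands in for nfs_client, "struct server" for nfs_server, and server_free() for the cleanup that would go into nfs_free_server().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nfs_client. */
struct client {
	int id;
	struct client *next_retired;	/* link on the server's retired list */
};

/* Hypothetical stand-in for struct nfs_server. */
struct server {
	struct client *active;		/* client in use after the last failover */
	struct client *retired;		/* superseded clients, freed at teardown */
};

static struct client *client_new(int id)
{
	struct client *clp = calloc(1, sizeof(*clp));

	if (clp != NULL)
		clp->id = id;
	return clp;
}

/* Fail over to a replica's client. The old client is not freed here
 * because in-flight requests may still reference it. */
static void server_switch_client(struct server *srv, struct client *new_clp)
{
	assert(new_clp != NULL);
	if (srv->active != NULL) {
		srv->active->next_retired = srv->retired;
		srv->retired = srv->active;
	}
	srv->active = new_clp;
}

/* Teardown: release every retired client and the active one.
 * Returns the number of clients freed. */
static int server_free(struct server *srv)
{
	int freed = 0;

	while (srv->retired != NULL) {
		struct client *clp = srv->retired;

		srv->retired = clp->next_retired;
		free(clp);
		freed++;
	}
	if (srv->active != NULL) {
		free(srv->active);
		srv->active = NULL;
		freed++;
	}
	return freed;
}
```

The trade-off is memory held for the lifetime of the mount, in exchange for never having to know when the last request on an old client has completed.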

Regards, Malahal.



Chuck Lever (3):
SUNRPC: Add API to acquire source address
NFS: Add an API for cloning an nfs_client
NFS: Save root file handle in nfs_server

Malahal Naineni (6):
NFS: Store server locations for replication
NFS: Add replica servers to volumes proc file.
NFS: Add replace transport infrastructure for replication
NFS: Add replication capability to state manager thread.
NFS: Handle replication on a timeout error
NFS: Avoid spurious replication recoveries

Trond Myklebust (4):
SUNRPC: Allow temporary blocking of an rpc client
SUNRPC: Use RCU to dereference the rpc_clnt.cl_xprt field
SUNRPC: Move clnt->cl_server into struct rpc_xprt
SUNRPC: Add a helper to switch the transport of the rpc_client

fs/nfs/callback.c | 3 +-
fs/nfs/callback_proc.c | 9 +-
fs/nfs/client.c | 84 +++++++-
fs/nfs/getroot.c | 3 +
fs/nfs/internal.h | 4 +
fs/nfs/nfs4_fs.h | 6 +
fs/nfs/nfs4namespace.c | 148 +++++++++++++
fs/nfs/nfs4proc.c | 69 ++++++-
fs/nfs/nfs4state.c | 39 +++-
fs/nfs/nfs4xdr.c | 16 +-
fs/nfs/super.c | 56 +++++
include/linux/nfs_fs_sb.h | 8 +
include/linux/nfs_xdr.h | 3 +
include/linux/sunrpc/clnt.h | 22 ++-
include/linux/sunrpc/debug.h | 11 +
include/linux/sunrpc/xprt.h | 2 +
net/sunrpc/clnt.c | 473 ++++++++++++++++++++++++++++++++++++-----
net/sunrpc/rpc_pipe.c | 5 +-
net/sunrpc/rpcb_clnt.c | 24 ++-
net/sunrpc/stats.c | 6 +-
net/sunrpc/xprt.c | 15 ++-
21 files changed, 902 insertions(+), 104 deletions(-)

--
1.7.8.3



2012-01-30 19:30:06

by Malahal Naineni

Subject: [PATCH 12/13] NFS: Handle replication on a timeout error

nfs4_handle_exception() and nfs4_async_handle_error() now handle
ETIMEDOUT errors by replacing the transport with one connected to a
replicated server.

The RPC layer handles timeouts by itself in most cases. It should be
made aware of the presence of replicated servers so that it can return
timeout failures sooner when a replica is available. Right now this is
a hack: call_timeout() fails every task on its first timeout.
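A rough sketch of the decision call_timeout() could make once the RPC client knows whether a replica exists, replacing the unconditional "if (1)" in the patch. This is a simplified userspace model: the has_replica flag, the struct, and the retry counter are hypothetical, and soft-mount handling is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum timeout_action {
	TIMEOUT_RETRY,		/* retransmit the request */
	TIMEOUT_EXIT,		/* fail the task so NFS can fail over */
};

/* Hypothetical per-task state for the model. */
struct task_model {
	bool has_replica;		/* assumed: RPC client knows a replica exists */
	unsigned int retries_left;	/* minor timeouts remaining */
	int status;			/* analogue of tk_status */
};

static enum timeout_action model_call_timeout(struct task_model *t)
{
	/* With a replica available, report the first timeout upward so
	 * nfs4_async_handle_error() can schedule replication recovery. */
	if (t->has_replica) {
		t->status = -ETIMEDOUT;
		return TIMEOUT_EXIT;
	}
	/* Without a replica, keep the usual behaviour (simplified here):
	 * consume a minor timeout and retransmit. */
	if (t->retries_left > 0)
		t->retries_left--;
	return TIMEOUT_RETRY;
}
```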

Signed-off-by: Malahal Naineni <[email protected]>
---
fs/nfs/nfs4proc.c | 14 ++++++++++++++
net/sunrpc/clnt.c | 12 ++++++++++++
2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 775adb3..2198b13 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -265,6 +265,9 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struc
switch(errorcode) {
case 0:
return 0;
+ case -ETIMEDOUT:
+ nfs4_schedule_replication_recovery(server);
+ goto wait_on_recovery;
case -NFS4ERR_ADMIN_REVOKED:
case -NFS4ERR_BAD_STATEID:
case -NFS4ERR_OPENMODE:
@@ -3716,6 +3719,16 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server,
if (task->tk_status >= 0)
return 0;
switch(task->tk_status) {
+ case -ETIMEDOUT:
+ printk(KERN_ERR "%s ERROR: %d calling replicate recovery\n",
+ __func__, task->tk_status);
+ rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
+ nfs4_schedule_replication_recovery(server);
+ if (test_bit(NFS4CLNT_MANAGER_RUNNING,
+ &clp->cl_state) == 0)
+ rpc_wake_up_queued_task(&clp->cl_rpcwaitq,
+ task);
+ goto restart_call;
case -NFS4ERR_ADMIN_REVOKED:
case -NFS4ERR_BAD_STATEID:
case -NFS4ERR_OPENMODE:
@@ -3762,6 +3775,7 @@ wait_on_recovery:
rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
if (test_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) == 0)
rpc_wake_up_queued_task(&clp->cl_rpcwaitq, task);
+restart_call:
task->tk_status = 0;
return -EAGAIN;
}
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index e9e8097..ed15b44 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1830,6 +1830,18 @@ call_timeout(struct rpc_task *task)
{
struct rpc_clnt *clnt = task->tk_client;

+ /*
+ * TODO: If replicated server is present, propagate timeout
+ * failures as soon as possible to upper layers. We just
+ * assume that replicated server is present in this RFC patch.
+ * RPC client should be made aware of replication later.
+ */
+ if (1) {
+
+ rpc_exit(task, -ETIMEDOUT);
+ return;
+ }
+
if (xprt_adjust_timeout(task->tk_rqstp) == 0) {
dprintk("RPC: %5u call_timeout (minor)\n", task->tk_pid);
goto retry;
--
1.7.8.3


2012-01-30 19:30:22

by Malahal Naineni

Subject: [PATCH 11/13] NFS: Add replication capability to state manager thread.

The state manager now handles replication: it replaces the failed
server with a replicated server.
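The scheduling pattern in nfs4_schedule_replication_recovery() (only the caller that sets NFS4CLNT_REPLICATE records the failed server and wakes the state manager; the manager later clears the bit and acts on it) can be modelled with C11 atomics. A userspace sketch; the names are hypothetical and manager_wakeups merely counts calls to the analogue of nfs4_schedule_state_manager().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define CLNT_REPLICATE_BIT 0UL	/* stand-in for NFS4CLNT_REPLICATE */

struct client_model {
	atomic_ulong state;		/* stand-in for cl_state */
	const void *failed_server;	/* stand-in for cl_failed_server */
	int manager_wakeups;
};

/* Userspace analogues of the kernel's test_and_set_bit()/test_and_clear_bit(). */
static int test_and_set_flag(atomic_ulong *word, unsigned long bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

static int test_and_clear_flag(atomic_ulong *word, unsigned long bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_and(word, ~mask) & mask) != 0;
}

/* Only the first failure while the bit is set records its server. */
static void schedule_replication(struct client_model *clp, const void *server)
{
	if (!test_and_set_flag(&clp->state, CLNT_REPLICATE_BIT)) {
		clp->failed_server = server;
		clp->manager_wakeups++;
	}
}

/* One state-manager pass: returns the server to fail over, or NULL. */
static const void *state_manager_pass(struct client_model *clp)
{
	if (test_and_clear_flag(&clp->state, CLNT_REPLICATE_BIT)) {
		assert(clp->failed_server != NULL);
		return clp->failed_server;
	}
	return NULL;
}
```

Note that failures reported by a second nfs_server while the bit is still set are dropped; only the first one is failed over in that pass.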

Signed-off-by: Malahal Naineni <[email protected]>
---
fs/nfs/nfs4_fs.h | 2 ++
fs/nfs/nfs4state.c | 14 ++++++++++++++
include/linux/nfs_fs_sb.h | 3 +++
3 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index b2a973b..afc19cc 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -25,6 +25,7 @@ enum nfs4_client_state {
NFS4CLNT_RECALL_SLOT,
NFS4CLNT_LEASE_CONFIRM,
NFS4CLNT_SERVER_SCOPE_MISMATCH,
+ NFS4CLNT_REPLICATE,
};

enum nfs4_session_state {
@@ -332,6 +333,7 @@ extern void nfs4_close_sync(struct nfs4_state *, fmode_t);
extern void nfs4_state_set_mode_locked(struct nfs4_state *, fmode_t);
extern void nfs4_schedule_lease_recovery(struct nfs_client *);
int nfs4_replace_transport(struct nfs_server *server);
+void nfs4_schedule_replication_recovery(const struct nfs_server *server);
extern void nfs4_schedule_state_manager(struct nfs_client *);
extern void nfs4_schedule_path_down_recovery(struct nfs_client *clp);
extern void nfs4_schedule_stateid_recovery(const struct nfs_server *, struct nfs4_state *);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index c97bbc7..3214376 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1709,6 +1709,15 @@ static void nfs4_set_lease_expired(struct nfs_client *clp, int status)
set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
}

+void nfs4_schedule_replication_recovery(const struct nfs_server *server)
+{
+ struct nfs_client *clp = server->nfs_client;
+ if (test_and_set_bit(NFS4CLNT_REPLICATE, &clp->cl_state) == 0) {
+ clp->cl_failed_server = (struct nfs_server *)server;
+ nfs4_schedule_state_manager(clp);
+ }
+}
+
static void nfs4_state_manager(struct nfs_client *clp)
{
int status = 0;
@@ -1758,6 +1767,11 @@ static void nfs4_state_manager(struct nfs_client *clp)
goto out_error;
}

+ if (test_and_clear_bit(NFS4CLNT_REPLICATE, &clp->cl_state)) {
+ BUG_ON(clp->cl_failed_server == 0);
+ nfs4_replace_transport(clp->cl_failed_server);
+ }
+
/* First recover reboot state... */
if (test_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state)) {
status = nfs4_do_reclaim(clp,
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index f5fd4bb..8c16ec5 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -78,6 +78,9 @@ struct nfs_client {
/* The flags used for obtaining the clientid during EXCHANGE_ID */
u32 cl_exchange_flags;
struct nfs4_session *cl_session; /* sharred session */
+
+ /* Valid only under replicate state */
+ struct nfs_server *cl_failed_server;
#endif /* CONFIG_NFS_V4 */

#ifdef CONFIG_NFS_FSCACHE
--
1.7.8.3


2012-01-30 19:30:02

by Malahal Naineni

Subject: [PATCH 07/13] NFS: Save root file handle in nfs_server

From: Chuck Lever <[email protected]>

Save each FSID's root directory file handle in the FSID's nfs_server
structure on the client. For now, only NFSv4 mounts save the root FH.
This is needed for migration recovery.
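The patch copies the mount file handle into a separately allocated nfs_fh and treats allocation failure as non-fatal (rootfh simply stays NULL). A userspace sketch of that best-effort copy, assuming a 128-byte maximum handle size (the NFSv4 NFS4_FHSIZE limit); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FH_MAXSIZE 128	/* NFSv4 file handles are at most 128 bytes */

/* Hypothetical stand-in for struct nfs_fh. */
struct fh_model {
	unsigned short size;
	unsigned char data[FH_MAXSIZE];
};

/* Hypothetical stand-in for the rootfh field of struct nfs_server. */
struct server_model {
	struct fh_model *rootfh;	/* NULL if the copy could not be saved */
};

static void save_root_fh(struct server_model *srv, const struct fh_model *mntfh)
{
	assert(mntfh->size <= FH_MAXSIZE);
	srv->rootfh = malloc(sizeof(*srv->rootfh));
	if (srv->rootfh == NULL)
		return;		/* best effort: the mount proceeds without it */
	srv->rootfh->size = mntfh->size;
	memcpy(srv->rootfh->data, mntfh->data, mntfh->size);
}

static void free_server_model(struct server_model *srv)
{
	free(srv->rootfh);	/* free(NULL) is a no-op */
	srv->rootfh = NULL;
}
```

Any later user of rootfh (such as the FS_LOCATIONS query in PATCH 08) therefore has to cope with it being NULL.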

Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Malahal Naineni <[email protected]>
---
fs/nfs/client.c | 1 +
fs/nfs/getroot.c | 3 +++
include/linux/nfs_fs_sb.h | 1 +
3 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 3e3c2ff..2045baa 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1112,6 +1112,7 @@ void nfs_free_server(struct nfs_server *server)
nfs_put_client(server->nfs_client);

nfs_free_iostats(server->io_stats);
+ nfs_free_fhandle(server->rootfh);
bdi_destroy(&server->backing_dev_info);
kfree(server);
nfs_release_automount_timer();
diff --git a/fs/nfs/getroot.c b/fs/nfs/getroot.c
index dcb6154..53eed24 100644
--- a/fs/nfs/getroot.c
+++ b/fs/nfs/getroot.c
@@ -232,6 +232,9 @@ struct dentry *nfs4_get_root(struct super_block *sb, struct nfs_fh *mntfh,
ret = ERR_CAST(inode);
goto out;
}
+ server->rootfh = nfs_alloc_fhandle();
+ if (server->rootfh != NULL)
+ nfs_copy_fh(server->rootfh, mntfh);

error = nfs_superblock_set_dummy_root(sb, inode);
if (error != 0) {
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index ba4d765..6532d7b 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -157,6 +157,7 @@ struct nfs_server {
struct list_head layouts;
struct list_head delegations;
void (*destroy)(struct nfs_server *);
+ struct nfs_fh *rootfh;

atomic_t active; /* Keep trace of any activity to this server */

--
1.7.8.3


2012-01-30 19:30:47

by Malahal Naineni

Subject: [PATCH 08/13] NFS: Store server locations for replication

Signed-off-by: Malahal Naineni <[email protected]>
---
fs/nfs/client.c | 4 +++
fs/nfs/nfs4_fs.h | 3 ++
fs/nfs/nfs4proc.c | 36 +++++++++++++++++++++++++++++++
fs/nfs/nfs4xdr.c | 16 +++++++++----
fs/nfs/super.c | 51 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/nfs_fs_sb.h | 2 +
include/linux/nfs_xdr.h | 3 ++
7 files changed, 110 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 2045baa..54de25a 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1096,6 +1096,8 @@ static struct nfs_server *nfs_alloc_server(void)
*/
void nfs_free_server(struct nfs_server *server)
{
+ int i;
+
dprintk("--> nfs_free_server()\n");

nfs_server_remove_lists(server);
@@ -1113,6 +1115,8 @@ void nfs_free_server(struct nfs_server *server)

nfs_free_iostats(server->io_stats);
nfs_free_fhandle(server->rootfh);
+ for (i = 0; i < NFS_MAX_REPLI_SERVERS; i++)
+ kfree(server->repli_servers[i]);
bdi_destroy(&server->backing_dev_info);
kfree(server);
nfs_release_automount_timer();
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 4d7d0ae..7ff0177 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -224,6 +224,9 @@ extern int nfs4_do_close(struct nfs4_state *state, gfp_t gfp_mask, int wait, boo
extern int nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle);
extern int nfs4_proc_fs_locations(struct inode *dir, const struct qstr *name,
struct nfs4_fs_locations *fs_locations, struct page *page);
+extern int nfs4_get_fs_locations(struct nfs_server *server,
+ struct nfs4_fs_locations *locations, struct page *page);
+
extern void nfs4_release_lockowner(const struct nfs4_lock_state *);
extern const struct xattr_handler *nfs4_xattr_handlers[];

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 09674cc..775adb3 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4814,6 +4814,42 @@ int nfs4_proc_secinfo(struct inode *dir, const struct qstr *name, struct nfs4_se
return err;
}

+int nfs4_get_fs_locations(struct nfs_server *server,
+ struct nfs4_fs_locations *locations, struct page *page)
+{
+ u32 bitmask[2] = {
+ [0] = FATTR4_WORD0_FSID | FATTR4_WORD0_FS_LOCATIONS,
+ };
+ struct nfs4_fs_locations_arg args = {
+ .fh = server->rootfh,
+ .page = page,
+ .bitmask = bitmask,
+ .replication = 1,
+ };
+ struct nfs4_fs_locations_res res = {
+ .fs_locations = locations,
+ .replication = 1,
+ };
+ struct rpc_message msg = {
+ .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_FS_LOCATIONS],
+ .rpc_argp = &args,
+ .rpc_resp = &res,
+ };
+ int status;
+
+ dprintk("--> %s: FSID %llx:%llx on \"%s\"\n", __func__,
+ (unsigned long long)server->fsid.major,
+ (unsigned long long)server->fsid.minor,
+ server->nfs_client->cl_hostname);
+
+ nfs_fattr_init(&locations->fattr);
+ locations->server = server;
+ locations->nlocations = 0;
+ status = nfs4_call_sync(server->client, server, &msg, &args.seq_args, &res.seq_res, 0);
+ dprintk("<-- %s status=%d\n", __func__, status);
+ return status;
+}
+
#ifdef CONFIG_NFS_V4_1
/*
* Check the exchange flags returned by the server for invalid flags, having
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 95e92e4..b2fff0f 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -2723,8 +2723,12 @@ static void nfs4_xdr_enc_fs_locations(struct rpc_rqst *req,

encode_compound_hdr(xdr, req, &hdr);
encode_sequence(xdr, &args->seq_args, &hdr);
- encode_putfh(xdr, args->dir_fh, &hdr);
- encode_lookup(xdr, args->name, &hdr);
+ if (args->replication) {
+ encode_putfh(xdr, args->fh, &hdr);
+ } else {
+ encode_putfh(xdr, args->dir_fh, &hdr);
+ encode_lookup(xdr, args->name, &hdr);
+ }
replen = hdr.replen; /* get the attribute into args->page */
encode_fs_locations(xdr, args->bitmask, &hdr);

@@ -6576,9 +6580,11 @@ static int nfs4_xdr_dec_fs_locations(struct rpc_rqst *req,
status = decode_putfh(xdr);
if (status)
goto out;
- status = decode_lookup(xdr);
- if (status)
- goto out;
+ if (!res->replication) {
+ status = decode_lookup(xdr);
+ if (status)
+ goto out;
+ }
xdr_enter_page(xdr, PAGE_SIZE);
status = decode_getfattr(xdr, &res->fs_locations->fattr,
res->fs_locations->server);
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index f2e7d7c..3b20fc4 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -322,6 +322,7 @@ static struct dentry *nfs4_referral_mount(struct file_system_type *fs_type,
static struct dentry *nfs4_remote_referral_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *raw_data);
static void nfs4_kill_super(struct super_block *sb);
+static void nfs4_store_repli_locations(struct dentry *dentry);

static struct file_system_type nfs4_fs_type = {
.owner = THIS_MODULE,
@@ -2829,6 +2830,11 @@ static struct dentry *nfs4_try_mount(int flags, const char *dev_name,
dfprintk(MOUNT, "<-- nfs4_try_mount() = %ld%s\n",
IS_ERR(res) ? PTR_ERR(res) : 0,
IS_ERR(res) ? " [error]" : "");
+
+ /* Add replication locations */
+ if (!IS_ERR(res))
+ nfs4_store_repli_locations(res);
+
return res;
}

@@ -3087,4 +3093,49 @@ static struct dentry *nfs4_referral_mount(struct file_system_type *fs_type,
return res;
}

+static void nfs4_store_repli_locations(struct dentry *dentry)
+{
+ struct nfs4_fs_locations *locations;
+ struct nfs_server *server;
+ struct page *page;
+ unsigned int i, s, loc, n;
+ int err;
+
+ page = alloc_page(GFP_KERNEL);
+ locations = kmalloc(sizeof(struct nfs4_fs_locations), GFP_KERNEL);
+ if (!page || !locations)
+ goto free;
+
+ server = NFS_SERVER(dentry->d_inode);
+
+ /* Free old locations */
+ for (i = 0; i < NFS_MAX_REPLI_SERVERS; i++) {
+ kfree(server->repli_servers[i]);
+ server->repli_servers[i] = NULL;
+ }
+
+ err = nfs4_get_fs_locations(server, locations, page);
+ if (err)
+ goto free;
+
+ for (n = 0, loc = 0; loc < locations->nlocations &&
+ n < NFS_MAX_REPLI_SERVERS; loc++) {
+ const struct nfs4_fs_location *location =
+ &locations->locations[loc];
+ for (s = 0; s < location->nservers &&
+ n < NFS_MAX_REPLI_SERVERS; s++) {
+ const struct nfs4_string *buf = &location->servers[s];
+ if (buf->len <= 0 || buf->len > PAGE_SIZE)
+ continue;
+ server->repli_servers[n++] = kstrndup(buf->data,
+ buf->len,
+ GFP_KERNEL);
+ }
+ }
+
+free:
+ if (page)
+ __free_page(page);
+ kfree(locations);
+}
#endif /* CONFIG_NFS_V4 */
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 6532d7b..4970c58 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -158,6 +158,8 @@ struct nfs_server {
struct list_head delegations;
void (*destroy)(struct nfs_server *);
struct nfs_fh *rootfh;
+#define NFS_MAX_REPLI_SERVERS 2
+ char *repli_servers[NFS_MAX_REPLI_SERVERS];

atomic_t active; /* Keep trace of any activity to this server */

diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index a764cef..ca6debd 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1004,15 +1004,18 @@ struct nfs4_fs_locations {

struct nfs4_fs_locations_arg {
const struct nfs_fh *dir_fh;
+ const struct nfs_fh *fh;
const struct qstr *name;
struct page *page;
const u32 *bitmask;
struct nfs4_sequence_args seq_args;
+ unsigned char replication:1;
};

struct nfs4_fs_locations_res {
struct nfs4_fs_locations *fs_locations;
struct nfs4_sequence_res seq_res;
+ unsigned char replication:1;
};

struct nfs4_secinfo_oid {
--
1.7.8.3


2012-01-30 19:30:54

by Malahal Naineni

Subject: [PATCH 13/13] NFS: Avoid spurious replication recoveries

As soon as we detect a server failure, we handle replication recovery
right away, without waiting for the active commands on the failed
server to finish. The first error switches us to a different
(replicated) server.

Later failures of requests sent to the old server are indistinguishable
from failures of the new replicated server, so they trigger another,
spurious replication recovery. To avoid this, we add a start time to
nfs_client. If the nfs_client is recent enough, we skip replication
recovery and simply retry the request.
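The "recent enough" test is a wraparound-safe tick comparison: NFS_REPLI_SETTLE is twice the RPC client's initial timeout (to_initval), and time_before() compares jiffies values by signed subtraction so it keeps working when the counter wraps. A userspace sketch of the same arithmetic; ticks_before() and replication_is_spurious() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same idea as the kernel's time_before(a, b): true if a is earlier
 * than b, even if the tick counter wrapped between them. */
static bool ticks_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* A failure is treated as spurious while "now" is still inside the
 * settle window of 2 * to_initval ticks after the client started. */
static bool replication_is_spurious(unsigned long now, unsigned long start,
				    unsigned long to_initval)
{
	return ticks_before(now, start + 2 * to_initval);
}
```

For example, with to_initval = 60 and a client started at tick 900, a failure at tick 1000 is ignored (window ends at 1020) while one at tick 1100 triggers recovery.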

Signed-off-by: Malahal Naineni <[email protected]>
---
fs/nfs/client.c | 1 +
fs/nfs/nfs4namespace.c | 23 +++++++++++++++++++++++
include/linux/nfs_fs_sb.h | 1 +
3 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 000ebdb..f0d8d24 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1477,6 +1477,7 @@ int nfs4_clone_client(struct nfs_client *clp, const struct sockaddr *sap,
* lose state.
*/
new->cl_boot_time = clp->cl_boot_time;
+ new->cl_start_time = jiffies;

dprintk("<-- %s moved (%llx:%llx) to nfs_client %p\n", __func__,
(unsigned long long)server->fsid.major,
diff --git a/fs/nfs/nfs4namespace.c b/fs/nfs/nfs4namespace.c
index ee75e27..617d6bf 100644
--- a/fs/nfs/nfs4namespace.c
+++ b/fs/nfs/nfs4namespace.c
@@ -356,6 +356,29 @@ int nfs4_replace_transport(struct nfs_server *server)
unsigned int i;
int error;

+ /*
+ * As soon as we detect a server failure, we quickly handle
+ * replication recovery without waiting for all the active
+ * commands to finish from the failed server. The first error
+ * would cause us to work with a different (replicated) server.
+ *
+ * Any later failures from the old server are indistinguishable
+ * from the new replicated server. These failures from the old
+ * server trigger a spurious replication recovery again. To
+ * avoid this, we add start time to nfs_client. If this is a
+ * recent enough nfs_client, we don't handle replication
+ * recovery and just retry the request instead.
+ */
+#define NFS_REPLI_SETTLE(rclient) (2 * (rclient)->cl_timeout->to_initval)
+ if (time_before(jiffies, server->nfs_client->cl_start_time +
+ NFS_REPLI_SETTLE(server->client))) {
+ dprintk("%s() ignoring spurious replication request, "
+ "current: %lu, client start: %lu, repli_settle: %lu\n",
+ __func__, jiffies, server->nfs_client->cl_start_time,
+ NFS_REPLI_SETTLE(server->client));
+ return 0;
+ }
+
sap = kmalloc(addr_bufsize, GFP_KERNEL);
if (sap == NULL) {
error = -ENOMEM;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 8c16ec5..9d79a0e 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -88,6 +88,7 @@ struct nfs_client {
#endif

struct server_scope *server_scope; /* from exchange_id */
+ unsigned long cl_start_time;
};

/*
--
1.7.8.3


2012-01-30 19:30:17

by Malahal Naineni

Subject: [PATCH 01/13] SUNRPC: Allow temporary blocking of an rpc client

From: Trond Myklebust <[email protected]>

Add a mechanism to allow us to temporarily block an rpc client while
we do surgery on its transport and authentication code.

The new function rpc_lock_client() will block all new rpc calls from
starting, and then wait for existing rpc calls to complete. If the
wait times out before the rpc calls have completed, then the function
returns the number of outstanding active tasks, otherwise it returns 0.

In the event of a non-zero return value, it is up to the caller either
to cancel the lock (by calling rpc_unlock_client), or to take the
appropriate action to ensure the existing rpc calls complete (e.g.
by calling rpc_killall_tasks()).
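The lock-then-drain protocol can be modelled in userspace with a mutex and two condition variables: one to hold back new tasks while the client is locked, one to signal when the active count reaches zero. This is a sketch under those assumptions, not the SUNRPC implementation; all names are hypothetical. Like the code in the patch (and unlike the description, which mentions returning the number of outstanding tasks), clnt_lock() returns -ETIMEDOUT on timeout and leaves the client locked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct clnt_model {
	pthread_mutex_t lock;
	pthread_cond_t drained;		/* signalled when active drops to 0 */
	pthread_cond_t unlocked;	/* signalled when the client is unlocked */
	int active;			/* analogue of cl_active_tasks */
	bool locked;			/* analogue of RPC_CLIENT_LOCKED */
};

#define CLNT_MODEL_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, \
			  PTHREAD_COND_INITIALIZER, 0, false }

/* A task about to run: wait while the client is locked, then count it. */
static void task_start(struct clnt_model *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->locked)
		pthread_cond_wait(&c->unlocked, &c->lock);
	c->active++;
	pthread_mutex_unlock(&c->lock);
}

static void task_finish(struct clnt_model *c)
{
	pthread_mutex_lock(&c->lock);
	assert(c->active > 0);
	if (--c->active == 0)
		pthread_cond_broadcast(&c->drained);
	pthread_mutex_unlock(&c->lock);
}

/* Block new tasks, then wait up to timeout_ms for active ones to finish.
 * Returns 0 or -ETIMEDOUT; the client stays locked in both cases. */
static int clnt_lock(struct clnt_model *c, long timeout_ms)
{
	struct timespec deadline;
	int err = 0, ret;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	c->locked = true;
	while (c->active > 0 && err == 0)
		err = pthread_cond_timedwait(&c->drained, &c->lock, &deadline);
	ret = (c->active > 0) ? -ETIMEDOUT : 0;
	pthread_mutex_unlock(&c->lock);
	return ret;
}

/* Clear the lock and release every task waiting in task_start(). */
static void clnt_unlock(struct clnt_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->locked = false;
	pthread_cond_broadcast(&c->unlocked);
	pthread_mutex_unlock(&c->lock);
}
```

On -ETIMEDOUT the caller then chooses between clnt_unlock() and forcing the remaining tasks to finish, mirroring the rpc_unlock_client()/rpc_killall_tasks() choice described above.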

Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/clnt.h | 11 ++++++
net/sunrpc/clnt.c | 72 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 83 insertions(+), 0 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 2c5993a..c85696e 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -24,6 +24,7 @@
#include <asm/signal.h>
#include <linux/path.h>
#include <net/ipv6.h>
+#include <linux/completion.h>

struct rpc_inode;

@@ -32,6 +33,7 @@ struct rpc_inode;
*/
struct rpc_clnt {
atomic_t cl_count; /* Number of references */
+ atomic_t cl_active_tasks;/* Number of active tasks */
struct list_head cl_clients; /* Global list of clients */
struct list_head cl_tasks; /* List of tasks */
spinlock_t cl_lock; /* spinlock */
@@ -47,6 +49,10 @@ struct rpc_clnt {
struct rpc_stat * cl_stats; /* per-program statistics */
struct rpc_iostats * cl_metrics; /* per-client statistics */

+ unsigned long cl_flags; /* Bit flags */
+ struct rpc_wait_queue cl_waitqueue;
+ struct completion cl_completion;
+
unsigned int cl_softrtry : 1,/* soft timeouts */
cl_discrtry : 1,/* disconnect before retry */
cl_autobind : 1,/* use getport() */
@@ -66,6 +72,8 @@ struct rpc_clnt {
char *cl_principal; /* target to authenticate to */
};

+#define RPC_CLIENT_LOCKED 0
+
/*
* General RPC program info
*/
@@ -136,6 +144,9 @@ void rpc_shutdown_client(struct rpc_clnt *);
void rpc_release_client(struct rpc_clnt *);
void rpc_task_release_client(struct rpc_task *);

+int rpc_lock_client(struct rpc_clnt *clnt, unsigned long timeout);
+void rpc_unlock_client(struct rpc_clnt *clnt);
+
int rpcb_create_local(void);
void rpcb_put_local(void);
int rpcb_register(u32, u32, int, unsigned short);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index f0268ea..b6a7817 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -225,6 +225,8 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru

atomic_set(&clnt->cl_count, 1);

+ rpc_init_wait_queue(&clnt->cl_waitqueue, "client waitqueue");
+
err = rpc_setup_pipedir(clnt, program->pipe_dir_name);
if (err < 0)
goto out_no_path;
@@ -394,6 +396,8 @@ rpc_clone_client(struct rpc_clnt *clnt)
goto out_no_principal;
}
atomic_set(&new->cl_count, 1);
+ atomic_set(&new->cl_active_tasks, 0);
+ rpc_init_wait_queue(&new->cl_waitqueue, "client waitqueue");
err = rpc_setup_pipedir(new, clnt->cl_program->pipe_dir_name);
if (err != 0)
goto out_no_path;
@@ -570,11 +574,76 @@ out:
}
EXPORT_SYMBOL_GPL(rpc_bind_new_program);

+/**
+ * rpc_lock_client - lock the RPC client
+ * @clnt: pointer to a struct rpc_clnt
+ * @timeout: timeout parameter to pass to wait_for_completion_timeout()
+ *
+ * This function sets the RPC_CLIENT_LOCKED flag, which causes
+ * all new rpc_tasks to wait instead of executing. It then waits for
+ * any existing active tasks to complete.
+ */
+int rpc_lock_client(struct rpc_clnt *clnt, unsigned long timeout)
+{
+ if (!test_and_set_bit(RPC_CLIENT_LOCKED, &clnt->cl_flags))
+ init_completion(&clnt->cl_completion);
+
+ if (atomic_read(&clnt->cl_active_tasks) &&
+ !wait_for_completion_timeout(&clnt->cl_completion, timeout))
+ return -ETIMEDOUT;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(rpc_lock_client);
+
+/**
+ * rpc_unlock_client
+ * @clnt: pointer to a struct rpc_clnt
+ *
+ * Clears the RPC_CLIENT_LOCKED flag, and starts any rpc_tasks that
+ * were waiting on it.
+ */
+void rpc_unlock_client(struct rpc_clnt *clnt)
+{
+ spin_lock(&clnt->cl_lock);
+ clear_bit(RPC_CLIENT_LOCKED, &clnt->cl_flags);
+ spin_unlock(&clnt->cl_lock);
+ rpc_wake_up(&clnt->cl_waitqueue);
+}
+EXPORT_SYMBOL_GPL(rpc_unlock_client);
+
+static void rpc_task_clear_active(struct rpc_task *task)
+{
+ struct rpc_clnt *clnt = task->tk_client;
+
+ if (atomic_dec_and_test(&clnt->cl_active_tasks) &&
+ test_bit(RPC_CLIENT_LOCKED, &clnt->cl_flags))
+ complete(&clnt->cl_completion);
+}
+
+static void rpc_task_set_active(struct rpc_task *task)
+{
+ struct rpc_clnt *clnt = task->tk_client;
+
+ atomic_inc(&clnt->cl_active_tasks);
+ if (unlikely(test_bit(RPC_CLIENT_LOCKED, &clnt->cl_flags))) {
+ spin_lock(&clnt->cl_lock);
+ if (test_bit(RPC_CLIENT_LOCKED, &clnt->cl_flags) &&
+ !RPC_ASSASSINATED(task)) {
+ rpc_sleep_on(&clnt->cl_waitqueue, task,
+ rpc_task_set_active);
+ rpc_task_clear_active(task);
+ }
+ spin_unlock(&clnt->cl_lock);
+ }
+}
+
void rpc_task_release_client(struct rpc_task *task)
{
struct rpc_clnt *clnt = task->tk_client;

if (clnt != NULL) {
+ rpc_task_clear_active(task);
/* Remove from client task list */
spin_lock(&clnt->cl_lock);
list_del(&task->tk_task);
@@ -598,6 +667,9 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
spin_lock(&clnt->cl_lock);
list_add_tail(&task->tk_task, &clnt->cl_tasks);
spin_unlock(&clnt->cl_lock);
+
+ /* Notify the client when this task is activated */
+ task->tk_callback = rpc_task_set_active;
}
}

--
1.7.8.3


2012-01-30 19:30:45

by Malahal Naineni

Subject: [PATCH 06/13] NFS: Add an API for cloning an nfs_client

From: Chuck Lever <[email protected]>

After a migration event, we have to preserve the long-form client ID
or session ID the client used with the source server, and introduce it
to the destination server, in case the migration transparently
migrated state for the migrating FSID.

To preserve this state information, clone the source FSID's
nfs_client. The migrated FSID is moved from the original nfs_client
to the cloned one.
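The essential invariant of the clone is that the client ID verifier, which the patch comment says is derived from cl_boot_time, is identical on the new nfs_client. A userspace sketch of that invariant; the verifier packing below (seconds in the high 32 bits, nanoseconds in the low 32 bits) is an assumption for illustration, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct boot_time_model {
	uint32_t sec;
	uint32_t nsec;
};

/* Hypothetical stand-in for struct nfs_client. */
struct nfs_client_model {
	struct boot_time_model boot_time;	/* stand-in for cl_boot_time */
	const char *hostname;
};

/* Hypothetical stand-in for struct nfs_server. */
struct nfs_server_model {
	struct nfs_client_model *client;	/* stand-in for server->nfs_client */
};

/* Assumed derivation: depends only on the boot time. */
static uint64_t client_verifier(const struct nfs_client_model *clp)
{
	return ((uint64_t)clp->boot_time.sec << 32) | clp->boot_time.nsec;
}

/* Clone the server's client for a new destination and move the FSID
 * onto the clone. The old client is left to the caller. */
static int clone_client(struct nfs_server_model *srv, const char *new_host)
{
	const struct nfs_client_model *old = srv->client;
	struct nfs_client_model *new_clp = malloc(sizeof(*new_clp));

	if (new_clp == NULL)
		return -ENOMEM;
	new_clp->hostname = new_host;		/* destination server */
	new_clp->boot_time = old->boot_time;	/* verifier must not change */
	srv->client = new_clp;
	return 0;
}
```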

Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: malahal Naineni <[email protected]>
---
fs/nfs/client.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/internal.h | 4 ++++
2 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index be5e702..3e3c2ff 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1435,6 +1435,52 @@ error:
return error;
}

+int nfs4_clone_client(struct nfs_client *clp, const struct sockaddr *sap,
+ size_t salen, const char *ip_addr,
+ struct nfs_server *server)
+{
+ struct rpc_clnt *rpcclnt = clp->cl_rpcclient;
+ struct nfs_client_initdata cl_init = {
+ .addr = sap,
+ .addrlen = salen,
+ .rpc_ops = &nfs_v4_clientops,
+ .proto = rpc_protocol(rpcclnt),
+ .minorversion = clp->cl_minorversion,
+ };
+ struct nfs_client *new;
+ int status = 0;
+
+ dprintk("--> %s cloning \"%s\" (client ID %llx)\n",
+ __func__, clp->cl_hostname, clp->cl_clientid);
+
+ new = nfs_get_client(&cl_init, rpcclnt->cl_timeout, ip_addr,
+ rpcclnt->cl_auth->au_flavor, 0);
+ if (IS_ERR(new)) {
+ dprintk("<-- %s nfs_get_client failed\n", __func__);
+ status = PTR_ERR(new);
+ goto out;
+ }
+
+ nfs_server_remove_lists(server);
+ server->nfs_client = new;
+ nfs_server_insert_lists(server);
+
+ /*
+ * The client ID verifier is derived from cl_boot_time.
+ * This verifier must not change, or callback update will
+ * act like a regular SETCLIENTID, causing the server to
+ * lose state.
+ */
+ new->cl_boot_time = clp->cl_boot_time;
+
+ dprintk("<-- %s moved (%llx:%llx) to nfs_client %p\n", __func__,
+ (unsigned long long)server->fsid.major,
+ (unsigned long long)server->fsid.minor, new);
+
+out:
+ return status;
+}
+
/*
* Set up a pNFS Data Server client.
*
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 8102db9..93c8daa 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -165,6 +165,10 @@ extern struct nfs_server *nfs_clone_server(struct nfs_server *,
struct nfs_fattr *);
extern void nfs_mark_client_ready(struct nfs_client *clp, int state);
extern int nfs4_check_client_ready(struct nfs_client *clp);
+extern int nfs4_clone_client(struct nfs_client *clp,
+ const struct sockaddr *sap, size_t salen,
+ const char *ip_addr,
+ struct nfs_server *server);
extern struct nfs_client *nfs4_set_ds_client(struct nfs_client* mds_clp,
const struct sockaddr *ds_addr,
int ds_addrlen, int ds_proto);
--
1.7.8.3


2012-01-30 19:30:01

by Malahal Naineni

Subject: [PATCH 03/13] SUNRPC: Move clnt->cl_server into struct rpc_xprt

From: Trond Myklebust <[email protected]>

When the cl_xprt field is updated, the cl_server field will also have
to change. Since the contents of cl_server follow the remote endpoint
of cl_xprt, just move that field to the rpc_xprt.
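Once the server name lives in the transport, a single pointer swap of cl_xprt changes both the endpoint and its name together, so a reader can never see the new transport paired with the old name. A userspace sketch using C11 atomics as a stand-in for rcu_assign_pointer()/rcu_dereference(); it does not model RCU grace periods, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical stand-in for struct rpc_xprt with the new servername field. */
struct xprt_model {
	const char *servername;
};

/* Hypothetical stand-in for struct rpc_clnt. */
struct rpc_clnt_model {
	_Atomic(struct xprt_model *) xprt;	/* stand-in for cl_xprt */
};

/* One load yields a consistent (transport, name) pair. In the kernel
 * the name must only be used inside rcu_read_lock()/rcu_read_unlock(). */
static const char *clnt_servername(struct rpc_clnt_model *clnt)
{
	struct xprt_model *x = atomic_load_explicit(&clnt->xprt,
						    memory_order_acquire);
	return x->servername;
}

/* Publish a new transport and return the old one. The caller must keep
 * the old transport alive until no reader can still be using it. */
static struct xprt_model *clnt_switch_xprt(struct rpc_clnt_model *clnt,
					   struct xprt_model *new_xprt)
{
	return atomic_exchange_explicit(&clnt->xprt, new_xprt,
					memory_order_acq_rel);
}
```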

Signed-off-by: Trond Myklebust <[email protected]>
[ cel: simplify check_gss_callback_principal(), whitespace changes ]
[ cel: forward ported to 3.1 ]
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Malahal Naineni <[email protected]>
---
fs/nfs/callback.c | 3 +-
fs/nfs/nfs4proc.c | 4 ++-
include/linux/sunrpc/clnt.h | 3 +-
include/linux/sunrpc/xprt.h | 2 +
net/sunrpc/clnt.c | 74 ++++++++++++++++++++----------------------
net/sunrpc/rpc_pipe.c | 2 +-
net/sunrpc/rpcb_clnt.c | 9 +++--
net/sunrpc/xprt.c | 15 ++++++++-
8 files changed, 62 insertions(+), 50 deletions(-)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 516f337..41970c6 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -332,7 +332,6 @@ void nfs_callback_down(int minorversion)
int
check_gss_callback_principal(struct nfs_client *clp, struct svc_rqst *rqstp)
{
- struct rpc_clnt *r = clp->cl_rpcclient;
char *p = svc_gss_principal(rqstp);

if (rqstp->rq_authop->flavour != RPC_AUTH_GSS)
@@ -353,7 +352,7 @@ check_gss_callback_principal(struct nfs_client *clp, struct svc_rqst *rqstp)
if (memcmp(p, "nfs@", 4) != 0)
return 0;
p += 4;
- if (strcmp(p, r->cl_server) != 0)
+ if (strcmp(p, clp->cl_hostname) != 0)
return 0;
return 1;
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6ceae67..09674cc 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1098,10 +1098,12 @@ static struct nfs4_state *nfs4_opendata_to_nfs4_state(struct nfs4_opendata *data
delegation_flags = delegation->flags;
rcu_read_unlock();
if (data->o_arg.claim == NFS4_OPEN_CLAIM_DELEGATE_CUR) {
+ rcu_read_lock();
pr_err_ratelimited("NFS: Broken NFSv4 server %s is "
"returning a delegation for "
"OPEN(CLAIM_DELEGATE_CUR)\n",
- NFS_CLIENT(inode)->cl_server);
+ rcu_dereference(NFS_CLIENT(inode)->cl_xprt)->servername);
+ rcu_read_unlock();
} else if ((delegation_flags & 1UL<<NFS_DELEGATION_NEED_RECLAIM) == 0)
nfs_inode_set_delegation(state->inode,
data->owner->so_cred,
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index a0a384f..0adc955 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -43,7 +43,6 @@ struct rpc_clnt {
cl_vers, /* RPC version number */
cl_maxproc; /* max procedure number */

- char * cl_server; /* server machine name */
char * cl_protname; /* protocol name */
struct rpc_auth * cl_auth; /* authenticator */
struct rpc_stat * cl_stats; /* per-program statistics */
@@ -117,7 +116,7 @@ struct rpc_create_args {
size_t addrsize;
struct sockaddr *saddress;
const struct rpc_timeout *timeout;
- char *servername;
+ const char *servername;
struct rpc_program *program;
u32 prognumber; /* overrides program->number */
u32 version;
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 15518a1..4b1d0f2 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -226,6 +226,7 @@ struct rpc_xprt {
} stat;

struct net *xprt_net;
+ const char *servername;
const char *address_strings[RPC_DISPLAY_MAX];
};

@@ -255,6 +256,7 @@ struct xprt_create {
struct sockaddr * srcaddr; /* optional local address */
struct sockaddr * dstaddr; /* remote peer address */
size_t addrlen;
+ const char * servername;
struct svc_xprt *bc_xprt; /* NFSv4.1 backchannel */
};

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index e67eba3..26c1a2f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -149,15 +149,8 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
struct rpc_clnt *clnt = NULL;
struct rpc_auth *auth;
int err;
- size_t len;

/* sanity check the name before trying to print it */
- err = -EINVAL;
- len = strlen(args->servername);
- if (len > RPC_MAXNETNAMELEN)
- goto out_no_rpciod;
- len++;
-
dprintk("RPC: creating %s client for %s (xprt %p)\n",
program->name, args->servername, xprt);

@@ -180,16 +173,6 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
goto out_err;
clnt->cl_parent = clnt;

- clnt->cl_server = clnt->cl_inline_name;
- if (len > sizeof(clnt->cl_inline_name)) {
- char *buf = kmalloc(len, GFP_KERNEL);
- if (buf != NULL)
- clnt->cl_server = buf;
- else
- len = sizeof(clnt->cl_inline_name);
- }
- strlcpy(clnt->cl_server, args->servername, len);
-
rcu_assign_pointer(clnt->cl_xprt, xprt);
clnt->cl_procinfo = version->procs;
clnt->cl_maxproc = version->nrprocs;
@@ -258,8 +241,6 @@ out_no_path:
out_no_principal:
rpc_free_iostats(clnt->cl_metrics);
out_no_stats:
- if (clnt->cl_server != clnt->cl_inline_name)
- kfree(clnt->cl_server);
kfree(clnt);
out_err:
xprt_put(xprt);
@@ -289,6 +270,7 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
.srcaddr = args->saddress,
.dstaddr = args->address,
.addrlen = args->addrsize,
+ .servername = args->servername,
.bc_xprt = args->bc_xprt,
};
char servername[48];
@@ -297,7 +279,7 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
* If the caller chooses not to specify a hostname, whip
* up a string representation of the passed-in address.
*/
- if (args->servername == NULL) {
+ if (xprtargs.servername == NULL) {
struct sockaddr_un *sun =
(struct sockaddr_un *)args->address;
struct sockaddr_in *sin =
@@ -324,7 +306,7 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
* address family isn't recognized. */
return ERR_PTR(-EINVAL);
}
- args->servername = servername;
+ xprtargs.servername = servername;
}

xprt = xprt_create_transport(&xprtargs);
@@ -466,8 +448,9 @@ EXPORT_SYMBOL_GPL(rpc_killall_tasks);
*/
void rpc_shutdown_client(struct rpc_clnt *clnt)
{
- dprintk("RPC: shutting down %s client for %s\n",
- clnt->cl_protname, clnt->cl_server);
+ dprintk_rcu("RPC: shutting down %s client for %s\n",
+ clnt->cl_protname,
+ rcu_dereference(clnt->cl_xprt)->servername);

while (!list_empty(&clnt->cl_tasks)) {
rpc_killall_tasks(clnt);
@@ -485,8 +468,9 @@ EXPORT_SYMBOL_GPL(rpc_shutdown_client);
static void
rpc_free_client(struct rpc_clnt *clnt)
{
- dprintk("RPC: destroying %s client for %s\n",
- clnt->cl_protname, clnt->cl_server);
+ dprintk_rcu("RPC: destroying %s client for %s\n",
+ clnt->cl_protname,
+ rcu_dereference(clnt->cl_xprt)->servername);
if (!IS_ERR(clnt->cl_path.dentry)) {
rpc_remove_client_dir(clnt->cl_path.dentry);
rpc_put_mount();
@@ -495,8 +479,6 @@ rpc_free_client(struct rpc_clnt *clnt)
rpc_release_client(clnt->cl_parent);
goto out_free;
}
- if (clnt->cl_server != clnt->cl_inline_name)
- kfree(clnt->cl_server);
out_free:
rpc_unregister_client(clnt);
rpc_free_iostats(clnt->cl_metrics);
@@ -1649,8 +1631,12 @@ call_timeout(struct rpc_task *task)
}
if (RPC_IS_SOFT(task)) {
- if (clnt->cl_chatty)
+ if (clnt->cl_chatty) {
+ rcu_read_lock();
printk(KERN_NOTICE "%s: server %s not responding, timed out\n",
- clnt->cl_protname, clnt->cl_server);
+ clnt->cl_protname,
+ rcu_dereference(clnt->cl_xprt)->servername);
+ rcu_read_unlock();
+ }
if (task->tk_flags & RPC_TASK_TIMEOUT)
rpc_exit(task, -ETIMEDOUT);
else
@@ -1660,9 +1645,13 @@ call_timeout(struct rpc_task *task)

if (!(task->tk_flags & RPC_CALL_MAJORSEEN)) {
task->tk_flags |= RPC_CALL_MAJORSEEN;
- if (clnt->cl_chatty)
+ if (clnt->cl_chatty) {
+ rcu_read_lock();
printk(KERN_NOTICE "%s: server %s not responding, still trying\n",
- clnt->cl_protname, clnt->cl_server);
+ clnt->cl_protname,
+ rcu_dereference(clnt->cl_xprt)->servername);
+ rcu_read_unlock();
+ }
}
rpc_force_rebind(clnt);
/*
@@ -1691,9 +1680,13 @@ call_decode(struct rpc_task *task)
dprint_status(task);

if (task->tk_flags & RPC_CALL_MAJORSEEN) {
- if (clnt->cl_chatty)
+ if (clnt->cl_chatty) {
+ rcu_read_lock();
printk(KERN_NOTICE "%s: server %s OK\n",
- clnt->cl_protname, clnt->cl_server);
+ clnt->cl_protname,
+ rcu_dereference(clnt->cl_xprt)->servername);
+ rcu_read_unlock();
+ }
task->tk_flags &= ~RPC_CALL_MAJORSEEN;
}

@@ -1843,8 +1836,11 @@ rpc_verify_header(struct rpc_task *task)
task->tk_action = call_bind;
goto out_retry;
case RPC_AUTH_TOOWEAK:
+ rcu_read_lock();
printk(KERN_NOTICE "RPC: server %s requires stronger "
- "authentication.\n", task->tk_client->cl_server);
+ "authentication.\n",
+ rcu_dereference(task->tk_client->cl_xprt)->servername);
+ rcu_read_unlock();
break;
default:
dprintk("RPC: %5u %s: unknown auth error: %x\n",
@@ -1867,28 +1863,28 @@ rpc_verify_header(struct rpc_task *task)
case RPC_SUCCESS:
return p;
case RPC_PROG_UNAVAIL:
- dprintk("RPC: %5u %s: program %u is unsupported by server %s\n",
+ dprintk_rcu("RPC: %5u %s: program %u is unsupported by server %s\n",
task->tk_pid, __func__,
(unsigned int)task->tk_client->cl_prog,
- task->tk_client->cl_server);
+ rcu_dereference(task->tk_client->cl_xprt)->servername);
error = -EPFNOSUPPORT;
goto out_err;
case RPC_PROG_MISMATCH:
- dprintk("RPC: %5u %s: program %u, version %u unsupported by "
+ dprintk_rcu("RPC: %5u %s: program %u, version %u unsupported by "
"server %s\n", task->tk_pid, __func__,
(unsigned int)task->tk_client->cl_prog,
(unsigned int)task->tk_client->cl_vers,
- task->tk_client->cl_server);
+ rcu_dereference(task->tk_client->cl_xprt)->servername);
error = -EPROTONOSUPPORT;
goto out_err;
case RPC_PROC_UNAVAIL:
- dprintk("RPC: %5u %s: proc %s unsupported by program %u, "
+ dprintk_rcu("RPC: %5u %s: proc %s unsupported by program %u, "
"version %u on server %s\n",
task->tk_pid, __func__,
rpc_proc_name(task),
task->tk_client->cl_prog,
task->tk_client->cl_vers,
- task->tk_client->cl_server);
+ rcu_dereference(task->tk_client->cl_xprt)->servername);
error = -EOPNOTSUPP;
goto out_err;
case RPC_GARBAGE_ARGS:
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index e8d212d..71424dd 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -380,7 +380,7 @@ rpc_show_info(struct seq_file *m, void *v)
struct rpc_clnt *clnt = m->private;

rcu_read_lock();
- seq_printf(m, "RPC server: %s\n", clnt->cl_server);
+ seq_printf(m, "RPC server: %s\n", rcu_dereference(clnt->cl_xprt)->servername);
seq_printf(m, "service: %s (%d) version %d\n", clnt->cl_protname,
clnt->cl_prog, clnt->cl_vers);
seq_printf(m, "address: %s\n", rpc_peeraddr2str(clnt, RPC_DISPLAY_ADDR));
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 848fe90..94017f6 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -340,8 +340,9 @@ out:
return result;
}

-static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
- size_t salen, int proto, u32 version)
+static struct rpc_clnt *rpcb_create(const char *hostname,
+ struct sockaddr *srvaddr, size_t salen,
+ int proto, u32 version)
{
struct rpc_create_args args = {
.net = &init_net,
@@ -654,7 +655,7 @@ void rpcb_getport_async(struct rpc_task *task)

dprintk("RPC: %5u %s(%s, %u, %u, %d)\n",
task->tk_pid, __func__,
- clnt->cl_server, clnt->cl_prog, clnt->cl_vers, xprt->prot);
+ xprt->servername, clnt->cl_prog, clnt->cl_vers, xprt->prot);

/* Put self on the wait queue to ensure we get notified if
* some other task is already attempting to bind the port */
@@ -705,7 +706,7 @@ void rpcb_getport_async(struct rpc_task *task)
dprintk("RPC: %5u %s: trying rpcbind version %u\n",
task->tk_pid, __func__, bind_version);

- rpcb_clnt = rpcb_create(clnt->cl_server, sap, salen, xprt->prot,
+ rpcb_clnt = rpcb_create(xprt->servername, sap, salen, xprt->prot,
bind_version);
if (IS_ERR(rpcb_clnt)) {
status = PTR_ERR(rpcb_clnt);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index c64c0ef..339131c 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -66,6 +66,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net);
static void xprt_request_init(struct rpc_task *, struct rpc_xprt *);
static void xprt_connect_status(struct rpc_task *task);
static int __xprt_get_cong(struct rpc_xprt *, struct rpc_task *);
+static void xprt_destroy(struct rpc_xprt *xprt);

static DEFINE_SPINLOCK(xprt_list_lock);
static LIST_HEAD(xprt_list);
@@ -750,7 +751,7 @@ static void xprt_connect_status(struct rpc_task *task)
default:
dprintk("RPC: %5u xprt_connect_status: error %d connecting to "
"server %s\n", task->tk_pid, -task->tk_status,
- task->tk_client->cl_server);
+ xprt->servername);
xprt_release_write(xprt, task);
task->tk_status = -EIO;
}
@@ -1220,6 +1221,17 @@ found:
(unsigned long)xprt);
else
init_timer(&xprt->timer);
+
+ if (strlen(args->servername) > RPC_MAXNETNAMELEN) {
+ xprt_destroy(xprt);
+ return ERR_PTR(-EINVAL);
+ }
+ xprt->servername = kstrdup(args->servername, GFP_KERNEL);
+ if (xprt->servername == NULL) {
+ xprt_destroy(xprt);
+ return ERR_PTR(-ENOMEM);
+ }
+
dprintk("RPC: created transport %p with %u slots\n", xprt,
xprt->max_reqs);
out:
@@ -1242,6 +1254,7 @@ static void xprt_destroy(struct rpc_xprt *xprt)
rpc_destroy_wait_queue(&xprt->sending);
rpc_destroy_wait_queue(&xprt->backlog);
cancel_work_sync(&xprt->task_cleanup);
+ kfree(xprt->servername);
/*
* Tear down transport state and free the rpc_xprt
*/
--
1.7.8.3
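With cl_server removed from struct rpc_clnt and a servername field added to struct rpc_xprt, the name that log messages print now belongs to the transport, so replacing a client's transport also replaces the name. The userspace sketch below mirrors the validation added to xprt_create_transport(): reject an over-long name, duplicate the string, and free it when the transport is destroyed. The names demo_xprt, demo_clnt, demo_xprt_create(), demo_xprt_destroy(), demo_servername() and DEMO_MAXNETNAMELEN are hypothetical, the limit of 256 is an assumed stand-in for RPC_MAXNETNAMELEN, and no RCU is modelled.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for RPC_MAXNETNAMELEN (value assumed). */
#define DEMO_MAXNETNAMELEN 256

/* Hypothetical, stripped-down transport that owns its server name. */
struct demo_xprt {
	char *servername;
};

/* The client reaches the name only through its current transport. */
struct demo_clnt {
	struct demo_xprt *cl_xprt;
};

/* Returns NULL and sets errno on failure (the kernel uses ERR_PTR()). */
static struct demo_xprt *demo_xprt_create(const char *servername)
{
	struct demo_xprt *xprt;

	if (strlen(servername) > DEMO_MAXNETNAMELEN) {
		errno = EINVAL;
		return NULL;
	}
	xprt = calloc(1, sizeof(*xprt));
	if (xprt == NULL) {
		errno = ENOMEM;
		return NULL;
	}
	xprt->servername = strdup(servername);
	if (xprt->servername == NULL) {
		free(xprt);
		errno = ENOMEM;
		return NULL;
	}
	return xprt;
}

/* The name is released together with the transport that owns it. */
static void demo_xprt_destroy(struct demo_xprt *xprt)
{
	free(xprt->servername);
	free(xprt);
}

static const char *demo_servername(const struct demo_clnt *clnt)
{
	return clnt->cl_xprt->servername;
}
```

Swapping clnt.cl_xprt is what makes the reported name follow the transport; in the kernel that swap is published with rcu_assign_pointer(), and readers bracket the access with rcu_read_lock()/rcu_read_unlock().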


2012-01-30 19:30:37

by Malahal Naineni

Subject: [PATCH 10/13] NFS: Add replace transport infrastructure for replication

Add nfs4_replace_transport(), which walks the stored replica locations,
starting with the entry after the one currently in use, and moves the
mount to the first replica server for which the transport switch succeeds.
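The search order can be sketched in userspace as a circular scan that starts just after repli_current, skips empty slots, never retries the current entry, and records the index of the first replica that accepts the switch. This is a minimal sketch under stated assumptions: demo_server, demo_try_fn, demo_replace_transport() and the table size of 3 are hypothetical, and the try callback stands in for nfs_parse_server_name() plus nfs4_update_server() without doing any networking.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical table size; the patch uses NFS_MAX_REPLI_SERVERS (2). */
#define DEMO_MAX_REPLI_SERVERS 3

/* Hypothetical per-mount replica state (cf. repli_servers/repli_current). */
struct demo_server {
	const char *repli_servers[DEMO_MAX_REPLI_SERVERS];
	int repli_current;
};

/* Hypothetical hook: try to move to @hostname; 0 or a negative errno. */
typedef int (*demo_try_fn)(const char *hostname);

/*
 * Walk the table once, starting just after the current entry and
 * wrapping around; the current entry itself is never retried.
 * Returns 0 and updates repli_current on the first success, the last
 * failure code otherwise, or -ENOENT if no other entry was populated.
 */
static int demo_replace_transport(struct demo_server *server,
				  demo_try_fn try_server)
{
	int error = -ENOENT;
	int i;

	for (i = (server->repli_current + 1) % DEMO_MAX_REPLI_SERVERS;
	     i != server->repli_current;
	     i = (i + 1) % DEMO_MAX_REPLI_SERVERS) {
		const char *hostname = server->repli_servers[i];

		if (hostname == NULL)
			continue;
		error = try_server(hostname);
		if (error == 0) {
			server->repli_current = i;
			break;
		}
	}
	return error;
}
```

Starting one past the current index means a failed server is only revisited after every other replica has been tried on a later call.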

Signed-off-by: Malahal Naineni <[email protected]>
---
fs/nfs/nfs4_fs.h | 1 +
fs/nfs/nfs4namespace.c | 125 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/nfs_fs_sb.h | 1 +
3 files changed, 127 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 7ff0177..b2a973b 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -331,6 +331,7 @@ extern void nfs4_close_state(struct nfs4_state *, fmode_t);
extern void nfs4_close_sync(struct nfs4_state *, fmode_t);
extern void nfs4_state_set_mode_locked(struct nfs4_state *, fmode_t);
extern void nfs4_schedule_lease_recovery(struct nfs_client *);
+int nfs4_replace_transport(struct nfs_server *server);
extern void nfs4_schedule_state_manager(struct nfs_client *);
extern void nfs4_schedule_path_down_recovery(struct nfs_client *clp);
extern void nfs4_schedule_stateid_recovery(const struct nfs_server *, struct nfs4_state *);
diff --git a/fs/nfs/nfs4namespace.c b/fs/nfs/nfs4namespace.c
index bb80c49..ee75e27 100644
--- a/fs/nfs/nfs4namespace.c
+++ b/fs/nfs/nfs4namespace.c
@@ -263,3 +263,128 @@ out:
dprintk("%s: done\n", __func__);
return mnt;
}
+
+/*
+ * Returns zero on success, or a negative errno value.
+ */
+static int nfs4_update_server(struct nfs_server *server, const char *hostname,
+ struct sockaddr *sap, size_t salen)
+{
+ struct nfs_client *clp = server->nfs_client;
+ struct rpc_clnt *clnt = server->client;
+ struct xprt_create xargs = {
+ .ident = clp->cl_proto,
+ .net = &init_net,
+ .dstaddr = sap,
+ .addrlen = salen,
+ .servername = hostname,
+ };
+ char buf[INET6_ADDRSTRLEN + 1];
+ struct sockaddr_storage address;
+ struct sockaddr *localaddr = (struct sockaddr *)&address;
+ int error;
+
+ dprintk("--> %s: move FSID %llx:%llx to \"%s\"\n", __func__,
+ (unsigned long long)server->fsid.major,
+ (unsigned long long)server->fsid.minor,
+ hostname);
+
+ /*
+ * rpc_lock_client() deadlocks here. This is because the tasks
+ * that received NFS4ERR_MOVED are waiting for us to wake them
+ * when we are done recovering. But they have bumped
+ * cl_active_tasks for this clnt, so rpc_lock_client() can't make
+ * any progress.
+ */
+#ifdef USE_RPC_LOCK_CLIENT
+ error = rpc_lock_client(clnt, clnt->cl_timeout->to_maxval);
+ if (error != 0) {
+ dprintk("<-- %s(): rpc_lock_client returned %d\n",
+ __func__, error);
+ goto out;
+ }
+#endif /* USE_RPC_LOCK_CLIENT */
+
+ error = rpc_switch_client_transport(clnt, &xargs, clnt->cl_timeout);
+ if (error != 0) {
+ dprintk("<-- %s(): rpc_switch_client_transport returned %d\n",
+ __func__, error);
+ goto out;
+ }
+
+ /*
+ * If we were able to contact the server at @sap, set up a new
+ * nfs_client and move @server to it.
+ */
+ error = rpc_localaddr(clnt, localaddr, sizeof(address));
+ if (error != 0) {
+ dprintk("<-- %s(): rpc_localaddr returned %d\n",
+ __func__, error);
+ goto out;
+ }
+ error = -EAFNOSUPPORT;
+ if (rpc_ntop(localaddr, buf, sizeof(buf)) == 0) {
+ dprintk("<-- %s(): rpc_ntop failed, returning %d\n",
+ __func__, error);
+ goto out;
+ }
+ error = nfs4_clone_client(clp, sap, salen, buf, server);
+ if (error != 0) {
+ dprintk("<-- %s(): nfs4_clone_client returned %d\n",
+ __func__, error);
+ goto out;
+ }
+ if (server->nfs_client->cl_hostname == NULL)
+ server->nfs_client->cl_hostname = kstrdup(hostname, GFP_KERNEL);
+
+ dprintk("<-- %s() succeeded\n", __func__);
+
+out:
+#ifdef USE_RPC_LOCK_CLIENT
+ rpc_unlock_client(clnt);
+#endif /* USE_RPC_LOCK_CLIENT */
+ return error;
+}
+
+int nfs4_replace_transport(struct nfs_server *server)
+{
+ const size_t addr_bufsize = sizeof(struct sockaddr_storage);
+ struct sockaddr *sap;
+ size_t salen;
+ char *hostname;
+ size_t hostnamelen;
+ unsigned int i;
+ int error;
+
+ sap = kmalloc(addr_bufsize, GFP_KERNEL);
+ if (sap == NULL) {
+ error = -ENOMEM;
+ goto out;
+ }
+
+ error = -ENOENT;
+ /* Start after the current entry and search until the current entry */
+ for (i = (server->repli_current + 1) % NFS_MAX_REPLI_SERVERS;
+ i != server->repli_current;
+ i = (i + 1) % NFS_MAX_REPLI_SERVERS) {
+ if (server->repli_servers[i] == NULL)
+ continue;
+ hostname = server->repli_servers[i];
+ hostnamelen = strlen(hostname);
+ salen = nfs_parse_server_name(hostname, hostnamelen, sap,
+ addr_bufsize);
+ if (salen == 0)
+ continue;
+ rpc_set_port(sap, NFS_PORT);
+ error = nfs4_update_server(server, hostname, sap, salen);
+ if (error == 0) {
+ dprintk("%s(): updated with server: %s\n",
+ __func__, hostname);
+ server->repli_current = i;
+ break;
+ }
+ }
+ kfree(sap);
+out:
+ return error;
+}
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 4970c58..f5fd4bb 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -160,6 +160,7 @@ struct nfs_server {
struct nfs_fh *rootfh;
#define NFS_MAX_REPLI_SERVERS 2
char *repli_servers[NFS_MAX_REPLI_SERVERS];
+ int repli_current; /* Index of the replica now in use */

atomic_t active; /* Keep trace of any activity to this server */

--
1.7.8.3


2012-01-30 19:30:01

by Malahal Naineni

Subject: [PATCH 02/13] SUNRPC: Use RCU to dereference the rpc_clnt.cl_xprt field

From: Trond Myklebust <[email protected]>

A migration event will replace the rpc_xprt used by an rpc_clnt. To
ensure this can be done safely, all references to cl_xprt must now use
a form of rcu_dereference().

Special care is taken with rpc_peeraddr2str(), which returns a pointer
to memory whose lifetime is the same as the rpc_xprt.
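The publish/read discipline this patch introduces can be approximated in userspace with C11 atomics: the writer fully initialises a new transport and then publishes the pointer with a release store (the role rcu_assign_pointer() plays), while each reader loads the pointer once and uses only that snapshot (the role rcu_dereference() plays under rcu_read_lock()). This is an analogy, not the kernel mechanism: demo_xprt, demo_clnt, demo_publish() and demo_max_reqs() are hypothetical, acquire ordering is used as a stronger stand-in for the dependency ordering RCU relies on, and reclamation is not modelled. In the kernel the old rpc_xprt may only be freed once pre-existing RCU readers have finished (for example after synchronize_rcu()), which plain atomics do not provide.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical transport with a field that readers sample. */
struct demo_xprt {
	unsigned int max_reqs;
};

/* The client's transport pointer, replaced atomically. */
struct demo_clnt {
	_Atomic(struct demo_xprt *) cl_xprt;
};

/* Writer: the release store orders the initialisation of *xprt
 * before the new pointer becomes visible to readers. */
static void demo_publish(struct demo_clnt *clnt, struct demo_xprt *xprt)
{
	atomic_store_explicit(&clnt->cl_xprt, xprt, memory_order_release);
}

/* Reader: load the pointer once and use only that snapshot, as
 * rpc_max_reqs() does with rcu_dereference(). */
static unsigned int demo_max_reqs(struct demo_clnt *clnt)
{
	struct demo_xprt *xprt =
		atomic_load_explicit(&clnt->cl_xprt, memory_order_acquire);

	return xprt->max_reqs;
}
```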

Signed-off-by: Trond Myklebust <[email protected]>
[ cel: fix lockdep splats and layering violations ]
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Malahal Naineni <[email protected]>
---
fs/nfs/callback_proc.c | 9 ++--
fs/nfs/client.c | 16 +++++--
fs/nfs/nfs4proc.c | 15 +++++--
fs/nfs/nfs4state.c | 25 ++++++++----
fs/nfs/super.c | 5 ++
include/linux/sunrpc/clnt.h | 4 +-
include/linux/sunrpc/debug.h | 11 +++++
net/sunrpc/clnt.c | 94 +++++++++++++++++++++++++++++++++++-------
net/sunrpc/rpc_pipe.c | 3 +
net/sunrpc/rpcb_clnt.c | 15 +++++--
net/sunrpc/stats.c | 6 ++-
11 files changed, 161 insertions(+), 42 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 54cea8a..60ce755 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -8,6 +8,7 @@
#include <linux/nfs4.h>
#include <linux/nfs_fs.h>
#include <linux/slab.h>
+#include <linux/rcupdate.h>
#include "nfs4_fs.h"
#include "callback.h"
#include "delegation.h"
@@ -33,7 +34,7 @@ __be32 nfs4_callback_getattr(struct cb_getattrargs *args,
res->bitmap[0] = res->bitmap[1] = 0;
res->status = htonl(NFS4ERR_BADHANDLE);

- dprintk("NFS: GETATTR callback request from %s\n",
+ dprintk_rcu("NFS: GETATTR callback request from %s\n",
rpc_peeraddr2str(cps->clp->cl_rpcclient, RPC_DISPLAY_ADDR));

inode = nfs_delegation_find_inode(cps->clp, &args->fh);
@@ -73,7 +74,7 @@ __be32 nfs4_callback_recall(struct cb_recallargs *args, void *dummy,
if (!cps->clp) /* Always set for v4.0. Set in cb_sequence for v4.1 */
goto out;

- dprintk("NFS: RECALL callback request from %s\n",
+ dprintk_rcu("NFS: RECALL callback request from %s\n",
rpc_peeraddr2str(cps->clp->cl_rpcclient, RPC_DISPLAY_ADDR));

res = htonl(NFS4ERR_BADHANDLE);
@@ -517,7 +518,7 @@ __be32 nfs4_callback_recallany(struct cb_recallanyargs *args, void *dummy,
if (!cps->clp) /* set in cb_sequence */
goto out;

- dprintk("NFS: RECALL_ANY callback request from %s\n",
+ dprintk_rcu("NFS: RECALL_ANY callback request from %s\n",
rpc_peeraddr2str(cps->clp->cl_rpcclient, RPC_DISPLAY_ADDR));

status = cpu_to_be32(NFS4ERR_INVAL);
@@ -552,7 +553,7 @@ __be32 nfs4_callback_recallslot(struct cb_recallslotargs *args, void *dummy,
if (!cps->clp) /* set in cb_sequence */
goto out;

- dprintk("NFS: CB_RECALL_SLOT request from %s target max slots %d\n",
+ dprintk_rcu("NFS: CB_RECALL_SLOT request from %s target max slots %d\n",
rpc_peeraddr2str(cps->clp->cl_rpcclient, RPC_DISPLAY_ADDR),
args->crsa_target_max_slots);

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 31778f7..be5e702 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1284,16 +1284,18 @@ static int nfs4_init_callback(struct nfs_client *clp)
int error;

if (clp->rpc_ops->version == 4) {
+ struct rpc_xprt *xprt;
+
+ xprt = rcu_dereference_raw(clp->cl_rpcclient->cl_xprt);
+
if (nfs4_has_session(clp)) {
- error = xprt_setup_backchannel(
- clp->cl_rpcclient->cl_xprt,
+ error = xprt_setup_backchannel(xprt,
NFS41_BC_MIN_CALLBACKS);
if (error < 0)
return error;
}

- error = nfs_callback_up(clp->cl_mvops->minor_version,
- clp->cl_rpcclient->cl_xprt);
+ error = nfs_callback_up(clp->cl_mvops->minor_version, xprt);
if (error < 0) {
dprintk("%s: failed to start callback. Error = %d\n",
__func__, error);
@@ -1675,7 +1677,7 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data,
data->addrlen,
parent_client->cl_ipaddr,
data->authflavor,
- parent_server->client->cl_xprt->prot,
+ rpc_protocol(parent_server->client),
parent_server->client->cl_timeout,
parent_client->cl_mvops->minor_version);
if (error < 0)
@@ -1880,12 +1882,14 @@ static int nfs_server_list_show(struct seq_file *m, void *v)
if (clp->cl_cons_state != NFS_CS_READY)
return 0;

+ rcu_read_lock();
seq_printf(m, "v%u %s %s %3d %s\n",
clp->rpc_ops->version,
rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_ADDR),
rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_PORT),
atomic_read(&clp->cl_count),
clp->cl_hostname);
+ rcu_read_unlock();

return 0;
}
@@ -1959,6 +1963,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
(unsigned long long) server->fsid.major,
(unsigned long long) server->fsid.minor);

+ rcu_read_lock();
seq_printf(m, "v%u %s %s %-7s %-17s %s\n",
clp->rpc_ops->version,
rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_ADDR),
@@ -1966,6 +1971,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
dev,
fsid,
nfs_server_fscache_state(server));
+ rcu_read_unlock();

return 0;
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index f0c849c..6ceae67 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3789,6 +3789,7 @@ int nfs4_proc_setclientid(struct nfs_client *clp, u32 program,
*p = htonl((u32)clp->cl_boot_time.tv_nsec);

for(;;) {
+ rcu_read_lock();
setclientid.sc_name_len = scnprintf(setclientid.sc_name,
sizeof(setclientid.sc_name), "%s/%s %s %s %u",
clp->cl_ipaddr,
@@ -3805,6 +3806,7 @@ int nfs4_proc_setclientid(struct nfs_client *clp, u32 program,
setclientid.sc_uaddr_len = scnprintf(setclientid.sc_uaddr,
sizeof(setclientid.sc_uaddr), "%s.%u.%u",
clp->cl_ipaddr, port >> 8, port & 255);
+ rcu_read_unlock();

status = rpc_call_sync(clp->cl_rpcclient, &msg, RPC_TASK_TIMEOUT);
if (status != -NFS4ERR_CLID_INUSE)
@@ -5152,11 +5154,16 @@ struct nfs4_session *nfs4_alloc_session(struct nfs_client *clp)

void nfs4_destroy_session(struct nfs4_session *session)
{
+ struct rpc_xprt *xprt;
+
nfs4_proc_destroy_session(session);
+
+ rcu_read_lock();
+ xprt = rcu_dereference(session->clp->cl_rpcclient->cl_xprt);
+ rcu_read_unlock();
dprintk("%s Destroy backchannel for xprt %p\n",
- __func__, session->clp->cl_rpcclient->cl_xprt);
- xprt_destroy_backchannel(session->clp->cl_rpcclient->cl_xprt,
- NFS41_BC_MIN_CALLBACKS);
+ __func__, xprt);
+ xprt_destroy_backchannel(xprt, NFS41_BC_MIN_CALLBACKS);
nfs4_destroy_slot_tables(session);
kfree(session);
}
@@ -5184,7 +5191,7 @@ static void nfs4_init_channel_attrs(struct nfs41_create_session_args *args)
args->fc_attrs.max_rqst_sz = mxrqst_sz;
args->fc_attrs.max_resp_sz = mxresp_sz;
args->fc_attrs.max_ops = NFS4_MAX_OPS;
- args->fc_attrs.max_reqs = session->clp->cl_rpcclient->cl_xprt->max_reqs;
+ args->fc_attrs.max_reqs = rpc_max_reqs(session->clp->cl_rpcclient);

dprintk("%s: Fore Channel : max_rqst_sz=%u max_resp_sz=%u "
"max_ops=%u max_reqs=%u\n",
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index a53f33b..c97bbc7 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1072,19 +1072,28 @@ static void nfs4_clear_state_manager_bit(struct nfs_client *clp)
void nfs4_schedule_state_manager(struct nfs_client *clp)
{
struct task_struct *task;
+ char buf[INET6_ADDRSTRLEN + sizeof("-manager") + 1];

if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0)
return;
__module_get(THIS_MODULE);
atomic_inc(&clp->cl_count);
- task = kthread_run(nfs4_run_state_manager, clp, "%s-manager",
- rpc_peeraddr2str(clp->cl_rpcclient,
- RPC_DISPLAY_ADDR));
- if (!IS_ERR(task))
- return;
- nfs4_clear_state_manager_bit(clp);
- nfs_put_client(clp);
- module_put(THIS_MODULE);
+
+ /* The rcu_read_lock() is not strictly necessary, as the state
+ * manager is the only thread that ever changes the rpc_xprt
+ * after it's initialized. At this point, we're single threaded. */
+ rcu_read_lock();
+ snprintf(buf, sizeof(buf), "%s-manager",
+ rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR));
+ rcu_read_unlock();
+ task = kthread_run(nfs4_run_state_manager, clp, buf);
+ if (IS_ERR(task)) {
+ printk(KERN_ERR "%s: kthread_run: %ld\n",
+ __func__, PTR_ERR(task));
+ nfs4_clear_state_manager_bit(clp);
+ nfs_put_client(clp);
+ module_put(THIS_MODULE);
+ }
}

/*
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 3dfa4f1..f2e7d7c 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -52,6 +52,7 @@
#include <linux/nfs_xdr.h>
#include <linux/magic.h>
#include <linux/parser.h>
+#include <linux/rcupdate.h>

#include <asm/system.h>
#include <asm/uaccess.h>
@@ -676,8 +677,10 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
else
seq_puts(m, nfs_infop->nostr);
}
+ rcu_read_lock();
seq_printf(m, ",proto=%s",
rpc_peeraddr2str(nfss->client, RPC_DISPLAY_NETID));
+ rcu_read_unlock();
if (version == 4) {
if (nfss->port != NFS_PORT)
seq_printf(m, ",port=%u", nfss->port);
@@ -726,9 +729,11 @@ static int nfs_show_options(struct seq_file *m, struct dentry *root)

nfs_show_mount_options(m, nfss, 0);

+ rcu_read_lock();
seq_printf(m, ",addr=%s",
rpc_peeraddr2str(nfss->nfs_client->cl_rpcclient,
RPC_DISPLAY_ADDR));
+ rcu_read_unlock();

return 0;
}
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index c85696e..a0a384f 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -37,7 +37,7 @@ struct rpc_clnt {
struct list_head cl_clients; /* Global list of clients */
struct list_head cl_tasks; /* List of tasks */
spinlock_t cl_lock; /* spinlock */
- struct rpc_xprt * cl_xprt; /* transport */
+ struct rpc_xprt __rcu * cl_xprt; /* transport */
struct rpc_procinfo * cl_procinfo; /* procedure info */
u32 cl_prog, /* RPC program number */
cl_vers, /* RPC version number */
@@ -167,6 +167,8 @@ struct rpc_task *rpc_call_null(struct rpc_clnt *clnt, struct rpc_cred *cred,
int rpc_restart_call_prepare(struct rpc_task *);
int rpc_restart_call(struct rpc_task *);
void rpc_setbufsize(struct rpc_clnt *, unsigned int, unsigned int);
+int rpc_protocol(struct rpc_clnt *);
+unsigned int rpc_max_reqs(struct rpc_clnt *);
size_t rpc_max_payload(struct rpc_clnt *);
void rpc_force_rebind(struct rpc_clnt *);
size_t rpc_peeraddr(struct rpc_clnt *, struct sockaddr *, size_t);
diff --git a/include/linux/sunrpc/debug.h b/include/linux/sunrpc/debug.h
index c2786f2..28136fd 100644
--- a/include/linux/sunrpc/debug.h
+++ b/include/linux/sunrpc/debug.h
@@ -47,15 +47,26 @@ extern unsigned int nlm_debug;
#endif

#define dprintk(args...) dfprintk(FACILITY, ## args)
+#define dprintk_rcu(args...) dfprintk_rcu(FACILITY, ## args)

#undef ifdebug
#ifdef RPC_DEBUG
# define ifdebug(fac) if (unlikely(rpc_debug & RPCDBG_##fac))
# define dfprintk(fac, args...) do { ifdebug(fac) printk(args); } while(0)
+
+# define dfprintk_rcu(fac, args...) \
+ do { \
+ ifdebug(fac) { \
+ rcu_read_lock(); \
+ printk(args); \
+ rcu_read_unlock(); \
+ } \
+ } while (0)
# define RPC_IFDEBUG(x) x
#else
# define ifdebug(fac) if (0)
# define dfprintk(fac, args...) do ; while (0)
+# define dfprintk_rcu(fac, args...) do ; while (0)
# define RPC_IFDEBUG(x)
#endif

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index b6a7817..e67eba3 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -31,6 +31,7 @@
#include <linux/in.h>
#include <linux/in6.h>
#include <linux/un.h>
+#include <linux/rcupdate.h>

#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/rpc_pipe_fs.h>
@@ -189,7 +190,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
}
strlcpy(clnt->cl_server, args->servername, len);

- clnt->cl_xprt = xprt;
+ rcu_assign_pointer(clnt->cl_xprt, xprt);
clnt->cl_procinfo = version->procs;
clnt->cl_maxproc = version->nrprocs;
clnt->cl_protname = program->name;
@@ -204,7 +205,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
INIT_LIST_HEAD(&clnt->cl_tasks);
spin_lock_init(&clnt->cl_lock);

- if (!xprt_bound(clnt->cl_xprt))
+ if (!xprt_bound(xprt))
clnt->cl_autobind = 1;

clnt->cl_timeout = xprt->timeout;
@@ -376,6 +377,7 @@ struct rpc_clnt *
rpc_clone_client(struct rpc_clnt *clnt)
{
struct rpc_clnt *new;
+ struct rpc_xprt *xprt;
int err = -ENOMEM;

new = kmemdup(clnt, sizeof(*new), GFP_KERNEL);
@@ -395,6 +397,12 @@ rpc_clone_client(struct rpc_clnt *clnt)
if (new->cl_principal == NULL)
goto out_no_principal;
}
+ rcu_read_lock();
+ xprt = xprt_get(rcu_dereference(clnt->cl_xprt));
+ rcu_read_unlock();
+ if (xprt == NULL)
+ goto out_no_transport;
+ rcu_assign_pointer(new->cl_xprt, xprt);
atomic_set(&new->cl_count, 1);
atomic_set(&new->cl_active_tasks, 0);
rpc_init_wait_queue(&new->cl_waitqueue, "client waitqueue");
@@ -403,12 +411,13 @@ rpc_clone_client(struct rpc_clnt *clnt)
goto out_no_path;
if (new->cl_auth)
atomic_inc(&new->cl_auth->au_count);
- xprt_get(clnt->cl_xprt);
atomic_inc(&clnt->cl_count);
rpc_register_client(new);
rpciod_up();
return new;
out_no_path:
+ xprt_put(xprt);
+out_no_transport:
kfree(new->cl_principal);
out_no_principal:
rpc_free_iostats(new->cl_metrics);
@@ -493,7 +502,7 @@ out_free:
rpc_free_iostats(clnt->cl_metrics);
kfree(clnt->cl_principal);
clnt->cl_metrics = NULL;
- xprt_put(clnt->cl_xprt);
+ xprt_put(rcu_dereference_raw(clnt->cl_xprt));
rpciod_down();
kfree(clnt);
}
@@ -850,13 +859,18 @@ EXPORT_SYMBOL_GPL(rpc_call_start);
size_t rpc_peeraddr(struct rpc_clnt *clnt, struct sockaddr *buf, size_t bufsize)
{
size_t bytes;
- struct rpc_xprt *xprt = clnt->cl_xprt;
+ struct rpc_xprt *xprt;
+
+ rcu_read_lock();
+ xprt = rcu_dereference(clnt->cl_xprt);

- bytes = sizeof(xprt->addr);
+ bytes = xprt->addrlen;
if (bytes > bufsize)
bytes = bufsize;
- memcpy(buf, &clnt->cl_xprt->addr, bytes);
- return xprt->addrlen;
+ memcpy(buf, &xprt->addr, bytes);
+ rcu_read_unlock();
+
+ return bytes;
}
EXPORT_SYMBOL_GPL(rpc_peeraddr);

@@ -865,11 +879,16 @@ EXPORT_SYMBOL_GPL(rpc_peeraddr);
* @clnt: RPC client structure
* @format: address format
*
+ * NB: the lifetime of the memory referenced by the returned pointer is
+ * the same as the rpc_xprt itself. As long as the caller uses this
+ * pointer, it must hold the RCU read lock.
*/
const char *rpc_peeraddr2str(struct rpc_clnt *clnt,
enum rpc_display_format_t format)
{
- struct rpc_xprt *xprt = clnt->cl_xprt;
+ struct rpc_xprt *xprt;
+
+ xprt = rcu_dereference(clnt->cl_xprt);

if (xprt->address_strings[format] != NULL)
return xprt->address_strings[format];
@@ -881,14 +900,51 @@ EXPORT_SYMBOL_GPL(rpc_peeraddr2str);
void
rpc_setbufsize(struct rpc_clnt *clnt, unsigned int sndsize, unsigned int rcvsize)
{
- struct rpc_xprt *xprt = clnt->cl_xprt;
+ struct rpc_xprt *xprt;
+
+ rcu_read_lock();
+ xprt = rcu_dereference(clnt->cl_xprt);
if (xprt->ops->set_buffer_size)
xprt->ops->set_buffer_size(xprt, sndsize, rcvsize);
+ rcu_read_unlock();
}
EXPORT_SYMBOL_GPL(rpc_setbufsize);

-/*
- * Return size of largest payload RPC client can support, in bytes
+/**
+ * rpc_protocol - Get transport protocol number for an RPC client
+ * @clnt: RPC client to query
+ *
+ */
+int rpc_protocol(struct rpc_clnt *clnt)
+{
+ int protocol;
+
+ rcu_read_lock();
+ protocol = rcu_dereference(clnt->cl_xprt)->prot;
+ rcu_read_unlock();
+ return protocol;
+}
+EXPORT_SYMBOL_GPL(rpc_protocol);
+
+/**
+ * rpc_max_reqs - Get the maximum number of outstanding requests
+ * @clnt: RPC client to query
+ *
+ */
+unsigned int rpc_max_reqs(struct rpc_clnt *clnt)
+{
+ unsigned int max_reqs;
+
+ rcu_read_lock();
+ max_reqs = rcu_dereference(clnt->cl_xprt)->max_reqs;
+ rcu_read_unlock();
+ return max_reqs;
+}
+EXPORT_SYMBOL_GPL(rpc_max_reqs);
+
+/**
+ * rpc_max_payload - Get maximum payload size for a transport, in bytes
+ * @clnt: RPC client to query
*
* For stream transports, this is one RPC record fragment (see RFC
* 1831), as we don't support multi-record requests yet. For datagram
@@ -897,7 +953,12 @@ EXPORT_SYMBOL_GPL(rpc_setbufsize);
*/
size_t rpc_max_payload(struct rpc_clnt *clnt)
{
- return clnt->cl_xprt->max_payload;
+ size_t ret;
+
+ rcu_read_lock();
+ ret = rcu_dereference(clnt->cl_xprt)->max_payload;
+ rcu_read_unlock();
+ return ret;
}
EXPORT_SYMBOL_GPL(rpc_max_payload);

@@ -908,8 +969,11 @@ EXPORT_SYMBOL_GPL(rpc_max_payload);
*/
void rpc_force_rebind(struct rpc_clnt *clnt)
{
- if (clnt->cl_autobind)
- xprt_clear_bound(clnt->cl_xprt);
+ if (clnt->cl_autobind) {
+ rcu_read_lock();
+ xprt_clear_bound(rcu_dereference(clnt->cl_xprt));
+ rcu_read_unlock();
+ }
}
EXPORT_SYMBOL_GPL(rpc_force_rebind);

diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index 63a7a7a..e8d212d 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -16,6 +16,7 @@
#include <linux/namei.h>
#include <linux/fsnotify.h>
#include <linux/kernel.h>
+#include <linux/rcupdate.h>

#include <asm/ioctls.h>
#include <linux/fs.h>
@@ -378,12 +379,14 @@ rpc_show_info(struct seq_file *m, void *v)
{
struct rpc_clnt *clnt = m->private;

+ rcu_read_lock();
seq_printf(m, "RPC server: %s\n", clnt->cl_server);
seq_printf(m, "service: %s (%d) version %d\n", clnt->cl_protname,
clnt->cl_prog, clnt->cl_vers);
seq_printf(m, "address: %s\n", rpc_peeraddr2str(clnt, RPC_DISPLAY_ADDR));
seq_printf(m, "protocol: %s\n", rpc_peeraddr2str(clnt, RPC_DISPLAY_PROTO));
seq_printf(m, "port: %s\n", rpc_peeraddr2str(clnt, RPC_DISPLAY_PORT));
+ rcu_read_unlock();
return 0;
}

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 8761bf8..848fe90 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -611,9 +611,10 @@ static struct rpc_task *rpcb_call_async(struct rpc_clnt *rpcb_clnt, struct rpcbi
static struct rpc_clnt *rpcb_find_transport_owner(struct rpc_clnt *clnt)
{
struct rpc_clnt *parent = clnt->cl_parent;
+ struct rpc_xprt *xprt = rcu_dereference(clnt->cl_xprt);

while (parent != clnt) {
- if (parent->cl_xprt != clnt->cl_xprt)
+ if (rcu_dereference(parent->cl_xprt) != xprt)
break;
if (clnt->cl_autobind)
break;
@@ -644,8 +645,12 @@ void rpcb_getport_async(struct rpc_task *task)
size_t salen;
int status;

- clnt = rpcb_find_transport_owner(task->tk_client);
- xprt = clnt->cl_xprt;
+ rcu_read_lock();
+ do {
+ clnt = rpcb_find_transport_owner(task->tk_client);
+ xprt = xprt_get(rcu_dereference(clnt->cl_xprt));
+ } while (xprt == NULL);
+ rcu_read_unlock();

dprintk("RPC: %5u %s(%s, %u, %u, %d)\n",
task->tk_pid, __func__,
@@ -658,6 +663,7 @@ void rpcb_getport_async(struct rpc_task *task)
if (xprt_test_and_set_binding(xprt)) {
dprintk("RPC: %5u %s: waiting for another binder\n",
task->tk_pid, __func__);
+ xprt_put(xprt);
return;
}

@@ -725,7 +731,7 @@ void rpcb_getport_async(struct rpc_task *task)
switch (bind_version) {
case RPCBVERS_4:
case RPCBVERS_3:
- map->r_netid = rpc_peeraddr2str(clnt, RPC_DISPLAY_NETID);
+ map->r_netid = xprt->address_strings[RPC_DISPLAY_NETID];
map->r_addr = rpc_sockaddr2uaddr(sap, GFP_ATOMIC);
map->r_owner = "";
break;
@@ -754,6 +760,7 @@ bailout_release_client:
bailout_nofree:
rpcb_wake_rpcbind_waiters(xprt, status);
task->tk_status = status;
+ xprt_put(xprt);
}
EXPORT_SYMBOL_GPL(rpcb_getport_async);

diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
index 80df89d..4084255 100644
--- a/net/sunrpc/stats.c
+++ b/net/sunrpc/stats.c
@@ -22,6 +22,7 @@
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/svcsock.h>
#include <linux/sunrpc/metrics.h>
+#include <linux/rcupdate.h>

#include "netns.h"

@@ -179,7 +180,7 @@ static void _print_name(struct seq_file *seq, unsigned int op,
void rpc_print_iostats(struct seq_file *seq, struct rpc_clnt *clnt)
{
struct rpc_iostats *stats = clnt->cl_metrics;
- struct rpc_xprt *xprt = clnt->cl_xprt;
+ struct rpc_xprt *xprt;
unsigned int op, maxproc = clnt->cl_maxproc;

if (!stats)
@@ -189,8 +190,11 @@ void rpc_print_iostats(struct seq_file *seq, struct rpc_clnt *clnt)
seq_printf(seq, "p/v: %u/%u (%s)\n",
clnt->cl_prog, clnt->cl_vers, clnt->cl_protname);

+ rcu_read_lock();
+ xprt = rcu_dereference(clnt->cl_xprt);
if (xprt)
xprt->ops->print_stats(xprt, seq);
+ rcu_read_unlock();

seq_printf(seq, "\tper-op statistics\n");
for (op = 0; op < maxproc; op++) {
--
1.7.8.3
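The `while (xprt == NULL)` retry loop added to rpcb_getport_async() implies that xprt_get() can fail. Presumably it fails when the transport's reference count has already dropped to zero because the transport is being switched out. Below is a minimal user-space sketch of such an "increment unless zero" get, using C11 atomics. struct obj, obj_get() and obj_put() are hypothetical stand-ins, not the SUNRPC API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct rpc_xprt. */
struct obj {
	atomic_uint count;
};

/*
 * Take a reference only while the object is still live (count > 0).
 * Returns o on success, or NULL if the count had already reached zero,
 * i.e. the last reference is gone and the object is being torn down.
 */
static struct obj *obj_get(struct obj *o)
{
	unsigned int old = atomic_load(&o->count);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
			return o;
	}
	return NULL;
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->count, 1) == 1;
}
```

A caller would re-read the published pointer on each pass, as the patch does with rcu_dereference(clnt->cl_xprt). Once the dying transport has been replaced, the next attempt then picks up the new one.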


2012-01-30 19:30:46

by Malahal Naineni

Subject: [PATCH 09/13] NFS: Add replica servers to volumes proc file.

Signed-off-by: Malahal Naineni <[email protected]>
---
fs/nfs/client.c | 16 +++++++++++++---
1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 54de25a..000ebdb 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1997,10 +1997,12 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
struct nfs_server *server;
struct nfs_client *clp;
char dev[8], fsid[17];
+ char *p, *end, replicas[256];
+ int i;

/* display header on line 1 */
if (v == &nfs_volume_list) {
- seq_puts(m, "NV SERVER PORT DEV FSID FSC\n");
+ seq_puts(m, "NV SERVER PORT DEV FSID FSC REPLICAS\n");
return 0;
}
/* display one transport per line on subsequent lines */
@@ -2014,14 +2016,22 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
(unsigned long long) server->fsid.major,
(unsigned long long) server->fsid.minor);

+ p = replicas;
+ end = replicas + sizeof(replicas);
+ strncpy(replicas, "none", sizeof(replicas));
+ for (i = 0; i < NFS_MAX_REPLI_SERVERS && p < end; i++)
+ if (server->repli_servers[i])
+ p += snprintf(p, end - p, "%s/", server->repli_servers[i]);
+
rcu_read_lock();
- seq_printf(m, "v%u %s %s %-7s %-17s %s\n",
+ seq_printf(m, "v%u %s %s %-7s %-17s %s %s\n",
clp->rpc_ops->version,
rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_ADDR),
rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_PORT),
dev,
fsid,
- nfs_server_fscache_state(server));
+ nfs_server_fscache_state(server),
+ replicas);
rcu_read_unlock();

return 0;
--
1.7.8.3
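nfs_volume_list_show() builds the REPLICAS column by appending each configured server name followed by "/". When no server is set, it falls back to "none". Non-empty lists therefore end with a trailing "/". The following user-space sketch uses the same bounded-append loop but writes separators only between names. format_replicas() and MAX_REPLICAS are hypothetical stand-ins for the patch's inline loop and NFS_MAX_REPLI_SERVERS.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_REPLICAS 4 /* hypothetical stand-in for NFS_MAX_REPLI_SERVERS */

/*
 * Join the non-NULL entries of servers[] with '/' into buf (len > 0).
 * Writes "none" if there are no entries. Output is truncated, but always
 * NUL-terminated, when buf is too small.
 */
static void format_replicas(char *buf, size_t len,
			    const char *const servers[MAX_REPLICAS])
{
	size_t used = 0;
	int i;

	buf[0] = '\0';
	for (i = 0; i < MAX_REPLICAS && used < len; i++) {
		int n;

		if (servers[i] == NULL)
			continue;
		/* Emit a separator only after the first name. */
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "/" : "", servers[i]);
		if (n < 0)
			break;
		/*
		 * snprintf() returns the untruncated length, so 'used' may
		 * exceed len after truncation; the loop condition then
		 * stops any further writes.
		 */
		used += (size_t)n;
	}
	if (used == 0)
		snprintf(buf, len, "none");
}
```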


2012-01-30 19:30:03

by Malahal Naineni

Subject: [PATCH 04/13] SUNRPC: Add a helper to switch the transport of the rpc_client

From: Trond Myklebust <[email protected]>

Signed-off-by: Trond Myklebust <[email protected]>
[ cel: fix whitespace ]
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Malahal Naineni <[email protected]>
---
include/linux/sunrpc/clnt.h | 3 ++
net/sunrpc/clnt.c | 76 +++++++++++++++++++++++++++++++++++++++---
2 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 0adc955..a786143 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -145,6 +145,9 @@ void rpc_task_release_client(struct rpc_task *);

int rpc_lock_client(struct rpc_clnt *clnt, unsigned long timeout);
void rpc_unlock_client(struct rpc_clnt *clnt);
+int rpc_switch_client_transport(struct rpc_clnt *,
+ struct xprt_create *,
+ const struct rpc_timeout *);

int rpcb_create_local(void);
void rpcb_put_local(void);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 26c1a2f..c5cc8ac 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -26,6 +26,7 @@
#include <linux/namei.h>
#include <linux/mount.h>
#include <linux/slab.h>
+#include <linux/rcupdate.h>
#include <linux/utsname.h>
#include <linux/workqueue.h>
#include <linux/in.h>
@@ -142,12 +143,35 @@ err:
return error;
}

+static void rpc_set_client_transport(struct rpc_clnt *clnt,
+ struct rpc_xprt *xprt,
+ const struct rpc_timeout *timeout)
+{
+ struct rpc_xprt *old;
+
+ spin_lock(&clnt->cl_lock);
+ old = clnt->cl_xprt;
+
+ if (!xprt_bound(xprt))
+ clnt->cl_autobind = 1;
+
+ clnt->cl_timeout = timeout;
+ rcu_assign_pointer(clnt->cl_xprt, xprt);
+ spin_unlock(&clnt->cl_lock);
+
+ if (old != NULL) {
+ synchronize_rcu();
+ xprt_put(old);
+ }
+}
+
static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, struct rpc_xprt *xprt)
{
struct rpc_program *program = args->program;
struct rpc_version *version;
struct rpc_clnt *clnt = NULL;
struct rpc_auth *auth;
+ const struct rpc_timeout *timeout;
int err;

/* sanity check the name before trying to print it */
@@ -173,7 +197,6 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
goto out_err;
clnt->cl_parent = clnt;

- rcu_assign_pointer(clnt->cl_xprt, xprt);
clnt->cl_procinfo = version->procs;
clnt->cl_maxproc = version->nrprocs;
clnt->cl_protname = program->name;
@@ -188,16 +211,15 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
INIT_LIST_HEAD(&clnt->cl_tasks);
spin_lock_init(&clnt->cl_lock);

- if (!xprt_bound(xprt))
- clnt->cl_autobind = 1;
-
- clnt->cl_timeout = xprt->timeout;
+ timeout = xprt->timeout;
if (args->timeout != NULL) {
memcpy(&clnt->cl_timeout_default, args->timeout,
sizeof(clnt->cl_timeout_default));
- clnt->cl_timeout = &clnt->cl_timeout_default;
+ timeout = &clnt->cl_timeout_default;
}

+ rpc_set_client_transport(clnt, xprt, timeout);
+
clnt->cl_rtt = &clnt->cl_rtt_default;
rpc_init_rtt(&clnt->cl_rtt_default, clnt->cl_timeout->to_initval);
clnt->cl_principal = NULL;
@@ -411,6 +433,48 @@ out_no_clnt:
}
EXPORT_SYMBOL_GPL(rpc_clone_client);

+/**
+ * rpc_switch_client_transport - switch the RPC transport on the fly
+ * @clnt: pointer to a struct rpc_clnt
+ * @args: pointer to the new transport arguments
+ * @timeout: pointer to the new timeout parameters
+ *
+ * This function allows the caller to switch the RPC transport for the
+ * rpc_clnt structure 'clnt' to allow it to connect to a mirrored NFS server,
+ * for instance. It assumes that the caller has ensured that there are no
+ * active tasks by using some form of locking.
+ */
+int rpc_switch_client_transport(struct rpc_clnt *clnt,
+ struct xprt_create *args,
+ const struct rpc_timeout *timeout)
+{
+ struct rpc_xprt *xprt;
+ struct rpc_auth *auth;
+ rpc_authflavor_t pseudoflavor;
+
+ xprt = xprt_create_transport(args);
+ if (IS_ERR(xprt))
+ return PTR_ERR(xprt);
+
+ pseudoflavor = clnt->cl_auth->au_flavor;
+
+ rpc_set_client_transport(clnt, xprt, timeout);
+
+ /*
+ * Note: we must always create a new rpc_auth cache
+ * when switching to a different server! RPCSEC_GSS sessions
+ * in particular are between a single client and server,
+ * so we cannot reuse the sessions in the cache when we switch
+ * servers.
+ */
+ auth = rpcauth_create(pseudoflavor, clnt);
+ if (IS_ERR(auth))
+ return PTR_ERR(auth);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(rpc_switch_client_transport);
+
/*
* Kill all tasks for the given client.
* XXX: kill their descendants as well?
--
1.7.8.3


2012-01-30 19:30:47

by Malahal Naineni

Subject: [PATCH 05/13] SUNRPC: Add API to acquire source address

From: Chuck Lever <[email protected]>

NFSv4.0 clients must send endpoint information for their callback
service to NFSv4.0 servers during their first contact with a server.
Traditionally, user space provides the callback endpoint IP address
via the "clientaddr=" mount option.

During an NFSv4 migration event, an FSID may be migrated to a
destination server that is reachable through a different client-side
NIC than the source server was. The client must update
callback endpoint information on the destination server so that it can
maintain leases and allow delegation.

Without a new "clientaddr=" option from user space, however, the
kernel itself must construct an appropriate IP address for the
callback update. Provide an API in the RPC client for upper layer
RPC consumers to acquire a source address for reaching a remote peer.

The mechanism used by the mount.nfs command is copied: set up a
connected UDP socket to the designated remote, then scrape the source
address off the socket. We are careful to select the correct network
namespace when setting up the temporary UDP socket.
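The mount.nfs technique described above can be sketched in user space as follows. This is an illustration only and assumes POSIX sockets. Unlike rpc_sockname(), it does not bind explicitly to the wildcard address or select a network namespace. Instead it relies on connect() implicitly binding an unbound UDP socket. connect() on a datagram socket only selects a route, so no datagram is sent to the remote. local_source_addr() is a hypothetical name.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: find the local address the kernel would use as
 * the source when talking to "peer". connect() on an unbound UDP socket
 * picks a route and binds the socket implicitly; getsockname() then
 * reports the chosen local address. Returns 0 on success, -1 on failure.
 */
static int local_source_addr(const struct sockaddr *peer, socklen_t peerlen,
			     struct sockaddr_storage *out)
{
	socklen_t outlen = sizeof(*out);
	int fd, ret = -1;

	fd = socket(peer->sa_family, SOCK_DGRAM, IPPROTO_UDP);
	if (fd < 0)
		return -1;
	if (connect(fd, peer, peerlen) == 0 &&
	    getsockname(fd, (struct sockaddr *)out, &outlen) == 0) {
		/* Like rpc_sockname(), drop any IPv6 scope ID. */
		if (out->ss_family == AF_INET6)
			((struct sockaddr_in6 *)out)->sin6_scope_id = 0;
		ret = 0;
	}
	/* Closing a UDP socket leaves nothing behind in TIME_WAIT. */
	close(fd);
	return ret;
}
```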

Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Malahal Naineni <[email protected]>
---
include/linux/sunrpc/clnt.h | 1 +
net/sunrpc/clnt.c | 149 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 150 insertions(+), 0 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index a786143..deffdd5 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -175,6 +175,7 @@ size_t rpc_max_payload(struct rpc_clnt *);
void rpc_force_rebind(struct rpc_clnt *);
size_t rpc_peeraddr(struct rpc_clnt *, struct sockaddr *, size_t);
const char *rpc_peeraddr2str(struct rpc_clnt *, enum rpc_display_format_t);
+int rpc_localaddr(struct rpc_clnt *, struct sockaddr *, size_t);

size_t rpc_ntop(const struct sockaddr *, char *, const size_t);
size_t rpc_pton(const char *, const size_t,
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index c5cc8ac..e9e8097 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -943,6 +943,155 @@ const char *rpc_peeraddr2str(struct rpc_clnt *clnt,
}
EXPORT_SYMBOL_GPL(rpc_peeraddr2str);

+static const struct sockaddr_in rpc_inaddr_loopback = {
+ .sin_family = AF_INET,
+ .sin_addr.s_addr = htonl(INADDR_ANY),
+};
+
+static const struct sockaddr_in6 rpc_in6addr_loopback = {
+ .sin6_family = AF_INET6,
+ .sin6_addr = IN6ADDR_ANY_INIT,
+};
+
+/*
+ * Try a getsockname() on a connected datagram socket. Using a
+ * connected datagram socket prevents leaving a socket in TIME_WAIT.
+ * This conserves the ephemeral port number space.
+ *
+ * Returns zero and fills in "buf" if successful; otherwise, a
+ * negative errno is returned.
+ */
+static int rpc_sockname(struct net *net, struct sockaddr *sap, size_t salen,
+ struct sockaddr *buf, int buflen)
+{
+ struct socket *sock;
+ int err;
+
+ err = __sock_create(net, sap->sa_family,
+ SOCK_DGRAM, IPPROTO_UDP, &sock, 1);
+ if (err < 0) {
+ dprintk("RPC: can't create UDP socket (%d)\n", err);
+ goto out;
+ }
+
+ switch (sap->sa_family) {
+ case AF_INET:
+ err = kernel_bind(sock,
+ (struct sockaddr *)&rpc_inaddr_loopback,
+ sizeof(rpc_inaddr_loopback));
+ break;
+ case AF_INET6:
+ err = kernel_bind(sock,
+ (struct sockaddr *)&rpc_in6addr_loopback,
+ sizeof(rpc_in6addr_loopback));
+ break;
+ default:
+ err = -EAFNOSUPPORT;
+ goto out_release;
+ }
+ if (err < 0) {
+ dprintk("RPC: can't bind UDP socket (%d)\n", err);
+ goto out_release;
+ }
+
+ err = kernel_connect(sock, sap, salen, 0);
+ if (err < 0) {
+ dprintk("RPC: can't connect UDP socket (%d)\n", err);
+ goto out_release;
+ }
+
+ err = kernel_getsockname(sock, buf, &buflen);
+ if (err < 0) {
+ dprintk("RPC: getsockname failed (%d)\n", err);
+ goto out_release;
+ }
+
+ err = 0;
+ if (buf->sa_family == AF_INET6) {
+ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)buf;
+ sin6->sin6_scope_id = 0;
+ }
+ dprintk("RPC: %s succeeded\n", __func__);
+
+out_release:
+ sock_release(sock);
+out:
+ return err;
+}
+
+/*
+ * Scraping a connected socket failed, so we don't have a usable
+ * local address. Fallback: generate an address that will prevent
+ * the server from calling us back.
+ *
+ * Returns zero and fills in "buf" if successful; otherwise, a
+ * negative errno is returned.
+ */
+static int rpc_anyaddr(int family, struct sockaddr *buf, size_t buflen)
+{
+ switch (family) {
+ case AF_INET:
+ if (buflen < sizeof(rpc_inaddr_loopback))
+ return -EINVAL;
+ memcpy(buf, &rpc_inaddr_loopback,
+ sizeof(rpc_inaddr_loopback));
+ break;
+ case AF_INET6:
+ if (buflen < sizeof(rpc_in6addr_loopback))
+ return -EINVAL;
+ memcpy(buf, &rpc_in6addr_loopback,
+ sizeof(rpc_in6addr_loopback));
+ break;
+ default:
+ dprintk("RPC: %s: address family not supported\n",
+ __func__);
+ return -EAFNOSUPPORT;
+ }
+ dprintk("RPC: %s: succeeded\n", __func__);
+ return 0;
+}
+
+/**
+ * rpc_localaddr - discover local endpoint address for an RPC client
+ * @clnt: RPC client structure
+ * @buf: target buffer
+ * @buflen: size of target buffer, in bytes
+ *
+ * Returns zero and fills in "buf" if successful;
+ * otherwise, a negative errno is returned.
+ *
+ * This works even if the underlying transport is not currently connected,
+ * or if the upper layer never previously provided a source address.
+ *
+ * The result of this function call is transient: multiple calls in
+ * succession may give different results, depending on how local
+ * networking configuration changes over time.
+ */
+int rpc_localaddr(struct rpc_clnt *clnt, struct sockaddr *buf, size_t buflen)
+{
+ struct sockaddr_storage address;
+ struct sockaddr *sap = (struct sockaddr *)&address;
+ struct rpc_xprt *xprt;
+ struct net *net;
+ size_t salen;
+ int err;
+
+ rcu_read_lock();
+ xprt = rcu_dereference(clnt->cl_xprt);
+ salen = xprt->addrlen;
+ memcpy(sap, &xprt->addr, salen);
+ net = get_net(xprt->xprt_net);
+ rcu_read_unlock();
+
+ rpc_set_port(sap, 0);
+ err = rpc_sockname(net, sap, salen, buf, buflen);
+ put_net(net);
+ if (err != 0)
+ /* Couldn't discover local address, return ANYADDR */
+ return rpc_anyaddr(sap->sa_family, buf, buflen);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(rpc_localaddr);
+
void
rpc_setbufsize(struct rpc_clnt *clnt, unsigned int sndsize, unsigned int rcvsize)
{
--
1.7.8.3
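When the connected-socket probe fails, rpc_localaddr() falls back to rpc_anyaddr(), which copies a wildcard address of the remote's family into the caller's buffer. The following user-space sketch shows that fallback. Each case ends in a break so that AF_INET6 does not fall through to the unsupported-family path. The name any_addr() is hypothetical.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill buf with the wildcard (ANY) address for 'family'. A server cannot
 * call back to such an address. Returns 0 on success, -EINVAL if buf is
 * too small, or -EAFNOSUPPORT for other families.
 */
static int any_addr(int family, struct sockaddr *buf, size_t buflen)
{
	switch (family) {
	case AF_INET: {
		struct sockaddr_in sin = {
			.sin_family = AF_INET,
			.sin_addr.s_addr = htonl(INADDR_ANY),
		};
		if (buflen < sizeof(sin))
			return -EINVAL;
		memcpy(buf, &sin, sizeof(sin));
		break;
	}
	case AF_INET6: {
		struct sockaddr_in6 sin6 = {
			.sin6_family = AF_INET6,
			.sin6_addr = IN6ADDR_ANY_INIT,
		};
		if (buflen < sizeof(sin6))
			return -EINVAL;
		memcpy(buf, &sin6, sizeof(sin6));
		break;
	}
	default:
		return -EAFNOSUPPORT;
	}
	return 0;
}
```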