LinuxLists.cc - [PATCH 0/2] Stable patches for NFSv4/4.1 trunking

2013-04-05 21:16:32

Subject: [PATCH 0/2] Stable patches for NFSv4/4.1 trunking

The following two sets of issues for the stable kernel were found when
debugging the rpcsec_gss patches.
Thanks to Bryan for helping with testing.

Trond Myklebust (2):
NFSv4: Fix a memory leak in nfs4_discover_server_trunking
NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list

fs/nfs/nfs4client.c | 44 ++++++++++++++++++++++++++++----------------
fs/nfs/nfs4state.c | 8 +++++++-
2 files changed, 35 insertions(+), 17 deletions(-)

--
1.8.1.4

2013-04-08 13:02:58

by William Dauchy

[permalink] [raw]

Subject: Re: [PATCH 0/2] Stable patches for NFSv4/4.1 trunking

On Mon, Apr 8, 2013 at 2:06 PM, Jim Rees <[email protected]> wrote:
> Every patch, no matter how "obviously" correct, carries a risk of
> introducing new bugs. Trond is responsible for weighing the risk of new bugs
> against the benefit of fixing old ones. He has a lot of experience doing
> this. If he makes a mistake, he takes the heat for your bugs. Introducing a
> new bug into stable is a lot worse than introducing one into -next.

agreed. I understand the point of view.

--
William

2013-04-06 13:11:27

by William Dauchy

[permalink] [raw]

Subject: Re: [PATCH 0/2] Stable patches for NFSv4/4.1 trunking

Hi Trond,

On Fri, Apr 5, 2013 at 11:16 PM, Trond Myklebust
<[email protected]> wrote:
> The following two sets of issues for the stable kernel were found when
> debugging the rpcsec_gss patches.
> Thanks to Bryan for helping with testing.
>
> Trond Myklebust (2):
> NFSv4: Fix a memory leak in nfs4_discover_server_trunking
> NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
>
> fs/nfs/nfs4client.c | 44 ++++++++++++++++++++++++++++----------------
> fs/nfs/nfs4state.c | 8 +++++++-
> 2 files changed, 35 insertions(+), 17 deletions(-)

I was wondering why this commit is not tagged to go in stable as well:

commit f05c124a70a4953a66acbd6d6c601ea1eb5d0fa7
Author: Trond Myklebust <[email protected]>
Date: Fri Apr 5 14:13:21 2013 -0400

SUNRPC: Fix a potential memory leak in rpc_new_client

If the call to rpciod_up() fails, we currently leak a reference to the
struct rpc_xprt.
As part of the fix, we also remove the redundant check for xprt!=NULL.
This is already taken care of by the callers.

Signed-off-by: Trond Myklebust <[email protected]>

http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commit;h=f05c124a70a4953a66acbd6d6c601ea1eb5d0fa7

Regards,

--
William

2013-04-06 20:29:58

by Myklebust, Trond

[permalink] [raw]

Subject: Re: [PATCH 0/2] Stable patches for NFSv4/4.1 trunking

On Sat, 2013-04-06 at 15:11 +0200, William Dauchy wrote:
> Hi Trond,
>
> On Fri, Apr 5, 2013 at 11:16 PM, Trond Myklebust
> <[email protected]> wrote:
> > The following two sets of issues for the stable kernel were found when
> > debugging the rpcsec_gss patches.
> > Thanks to Bryan for helping with testing.
> >
> > Trond Myklebust (2):
> > NFSv4: Fix a memory leak in nfs4_discover_server_trunking
> > NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
> >
> > fs/nfs/nfs4client.c | 44 ++++++++++++++++++++++++++++----------------
> > fs/nfs/nfs4state.c | 8 +++++++-
> > 2 files changed, 35 insertions(+), 17 deletions(-)
>
> I was wondering why this commit is not tagged to go in stable as well:
>
> commit f05c124a70a4953a66acbd6d6c601ea1eb5d0fa7
> Author: Trond Myklebust <[email protected]>
> Date: Fri Apr 5 14:13:21 2013 -0400
>
> SUNRPC: Fix a potential memory leak in rpc_new_client
>
> If the call to rpciod_up() fails, we currently leak a reference to the
> struct rpc_xprt.
> As part of the fix, we also remove the redundant check for xprt!=NULL.
> This is already taken care of by the callers.
>
> Signed-off-by: Trond Myklebust <[email protected]>
>
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commit;h=f05c124a70a4953a66acbd6d6c601ea1eb5d0fa7

The main reason is that it should be a _very_ rare event, since it
requires a call to try_to_get_module(THIS) to fail, and so I can't see
that it could ever cause a huge leakage.
If someone can show that it is more of a problem than I suggest above,
then I'm happy to reconsider.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2013-04-08 10:16:08

by William Dauchy

[permalink] [raw]

Subject: Re: [PATCH 0/2] Stable patches for NFSv4/4.1 trunking

On Sat, Apr 6, 2013 at 10:29 PM, Myklebust, Trond
<[email protected]> wrote:
> The main reason is that it should be a _very_ rare event, since it
> requires a call to try_to_get_module(THIS) to fail, and so I can't see
> that it could ever cause a huge leakage.
> If someone can show that it is more of a problem than I suggest above,
> then I'm happy to reconsider.

Understood. I didn't know a patch fixing a very rare event could not
go in stable.

--
William

2013-04-05 21:16:34

by Myklebust, Trond

[permalink] [raw]

Subject: [PATCH 2/2] NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list

It is unsafe to use list_for_each_entry_safe() here, because
when we drop the nn->nfs_client_lock, we pin the _current_ list
entry and ensure that it stays in the list, but we don't do the
same for the _next_ list entry. Use of list_for_each_entry() is
therefore the correct thing to do.

Also fix the refcounting in nfs41_walk_client_list().

Finally, ensure that the nfs_client has finished being initialised
and, in the case of NFSv4.1, that the session is set up.

Signed-off-by: Trond Myklebust <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: Bryan Schumaker <[email protected]>
Cc: [email protected] [>= 3.7]
---
fs/nfs/nfs4client.c | 44 ++++++++++++++++++++++++++++----------------
1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index ac4fc9a..c7b346f 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -300,7 +300,7 @@ int nfs40_walk_client_list(struct nfs_client *new,
struct rpc_cred *cred)
{
struct nfs_net *nn = net_generic(new->cl_net, nfs_net_id);
- struct nfs_client *pos, *n, *prev = NULL;
+ struct nfs_client *pos, *prev = NULL;
struct nfs4_setclientid_res clid = {
.clientid = new->cl_clientid,
.confirm = new->cl_confirm,
@@ -308,10 +308,23 @@ int nfs40_walk_client_list(struct nfs_client *new,
int status = -NFS4ERR_STALE_CLIENTID;

spin_lock(&nn->nfs_client_lock);
- list_for_each_entry_safe(pos, n, &nn->nfs_client_list, cl_share_link) {
+ list_for_each_entry(pos, &nn->nfs_client_list, cl_share_link) {
/* If "pos" isn't marked ready, we can't trust the
* remaining fields in "pos" */
- if (pos->cl_cons_state < NFS_CS_READY)
+ if (pos->cl_cons_state > NFS_CS_READY) {
+ atomic_inc(&pos->cl_count);
+ spin_unlock(&nn->nfs_client_lock);
+
+ if (prev)
+ nfs_put_client(prev);
+ prev = pos;
+
+ status = nfs_wait_client_init_complete(pos);
+ spin_lock(&nn->nfs_client_lock);
+ if (status < 0)
+ continue;
+ }
+ if (pos->cl_cons_state != NFS_CS_READY)
continue;

if (pos->rpc_ops != new->rpc_ops)
@@ -423,16 +436,16 @@ int nfs41_walk_client_list(struct nfs_client *new,
struct rpc_cred *cred)
{
struct nfs_net *nn = net_generic(new->cl_net, nfs_net_id);
- struct nfs_client *pos, *n, *prev = NULL;
+ struct nfs_client *pos, *prev = NULL;
int status = -NFS4ERR_STALE_CLIENTID;

spin_lock(&nn->nfs_client_lock);
- list_for_each_entry_safe(pos, n, &nn->nfs_client_list, cl_share_link) {
+ list_for_each_entry(pos, &nn->nfs_client_list, cl_share_link) {
/* If "pos" isn't marked ready, we can't trust the
* remaining fields in "pos", especially the client
* ID and serverowner fields. Wait for CREATE_SESSION
* to finish. */
- if (pos->cl_cons_state < NFS_CS_READY) {
+ if (pos->cl_cons_state > NFS_CS_READY) {
atomic_inc(&pos->cl_count);
spin_unlock(&nn->nfs_client_lock);

@@ -440,18 +453,17 @@ int nfs41_walk_client_list(struct nfs_client *new,
nfs_put_client(prev);
prev = pos;

- nfs4_schedule_lease_recovery(pos);
status = nfs_wait_client_init_complete(pos);
- if (status < 0) {
- nfs_put_client(pos);
- spin_lock(&nn->nfs_client_lock);
- continue;
+ if (status == 0) {
+ nfs4_schedule_lease_recovery(pos);
+ status = nfs4_wait_clnt_recover(pos);
}
- status = pos->cl_cons_state;
spin_lock(&nn->nfs_client_lock);
if (status < 0)
continue;
}
+ if (pos->cl_cons_state != NFS_CS_READY)
+ continue;

if (pos->rpc_ops != new->rpc_ops)
continue;
@@ -469,17 +481,17 @@ int nfs41_walk_client_list(struct nfs_client *new,
continue;

atomic_inc(&pos->cl_count);
- spin_unlock(&nn->nfs_client_lock);
+ *result = pos;
dprintk("NFS: <-- %s using nfs_client = %p ({%d})\n",
__func__, pos, atomic_read(&pos->cl_count));
-
- *result = pos;
- return 0;
+ break;
}

/* No matching nfs_client found. */
spin_unlock(&nn->nfs_client_lock);
dprintk("NFS: <-- %s status = %d\n", __func__, status);
+ if (prev)
+ nfs_put_client(prev);
return status;
}
#endif /* CONFIG_NFS_V4_1 */
--
1.8.1.4

2013-04-05 21:16:33

by Myklebust, Trond

[permalink] [raw]

Subject: [PATCH 1/2] NFSv4: Fix a memory leak in nfs4_discover_server_trunking

When we assign a new rpc_client to clp->cl_rpcclient, we need to destroy
the old one.

Signed-off-by: Trond Myklebust <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: [email protected] [>=3.7]
---
fs/nfs/nfs4state.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 6ace365..d41a351 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1886,7 +1886,13 @@ again:
status = PTR_ERR(clnt);
break;
}
- clp->cl_rpcclient = clnt;
+ /* Note: this is safe because we haven't yet marked the
+ * client as ready, so we are the only user of
+ * clp->cl_rpcclient
+ */
+ clnt = xchg(&clp->cl_rpcclient, clnt);
+ rpc_shutdown_client(clnt);
+ clnt = clp->cl_rpcclient;
goto again;

case -NFS4ERR_MINOR_VERS_MISMATCH:
--
1.8.1.4

2013-04-08 12:06:33

by Jim Rees

[permalink] [raw]

Subject: Re: [PATCH 0/2] Stable patches for NFSv4/4.1 trunking

William Dauchy wrote:

On Sat, Apr 6, 2013 at 10:29 PM, Myklebust, Trond
<[email protected]> wrote:
> The main reason is that it should be a _very_ rare event, since it
> requires a call to try_to_get_module(THIS) to fail, and so I can't see
> that it could ever cause a huge leakage.
> If someone can show that it is more of a problem than I suggest above,
> then I'm happy to reconsider.

Understood. I didn't know a patch fixing a very rare event could not
go in stable.

Every patch, no matter how "obviously" correct, carries a risk of
introducing new bugs. Trond is responsible for weighing the risk of new bugs
against the benefit of fixing old ones. He has a lot of experience doing
this. If he makes a mistake, he takes the heat for your bugs. Introducing a
new bug into stable is a lot worse than introducing one into -next.