2014-02-10 20:27:02

by Steve Dickson

Subject: [PATCH] NFSv4: Infinite loop in lease recovery when rpc.gssd is not running.

Commit 0ea9de0e introduced a regression in the lease recovery code.

An infinite loop is caused when nfs4_establish_lease() fails
with -EACCES. This causes nfs4_handle_reclaim_lease_error()
to sleep a bit and reset the NFS4CLNT_LEASE_EXPIRED bit.
This in turn causes nfs4_state_manager() to try to
reestablish the lease, again, again, again...

The problem is that a valid RPCSEC_GSS client is being created when
rpc.gssd is not running. This causes the RPC code to fail
with -EACCES, sending lease reestablishment off the
deep end.

Moving the gssd_running() check back into nfs4_init_client(),
which stops the RPCSEC_GSS client from being created, stops
the looping.

Signed-off-by: Steve Dickson <[email protected]>
---
fs/nfs/nfs4client.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 860ad26..a60269f 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -372,7 +372,10 @@ struct nfs_client *nfs4_init_client(struct nfs_client *clp,
 	__set_bit(NFS_CS_DISCRTRY, &clp->cl_flags);
 	__set_bit(NFS_CS_NO_RETRANS_TIMEOUT, &clp->cl_flags);

-	error = nfs_create_rpc_client(clp, timeparms, RPC_AUTH_GSS_KRB5I);
+	error = -EINVAL;
+	if (gssd_running(clp->cl_net))
+		error = nfs_create_rpc_client(clp, timeparms,
+					      RPC_AUTH_GSS_KRB5I);
 	if (error == -EINVAL)
 		error = nfs_create_rpc_client(clp, timeparms, RPC_AUTH_UNIX);
 	if (error < 0)
--
1.7.1
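
The retry cycle described in the message above can be sketched as a small user-space model. This is only an illustration under stated assumptions: the struct and the model_* functions are hypothetical names, not kernel APIs, and the attempt cap exists only so the model terminates where the real loop would not.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_client {
	bool lease_expired;	/* stands in for the NFS4CLNT_LEASE_EXPIRED bit */
	bool gssd_running;	/* stands in for rpc.gssd being up */
	bool uses_gss;		/* true if an RPCSEC_GSS client was created */
};

/* Models nfs4_establish_lease(): GSS without the daemon fails with -EACCES. */
static int model_establish_lease(const struct model_client *clp)
{
	if (clp->uses_gss && !clp->gssd_running)
		return -EACCES;
	return 0;
}

/*
 * Models the state manager loop. Returns the number of attempts made,
 * capped at max_attempts so that the model always terminates.
 */
static int model_state_manager(struct model_client *clp, int max_attempts)
{
	int attempts = 0;

	assert(max_attempts > 0);
	while (clp->lease_expired && attempts < max_attempts) {
		attempts++;
		clp->lease_expired = false;	/* manager picks up the work */
		if (model_establish_lease(clp) == -EACCES)
			clp->lease_expired = true;	/* handler asks for a retry */
	}
	return attempts;
}
```

With uses_gss set and gssd_running clear, the model exhausts whatever cap it is given. With uses_gss clear (no RPCSEC_GSS client was created), it finishes after one attempt.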



2014-02-10 21:48:10

by Trond Myklebust

Subject: [PATCH] SUNRPC: Don't create a gss auth cache unless rpc.gssd is running

An infinite loop is caused when nfs4_establish_lease() fails
with -EACCES. This causes nfs4_handle_reclaim_lease_error()
to sleep a bit and reset the NFS4CLNT_LEASE_EXPIRED bit.
This in turn causes nfs4_state_manager() to try to
reestablish the lease, again, again, again...

The problem is that a valid RPCSEC_GSS client is being created when
rpc.gssd is not running.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 0ea9de0ea6a4 (sunrpc: turn warn_gssd() log message into a dprintk())
Reported-by: Steve Dickson <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/auth_gss/auth_gss.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 6c0513a7f992..44a61e8fda6f 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -991,6 +991,8 @@ gss_create_new(struct rpc_auth_create_args *args, struct rpc_clnt *clnt)
 	gss_auth->service = gss_pseudoflavor_to_service(gss_auth->mech, flavor);
 	if (gss_auth->service == 0)
 		goto err_put_mech;
+	if (!gssd_running(gss_auth->net))
+		goto err_put_mech;
 	auth = &gss_auth->rpc_auth;
 	auth->au_cslack = GSS_CRED_SLACK >> 2;
 	auth->au_rslack = GSS_VERF_SLACK >> 2;
--
1.8.5.3
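
How a check inside the GSS creation path interacts with the caller's fallback can be sketched in user space. This is a rough model, not the kernel code: every model_* name is hypothetical, and it assumes that the err_put_mech path ends up returning -EINVAL to the caller, which the thread does not state explicitly. The fallback on -EINVAL mirrors the `if (error == -EINVAL)` test visible in the nfs4_init_client() hunk earlier in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flavors standing in for RPC_AUTH_UNIX / RPC_AUTH_GSS_KRB5I. */
enum model_flavor { MODEL_AUTH_UNIX, MODEL_AUTH_GSS_KRB5I };

/* Models GSS auth creation with the daemon check inside the RPC layer. */
static int model_gss_create(bool gssd_running)
{
	if (!gssd_running)
		return -EINVAL;	/* assumption: the error path yields -EINVAL */
	return 0;
}

/* Models RPC client creation for a given flavor. */
static int model_create_rpc_client(enum model_flavor flavor, bool gssd_running)
{
	if (flavor == MODEL_AUTH_GSS_KRB5I)
		return model_gss_create(gssd_running);
	return 0;	/* AUTH_UNIX needs no daemon */
}

/* Models the caller: try krb5i first, fall back to AUTH_UNIX on -EINVAL. */
static int model_init_client(bool gssd_running, enum model_flavor *chosen)
{
	int error;

	assert(chosen != NULL);
	*chosen = MODEL_AUTH_GSS_KRB5I;
	error = model_create_rpc_client(*chosen, gssd_running);
	if (error == -EINVAL) {
		*chosen = MODEL_AUTH_UNIX;
		error = model_create_rpc_client(*chosen, gssd_running);
	}
	return error;
}
```

In this model the caller never needs to know whether the daemon is running. It only reacts to the error code, which keeps the liveness check inside the RPC layer.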


2014-02-10 21:10:16

by Trond Myklebust

Subject: Re: [PATCH] NFSv4: Infinite loop in lease recovery when rpc.gssd is not running.


On Feb 10, 2014, at 16:06, Steve Dickson <[email protected]> wrote:

> [ Resent with Trond's correct email address ]
>
> Commit 0ea9de0e introduced a regression in the lease recovery code.
>
> An infinite loop is caused when nfs4_establish_lease() fails
> with -EACCES. This causes nfs4_handle_reclaim_lease_error()
> to sleep a bit and reset the NFS4CLNT_LEASE_EXPIRED bit.
> This in turn causes nfs4_state_manager() to try to
> reestablish the lease, again, again, again...
>
> The problem is that a valid RPCSEC_GSS client is being created when
> rpc.gssd is not running. This causes the RPC code to fail
> with -EACCES, sending lease reestablishment off the
> deep end.
>
> Moving the gssd_running() check back into nfs4_init_client(),
> which stops the RPCSEC_GSS client from being created, stops
> the looping.
>
> Signed-off-by: Steve Dickson <[email protected]>
> ---
> fs/nfs/nfs4client.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 860ad26..a60269f 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -372,7 +372,10 @@ struct nfs_client *nfs4_init_client(struct nfs_client *clp,
>  	__set_bit(NFS_CS_DISCRTRY, &clp->cl_flags);
>  	__set_bit(NFS_CS_NO_RETRANS_TIMEOUT, &clp->cl_flags);
>
> -	error = nfs_create_rpc_client(clp, timeparms, RPC_AUTH_GSS_KRB5I);
> +	error = -EINVAL;
> +	if (gssd_running(clp->cl_net))
> +		error = nfs_create_rpc_client(clp, timeparms,
> +					      RPC_AUTH_GSS_KRB5I);
>  	if (error == -EINVAL)
>  		error = nfs_create_rpc_client(clp, timeparms, RPC_AUTH_UNIX);
>  	if (error < 0)
> --
> 1.7.1
>

NACK. gssd_running() is not an acceptable solution outside of the RPC layer.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-02-11 11:26:18

by Steve Dickson

Subject: Re: [PATCH] SUNRPC: Don't create a gss auth cache unless rpc.gssd is running



On 02/10/2014 06:01 PM, Steve Dickson wrote:
>
>
> On 02/10/2014 04:48 PM, Trond Myklebust wrote:
>> An infinite loop is caused when nfs4_establish_lease() fails
>> with -EACCES. This causes nfs4_handle_reclaim_lease_error()
>> to sleep a bit and reset the NFS4CLNT_LEASE_EXPIRED bit.
>> This in turn causes nfs4_state_manager() to try to
>> reestablish the lease, again, again, again...
>>
>> The problem is that a valid RPCSEC_GSS client is being created when
>> rpc.gssd is not running.
>>
>> Link: http://lkml.kernel.org/r/[email protected]
>> Fixes: 0ea9de0ea6a4 (sunrpc: turn warn_gssd() log message into a dprintk())
>> Reported-by: Steve Dickson <[email protected]>
>> Signed-off-by: Trond Myklebust <[email protected]>
>> ---
>> net/sunrpc/auth_gss/auth_gss.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
>> index 6c0513a7f992..44a61e8fda6f 100644
>> --- a/net/sunrpc/auth_gss/auth_gss.c
>> +++ b/net/sunrpc/auth_gss/auth_gss.c
>> @@ -991,6 +991,8 @@ gss_create_new(struct rpc_auth_create_args *args, struct rpc_clnt *clnt)
>>  	gss_auth->service = gss_pseudoflavor_to_service(gss_auth->mech, flavor);
>>  	if (gss_auth->service == 0)
>>  		goto err_put_mech;
>> +	if (!gssd_running(gss_auth->net))
>> +		goto err_put_mech;
>>  	auth = &gss_auth->rpc_auth;
>>  	auth->au_cslack = GSS_CRED_SLACK >> 2;
>>  	auth->au_rslack = GSS_VERF_SLACK >> 2;
>>
> Unfortunately I'm seeing the same loop, but this time it's with _nfs4_proc_exchange_id
>
> Here is the trace point output:
> 192.168.62.8-ma-20371 [000] .... 955443.604229: nfs4_exchange_id: error=-13 (EACCES) dstaddr=192.168.62.8
>
> and here is the rpcdebug output:
> [ 2782.341981] NFS call exchange_id auth=RPCSEC_GSS, 'Linux NFSv4.1 <client>'
> [ 2782.360540] NFS reply exchange_id: -13
>
> All three mounts (v4.0, v4.1, v4.2) are hung...
>
> Looking into it...
Pilot error on my part... I only reloaded sunrpc.ko, not auth_rpcgss.ko.
What a good night's sleep can do for you... :-)

Tested-by: Steve Dickson <[email protected]>

Question: should we be checking that gssd is still running when
a gss_auth pointer is found in the hash table? I'm thinking
of the case where gssd was started and then stopped.

steved.
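
The scenario behind the question above (gssd started, an auth entry cached, gssd then stopped) can be illustrated with a toy model. This is purely a sketch of the question, not a description of how the kernel's lookup path behaves and not a proposal made in the thread. The struct and both functions are hypothetical, and the check_daemon flag exists only to contrast the two behaviours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached gss_auth entry; not the kernel struct. */
struct model_gss_auth {
	bool in_cache;
};

/* Models first-time creation, which the patch gates on the daemon running. */
static bool model_create(struct model_gss_auth *auth, bool gssd_running)
{
	assert(auth != NULL);
	if (!gssd_running)
		return false;
	auth->in_cache = true;
	return true;
}

/*
 * Models a cache lookup. With check_daemon false the cached entry is
 * trusted as-is; with check_daemon true the lookup also requires the
 * daemon to be running, which is the extra test the question raises.
 */
static bool model_lookup(const struct model_gss_auth *auth,
			 bool gssd_running, bool check_daemon)
{
	assert(auth != NULL);
	if (!auth->in_cache)
		return false;
	if (check_daemon && !gssd_running)
		return false;
	return true;
}
```

In the model, an entry created while the daemon was up is still returned after the daemon stops unless the lookup repeats the liveness test.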

2014-02-10 23:01:35

by Steve Dickson

Subject: Re: [PATCH] SUNRPC: Don't create a gss auth cache unless rpc.gssd is running



On 02/10/2014 04:48 PM, Trond Myklebust wrote:
> An infinite loop is caused when nfs4_establish_lease() fails
> with -EACCES. This causes nfs4_handle_reclaim_lease_error()
> to sleep a bit and reset the NFS4CLNT_LEASE_EXPIRED bit.
> This in turn causes nfs4_state_manager() to try to
> reestablish the lease, again, again, again...
>
> The problem is that a valid RPCSEC_GSS client is being created when
> rpc.gssd is not running.
>
> Link: http://lkml.kernel.org/r/[email protected]
> Fixes: 0ea9de0ea6a4 (sunrpc: turn warn_gssd() log message into a dprintk())
> Reported-by: Steve Dickson <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> net/sunrpc/auth_gss/auth_gss.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> index 6c0513a7f992..44a61e8fda6f 100644
> --- a/net/sunrpc/auth_gss/auth_gss.c
> +++ b/net/sunrpc/auth_gss/auth_gss.c
> @@ -991,6 +991,8 @@ gss_create_new(struct rpc_auth_create_args *args, struct rpc_clnt *clnt)
>  	gss_auth->service = gss_pseudoflavor_to_service(gss_auth->mech, flavor);
>  	if (gss_auth->service == 0)
>  		goto err_put_mech;
> +	if (!gssd_running(gss_auth->net))
> +		goto err_put_mech;
>  	auth = &gss_auth->rpc_auth;
>  	auth->au_cslack = GSS_CRED_SLACK >> 2;
>  	auth->au_rslack = GSS_VERF_SLACK >> 2;
>
Unfortunately I'm seeing the same loop, but this time it's with _nfs4_proc_exchange_id

Here is the trace point output:
192.168.62.8-ma-20371 [000] .... 955443.604229: nfs4_exchange_id: error=-13 (EACCES) dstaddr=192.168.62.8

and here is the rpcdebug output:
[ 2782.341981] NFS call exchange_id auth=RPCSEC_GSS, 'Linux NFSv4.1 <client>'
[ 2782.360540] NFS reply exchange_id: -13

All three mounts (v4.0, v4.1, v4.2) are hung...

Looking into it...

steved.