2012-10-30 20:16:08

by Anna Schumaker

[permalink] [raw]
Subject: [PATCH] NFS: Wait for session recovery to finish before returning

From: Bryan Schumaker <[email protected]>

Currently, we will schedule session recovery and then return to the
caller of nfs4_handle_exception. This works for most cases, but causes
a hang on the following test case:

Client Server
------ ------
Open file over NFS v4.1
Write to file
Expire client
Try to lock file

The server will return NFS4ERR_BADSESSION, prompting the client to
schedule recovery. However, the client will continue placing lock
attempts and the open recovery never seems to be scheduled. The
simplest solution is to wait for session recovery to run before retrying
the lock.

Signed-off-by: Bryan Schumaker <[email protected]>
---
fs/nfs/nfs4proc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 8b04d14..70e0c47 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -338,8 +338,7 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struc
dprintk("%s ERROR: %d Reset session\n", __func__,
errorcode);
nfs4_schedule_session_recovery(clp->cl_session, errorcode);
- exception->retry = 1;
- break;
+ goto wait_on_recovery;
#endif /* defined(CONFIG_NFS_V4_1) */
case -NFS4ERR_FILE_OPEN:
if (exception->timeout > HZ) {
--
1.8.0



2012-10-31 22:19:37

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH] NFS: Wait for session recovery to finish before returning

On Wed, Oct 31, 2012 at 6:14 PM, Andy Adamson <[email protected]> wrote:
> On Tue, Oct 30, 2012 at 4:06 PM, <[email protected]> wrote:
>> From: Bryan Schumaker <[email protected]>
>>
>> Currently, we will schedule session recovery and then return to the
>> caller of nfs4_handle_exception. This works for most cases, but causes
>> a hang on the following test case:
>>
>> Client Server
>> ------ ------
>> Open file over NFS v4.1
>> Write to file
>> Expire client
>> Try to lock file
>>
>> The server will return NFS4ERR_BADSESSION, prompting the client to
>> schedule recovery. However, the client will continue placing lock
>> attempts and the open recovery never seems to be scheduled.
>
> I need more detail. Is this what is happening?
>
> When the first NFS4ERR_BADSESSION error arrives, session recovery
> starts by setting the NFS4_SESSION_DRAINING bit which places all non
> RPC_PRIORITY_PRIVILEGED tasks with OP_SEQUENCE on the slot_tbl_waitq,
> so the lock requests should be placed on the slot_tbl_waitq and not go
> on the wire.
>
> Once the session is drained, DESTROY_SESSION and CREATE_SESSION are
> called and one will return with NFS4ERR_STALE_CLIENTID kicking off the
> EXCHANGE_ID / CREATE_SESSION to establish a new clientid.
>
> Do you see the clientid being recovered?
>
> Once the new session is created, it inherits the slot_tbl_waitq of the
> old session, and all the lock requests and any other tasks piled up on
> the slot_tbl_waitq will run. At the same time, the state_manager
> thread is running recovery for any OPENs established under the old
> clientid, and then their associated LOCKs.
>
> The OPEN and LOCK recovery tasks run with RPC_PRIORITY_PRIVILEGE which
> should mean that they get run before the RPC tasks with lower
> priority.
>
> So my question (finally!): How is open recovery starvation happening?
> nfs41_sequence_free_slot => nfs4_check_drain_fc_complete =>
> rpc_wake_up_first on the session slot_tbl_waitq, which in turn calls
> __rpc_find_next_queued_priority which should choose
> RPC_PRIORITY_PRIVILEGE tasks over lower privileged tasks on the
> session slot_tbl_waitq.
>
> So even with a bunch of slots, when the first one returns and and free
> it's slot, the priority tasks (e.g. open recovery) should run.
>
>
> -->Andy
>
>
>> The
>> simplest solution is to wait for session recovery to run before retrying
>> the lock.
>
> If on the wire RPCs return with NFS4ERR_BADSESSION, and then wait for
> session recovery in the exception handler, they will not call their
> rpc_release function

Please ignore the above sentence - it was early thinking....

-->Andy

>
>>
>> Signed-off-by: Bryan Schumaker <[email protected]>
>> ---
>> fs/nfs/nfs4proc.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 8b04d14..70e0c47 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -338,8 +338,7 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struc
>> dprintk("%s ERROR: %d Reset session\n", __func__,
>> errorcode);
>> nfs4_schedule_session_recovery(clp->cl_session, errorcode);
>> - exception->retry = 1;
>> - break;
>> + goto wait_on_recovery;
>> #endif /* defined(CONFIG_NFS_V4_1) */
>> case -NFS4ERR_FILE_OPEN:
>> if (exception->timeout > HZ) {
>> --
>> 1.8.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-10-31 22:14:47

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH] NFS: Wait for session recovery to finish before returning

On Tue, Oct 30, 2012 at 4:06 PM, <[email protected]> wrote:
> From: Bryan Schumaker <[email protected]>
>
> Currently, we will schedule session recovery and then return to the
> caller of nfs4_handle_exception. This works for most cases, but causes
> a hang on the following test case:
>
> Client Server
> ------ ------
> Open file over NFS v4.1
> Write to file
> Expire client
> Try to lock file
>
> The server will return NFS4ERR_BADSESSION, prompting the client to
> schedule recovery. However, the client will continue placing lock
> attempts and the open recovery never seems to be scheduled.

I need more detail. Is this what is happening?

When the first NFS4ERR_BADSESSION error arrives, session recovery
starts by setting the NFS4_SESSION_DRAINING bit which places all non
RPC_PRIORITY_PRIVILEGED tasks with OP_SEQUENCE on the slot_tbl_waitq,
so the lock requests should be placed on the slot_tbl_waitq and not go
on the wire.

Once the session is drained, DESTROY_SESSION and CREATE_SESSION are
called and one will return with NFS4ERR_STALE_CLIENTID kicking off the
EXCHANGE_ID / CREATE_SESSION to establish a new clientid.

Do you see the clientid being recovered?

Once the new session is created, it inherits the slot_tbl_waitq of the
old session, and all the lock requests and any other tasks piled up on
the slot_tbl_waitq will run. At the same time, the state_manager
thread is running recovery for any OPENs established under the old
clientid, and then their associated LOCKs.

The OPEN and LOCK recovery tasks run with RPC_PRIORITY_PRIVILEGE which
should mean that they get run before the RPC tasks with lower
priority.

So my question (finally!): How is open recovery starvation happening?
nfs41_sequence_free_slot => nfs4_check_drain_fc_complete =>
rpc_wake_up_first on the session slot_tbl_waitq, which in turn calls
__rpc_find_next_queued_priority which should choose
RPC_PRIORITY_PRIVILEGE tasks over lower privileged tasks on the
session slot_tbl_waitq.

So even with a bunch of slots, when the first one returns and and free
it's slot, the priority tasks (e.g. open recovery) should run.


-->Andy


> The
> simplest solution is to wait for session recovery to run before retrying
> the lock.

If on the wire RPCs return with NFS4ERR_BADSESSION, and then wait for
session recovery in the exception handler, they will not call their
rpc_release function

>
> Signed-off-by: Bryan Schumaker <[email protected]>
> ---
> fs/nfs/nfs4proc.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 8b04d14..70e0c47 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -338,8 +338,7 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struc
> dprintk("%s ERROR: %d Reset session\n", __func__,
> errorcode);
> nfs4_schedule_session_recovery(clp->cl_session, errorcode);
> - exception->retry = 1;
> - break;
> + goto wait_on_recovery;
> #endif /* defined(CONFIG_NFS_V4_1) */
> case -NFS4ERR_FILE_OPEN:
> if (exception->timeout > HZ) {
> --
> 1.8.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html