The following three patches fix a number of problems during state
reclaim. They apply to Trond's master branch.
[PATCH 1/3] nfs41: Mark stateids in need of reclaim if state manager gets stale clientid
[PATCH 2/3] nfs41: Handle session errors during delegation return
[PATCH 3/3] nfs41: Retry delegation return if it failed with session error
These were tested on top of an old version of Andy's session draining
patches.
- ricardo
The state manager was not marking the stateids as needing to be reclaimed
after reestablishing the clientid.
Signed-off-by: Ricardo Labiaga <[email protected]>
---
fs/nfs/nfs4state.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 630199d..ae90df8 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1157,6 +1157,7 @@ static void nfs4_session_recovery_handle_error(struct nfs_client *clp, int err)
case -NFS4ERR_STALE_CLIENTID:
set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
set_bit(NFS4CLNT_SESSION_SETUP, &clp->cl_state);
+ nfs4_state_start_reclaim_reboot(clp);
}
}
--
1.5.4.3
Update nfs4_delegreturn_done() to retry the operation after setting the
NFS4CLNT_SESSION_SETUP bit to indicate the need to reset the session.
Signed-off-by: Ricardo Labiaga <[email protected]>
---
fs/nfs/nfs4proc.c | 13 +++++++++++--
1 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 97d4a82..25f4180 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3484,9 +3484,18 @@ static void nfs4_delegreturn_done(struct rpc_task *task, void *calldata)
nfs4_sequence_done_free_slot(data->res.server, &data->res.seq_res,
task->tk_status);
- data->rpc_status = task->tk_status;
- if (data->rpc_status == 0)
+ switch (task->tk_status) {
+ case -NFS4ERR_STALE_STATEID:
+ case -NFS4ERR_EXPIRED:
+ case 0:
renew_lease(data->res.server, data->timestamp);
+ break;
+ default:
+ if (nfs4_async_handle_error(task, data->res.server, NULL) ==
+ -EAGAIN)
+ nfs4_restart_rpc(task, data->res.server->nfs_client);
+ }
+ data->rpc_status = task->tk_status;
}
static void nfs4_delegreturn_release(void *calldata)
--
1.5.4.3
Add session error handling to nfs4_open_delegation_recall()
Signed-off-by: Ricardo Labiaga <[email protected]>
---
fs/nfs/nfs4proc.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index fb94ed0..97d4a82 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1169,6 +1169,18 @@ int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state
case -ENOENT:
case -ESTALE:
goto out;
+ case -NFS4ERR_BADSESSION:
+ case -NFS4ERR_BADSLOT:
+ case -NFS4ERR_BAD_HIGH_SLOT:
+ case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
+ case -NFS4ERR_DEADSESSION:
+ case -NFS4ERR_SEQ_FALSE_RETRY:
+ case -NFS4ERR_SEQ_MISORDERED:
+ dprintk("%s ERROR: %d Reset session\n",
+ __func__, err);
+ set_bit(NFS4CLNT_SESSION_SETUP,
+ &server->nfs_client->cl_state);
+ goto out;
case -NFS4ERR_STALE_CLIENTID:
case -NFS4ERR_STALE_STATEID:
case -NFS4ERR_EXPIRED:
--
1.5.4.3
On Sat, 2009-12-05 at 12:11 -0800, Ricardo Labiaga wrote:
> The state manager was not marking the stateids as needing to be reclaimed
> after reestablishing the clientid.
>
> Signed-off-by: Ricardo Labiaga <[email protected]>
> ---
> fs/nfs/nfs4state.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 630199d..ae90df8 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1157,6 +1157,7 @@ static void nfs4_session_recovery_handle_error(struct nfs_client *clp, int err)
> case -NFS4ERR_STALE_CLIENTID:
> set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
> set_bit(NFS4CLNT_SESSION_SETUP, &clp->cl_state);
> + nfs4_state_start_reclaim_reboot(clp);
> }
> }
>
So, why do we need a special nfs4_session_recovery_handle_error() that
just mirrors the existing nfs4_recovery_handle_error().
Please just get rid of it...
Trond
On Sat, 2009-12-05 at 12:11 -0800, Ricardo Labiaga wrote:
> Add session error handling to nfs4_open_delegation_recall()
>
> Signed-off-by: Ricardo Labiaga <[email protected]>
> ---
> fs/nfs/nfs4proc.c | 12 ++++++++++++
> 1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index fb94ed0..97d4a82 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -1169,6 +1169,18 @@ int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state
> case -ENOENT:
> case -ESTALE:
> goto out;
> + case -NFS4ERR_BADSESSION:
> + case -NFS4ERR_BADSLOT:
> + case -NFS4ERR_BAD_HIGH_SLOT:
> + case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
> + case -NFS4ERR_DEADSESSION:
> + case -NFS4ERR_SEQ_FALSE_RETRY:
> + case -NFS4ERR_SEQ_MISORDERED:
> + dprintk("%s ERROR: %d Reset session\n",
> + __func__, err);
> + set_bit(NFS4CLNT_SESSION_SETUP,
> + &server->nfs_client->cl_state);
> + goto out;
> case -NFS4ERR_STALE_CLIENTID:
> case -NFS4ERR_STALE_STATEID:
> case -NFS4ERR_EXPIRED:
BADSESSION and DEADSESSION should call nfs4_schedule_state_recovery()
instead.
The rest can continue to call NFS4CLNT_SESSION_RESET, but should also
call nfs4_schedule_state_manager().
Trond
On Sat, 2009-12-05 at 12:11 -0800, Ricardo Labiaga wrote:
> Update nfs4_delegreturn_done() to retry the operation after setting the
> NFS4CLNT_SESSION_SETUP bit to indicate the need to reset the session.
>
> Signed-off-by: Ricardo Labiaga <[email protected]>
> ---
> fs/nfs/nfs4proc.c | 13 +++++++++++--
> 1 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 97d4a82..25f4180 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -3484,9 +3484,18 @@ static void nfs4_delegreturn_done(struct rpc_task *task, void *calldata)
> nfs4_sequence_done_free_slot(data->res.server, &data->res.seq_res,
> task->tk_status);
>
> - data->rpc_status = task->tk_status;
> - if (data->rpc_status == 0)
> + switch (task->tk_status) {
> + case -NFS4ERR_STALE_STATEID:
> + case -NFS4ERR_EXPIRED:
> + case 0:
> renew_lease(data->res.server, data->timestamp);
> + break;
> + default:
> + if (nfs4_async_handle_error(task, data->res.server, NULL) ==
> + -EAGAIN)
> + nfs4_restart_rpc(task, data->res.server->nfs_client);
return;
> + }
> + data->rpc_status = task->tk_status;
> }
>
> static void nfs4_delegreturn_release(void *calldata)
On 12/5/09 12:34 PM, "Trond Myklebust" <[email protected]> wrote:
> On Sat, 2009-12-05 at 12:11 -0800, Ricardo Labiaga wrote:
>> The state manager was not marking the stateids as needing to be reclaimed
>> after reestablishing the clientid.
>>
>> Signed-off-by: Ricardo Labiaga <[email protected]>
>> ---
>> fs/nfs/nfs4state.c | 1 +
>> 1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index 630199d..ae90df8 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -1157,6 +1157,7 @@ static void nfs4_session_recovery_handle_error(struct
>> nfs_client *clp, int err)
>> case -NFS4ERR_STALE_CLIENTID:
>> set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
>> set_bit(NFS4CLNT_SESSION_SETUP, &clp->cl_state);
>> + nfs4_state_start_reclaim_reboot(clp);
>> }
>> }
>>
>
> So, why do we need a special nfs4_session_recovery_handle_error() that
> just mirrors the existing nfs4_recovery_handle_error().
>
Good point. The early exit from nfs4_state_end_reclaim_reboot() if
NFS4CLNT_RECLAIM_REBOOT is set makes it equivalent. I'll make the change.
- ricardo
> Please just get rid of it...
>
> Trond
On 12/5/09 12:38 PM, "Trond Myklebust" <[email protected]> wrote:
> On Sat, 2009-12-05 at 12:11 -0800, Ricardo Labiaga wrote:
>> Add session error handling to nfs4_open_delegation_recall()
>>
>> Signed-off-by: Ricardo Labiaga <[email protected]>
>> ---
>> fs/nfs/nfs4proc.c | 12 ++++++++++++
>> 1 files changed, 12 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index fb94ed0..97d4a82 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -1169,6 +1169,18 @@ int nfs4_open_delegation_recall(struct
>> nfs_open_context *ctx, struct nfs4_state
>> case -ENOENT:
>> case -ESTALE:
>> goto out;
>> + case -NFS4ERR_BADSESSION:
>> + case -NFS4ERR_BADSLOT:
>> + case -NFS4ERR_BAD_HIGH_SLOT:
>> + case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
>> + case -NFS4ERR_DEADSESSION:
>> + case -NFS4ERR_SEQ_FALSE_RETRY:
>> + case -NFS4ERR_SEQ_MISORDERED:
>> + dprintk("%s ERROR: %d Reset session\n",
>> + __func__, err);
>> + set_bit(NFS4CLNT_SESSION_SETUP,
>> + &server->nfs_client->cl_state);
>> + goto out;
>> case -NFS4ERR_STALE_CLIENTID:
>> case -NFS4ERR_STALE_STATEID:
>> case -NFS4ERR_EXPIRED:
>
> BADSESSION and DEADSESSION should call nfs4_schedule_state_recovery()
> instead.
>
Will do. I did earlier because I was still based on the master branch.
> The rest can continue to call NFS4CLNT_SESSION_RESET, but should also
> call nfs4_schedule_state_manager().
The other error handlers (nfs4_handle_exception, ...) call
nfs4_schedule_state_recovery() instead of setting NFS4CLNT_SESSION_RESET.
Should all of them be changed as well? Not sure I understand the difference
in the handling.
- ricardo
>
> Trond