LinuxLists.cc - [PATCH 0/5] Reclaim Stage Bug Fixes Version 2

2009-12-07 08:27:43

Subject: [PATCH 0/5] Reclaim Stage Bug Fixes Version 2

The following five patches fix problems during state reclaim.
They incorporate Trond's comments and apply to Trond's
nfs-for-next branch. Patches 2 and 5 handle session errors by
scheduling the state recovery. They do not differentiate between
DEAD/BADSESSION and other session errors in order to be consistent
with the existing error handling. If this needs to be different,
I can submit a separate patch to address all of the areas that
handle this.

Patch 4 is new in the series. It results from the changes requested
to Patch 1.

[PATCH 1/5] nfs41: Mark stateids in need of reclaim if state manager gets stale clientid
[PATCH 2/5] nfs41: Handle session errors during delegation return
[PATCH 3/5] nfs41: Retry delegation return if it failed with session error
[PATCH 4/5] nfs41: New NFS4CLNT_RECLAIM_COMPLETE_PENDING state
[PATCH 5/5] nfs41: Handle NFSv4.1 session errors in the delegation recall code

- ricardo

2009-12-07 14:47:46

by Myklebust, Trond

[permalink] [raw]

Subject: Re: [PATCH 4/5] nfs41: New NFS4CLNT_RECLAIM_COMPLETE_PENDING state

On Mon, 2009-12-07 at 00:21 -0800, Ricardo Labiaga wrote:
> nfs4_state_end_reclaim_reboot() can also be invoked as a
> result of error processing, so it is not a safe place to
> invoke RECLAIM_COMPLETE. Instead, create a new state
> flag that tracks the fact that a RECLAIM_COMPLETE needs
> to be issued when all state has been reclaimed, or when
> we're done establishing the session for the first time.
>
> If an error occurs in the main state manager loop, just clear the
> flag. No sense in checking if the flag is set in order to clear it.
> We're not going to issue the RECLAIM_COMPLETE since there's a high
> probability that we had some kind of communication or session problem
> which is s how we ended up in the error case.

This patch looks wrong for two reasons.

1. We only want to call RECLAIM_COMPLETE if we saw a STALE_CLIENTID
error prior to the last attempt to re-establish the client id.
2. We have to call it before we can start the no-grace reclaims.

Looking at the code, I'm not convinced that we need a separate
'RECLAIM_COMPLETE_PENDING' state. It should be pretty much identical to
the existing NFS4CLNT_RECLAIM_REBOOT state.
The only difference is that in the NFSv4.1 case we want to be able to
call RECLAIM_COMPLETE even in the case where we have no state to
reclaim.

Cheers
Trond

2009-12-07 17:51:18

by Labiaga, Ricardo

[permalink] [raw]

Subject: Re: [PATCH 4/5] nfs41: New NFS4CLNT_RECLAIM_COMPLETE_PENDING state

On 12/7/09 6:47 AM, "Trond Myklebust" <[email protected]> wrote:

> On Mon, 2009-12-07 at 00:21 -0800, Ricardo Labiaga wrote:
>> nfs4_state_end_reclaim_reboot() can also be invoked as a
>> result of error processing, so it is not a safe place to
>> invoke RECLAIM_COMPLETE. Instead, create a new state
>> flag that tracks the fact that a RECLAIM_COMPLETE needs
>> to be issued when all state has been reclaimed, or when
>> we're done establishing the session for the first time.
>>
>> If an error occurs in the main state manager loop, just clear the
>> flag. No sense in checking if the flag is set in order to clear it.
>> We're not going to issue the RECLAIM_COMPLETE since there's a high
>> probability that we had some kind of communication or session problem
>> which is s how we ended up in the error case.
>
> This patch looks wrong for two reasons.
>
> 1. We only want to call RECLAIM_COMPLETE if we saw a STALE_CLIENTID
> error prior to the last attempt to re-establish the client id.

Section 18.51.3

"Whenever a client establishes a new client ID and before it does the
first non-reclaim operation that obtains a lock, it MUST send a
RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no
locks to reclaim. If non-reclaim locking operations are done before
the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned."

I interpreted the spec as you did, but I was talked into interpreting the
previous statement as:

"The client doesn't always know if the server has rebooted, so send it a
RECLAIM_COMPLETE after your initial EXCHANGE_ID/ CREATE_SESSION as well and
all will be happy".

Let me send an email to the NFSv4 IETF list to discuss it there.

> 2. We have to call it before we can start the no-grace reclaims.

My oversight. I should have placed the check for the new state under
RECLAIM_REBOOT and before RECLAIM_NOGRACE - provided we determine that we
need to send RECLAIM_COMPLETE even in the case where we didn't see
STALE_CLIENTID

>
> Looking at the code, I'm not convinced that we need a separate
> 'RECLAIM_COMPLETE_PENDING' state. It should be pretty much identical to
> the existing NFS4CLNT_RECLAIM_REBOOT state.
> The only difference is that in the NFSv4.1 case we want to be able to
> call RECLAIM_COMPLETE even in the case where we have no state to
> reclaim.
>

Yes, this would be the case if my interpretation of the spec is incorrect.

- ricardo

> Cheers
> Trond

2009-12-07 18:11:45

by Myklebust, Trond

[permalink] [raw]

Subject: Re: [PATCH 4/5] nfs41: New NFS4CLNT_RECLAIM_COMPLETE_PENDING state

On Mon, 2009-12-07 at 09:51 -0800, Labiaga, Ricardo wrote:
> On 12/7/09 6:47 AM, "Trond Myklebust" <[email protected]> wrote:
>
> > Looking at the code, I'm not convinced that we need a separate
> > 'RECLAIM_COMPLETE_PENDING' state. It should be pretty much identical to
> > the existing NFS4CLNT_RECLAIM_REBOOT state.
> > The only difference is that in the NFSv4.1 case we want to be able to
> > call RECLAIM_COMPLETE even in the case where we have no state to
> > reclaim.
> >
>
> Yes, this would be the case if my interpretation of the spec is incorrect.

I don't see how your interpretation changes anything w.r.t the question
of whether we need a new state or not. You can still set
NFS4CLNT_RECLAIM_REBOOT and get it to do the right thing...

Trond

2009-12-07 21:40:50

by Labiaga, Ricardo

[permalink] [raw]

Subject: RE: [PATCH 4/5] nfs41: New NFS4CLNT_RECLAIM_COMPLETE_PENDING state

> -----Original Message-----
> From: Myklebust, Trond
> Sent: Monday, December 07, 2009 10:12 AM
> To: Labiaga, Ricardo
> Cc: [email protected]
> Subject: Re: [PATCH 4/5] nfs41: New NFS4CLNT_RECLAIM_COMPLETE_PENDING
> state
>
> On Mon, 2009-12-07 at 09:51 -0800, Labiaga, Ricardo wrote:
> > On 12/7/09 6:47 AM, "Trond Myklebust" <[email protected]>
> wrote:
> >
> > > Looking at the code, I'm not convinced that we need a separate
> > > 'RECLAIM_COMPLETE_PENDING' state. It should be pretty much
identical
> to
> > > the existing NFS4CLNT_RECLAIM_REBOOT state.
> > > The only difference is that in the NFSv4.1 case we want to be able
to
> > > call RECLAIM_COMPLETE even in the case where we have no state to
> > > reclaim.
> > >
> >
> > Yes, this would be the case if my interpretation of the spec is
> incorrect.
>
> I don't see how your interpretation changes anything w.r.t the
question
> of whether we need a new state or not. You can still set
> NFS4CLNT_RECLAIM_REBOOT and get it to do the right thing...
>

Agreed, I was trying to avoid calling nfs4_do_reclaim() and friends when
there was no work to do. I agree the optimization does not warrant the
extra state complexity since NFS4CLNT_RECLAIM_REBOOT will indeed do the
job.

I'll make the change after we reach agreement on the NFSv4 IETF list.

- ricardo

2009-12-07 08:27:44

by Labiaga, Ricardo

[permalink] [raw]

Subject: [PATCH 1/5] nfs41: Mark stateids in need of reclaim if state manager gets stale clientid

The state manager was not marking the stateids as needing to be reclaimed
after reestablishing the clientid.

Signed-off-by: Ricardo Labiaga <[email protected]>
---
fs/nfs/nfs4state.c | 12 ++----------
1 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index d3c9c9e..d741ec6 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1256,14 +1256,6 @@ void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags)
nfs_expire_all_delegations(clp);
}

-static void nfs4_session_recovery_handle_error(struct nfs_client *clp, int err)
-{
- switch (err) {
- case -NFS4ERR_STALE_CLIENTID:
- set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
- }
-}
-
static int nfs4_reset_session(struct nfs_client *clp)
{
struct nfs4_session *ses = clp->cl_session;
@@ -1276,14 +1268,14 @@ static int nfs4_reset_session(struct nfs_client *clp)
status = nfs4_proc_destroy_session(clp->cl_session);
if (status && status != -NFS4ERR_BADSESSION &&
status != -NFS4ERR_DEADSESSION) {
- nfs4_session_recovery_handle_error(clp, status);
+ nfs4_recovery_handle_error(clp, status);
goto out;
}

memset(clp->cl_session->sess_id.data, 0, NFS4_MAX_SESSIONID_LEN);
status = nfs4_proc_create_session(clp);
if (status)
- nfs4_session_recovery_handle_error(clp, status);
+ nfs4_recovery_handle_error(clp, status);
out:
/*
* Let the state manager reestablish state
--
1.5.4.3

2009-12-07 08:27:44

by Labiaga, Ricardo

[permalink] [raw]

Subject: [PATCH 2/5] nfs41: Handle session errors during delegation return

Add session error handling to nfs4_open_delegation_recall()

Signed-off-by: Ricardo Labiaga <[email protected]>
---
fs/nfs/nfs4proc.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index fbae2c9..6a8861c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1179,6 +1179,14 @@ int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state
case -ENOENT:
case -ESTALE:
goto out;
+ case -NFS4ERR_BADSESSION:
+ case -NFS4ERR_BADSLOT:
+ case -NFS4ERR_BAD_HIGH_SLOT:
+ case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
+ case -NFS4ERR_DEADSESSION:
+ nfs4_schedule_state_recovery(
+ server->nfs_client);
+ goto out;
case -NFS4ERR_STALE_CLIENTID:
case -NFS4ERR_STALE_STATEID:
case -NFS4ERR_EXPIRED:
--
1.5.4.3

2009-12-07 08:27:45

by Labiaga, Ricardo

[permalink] [raw]

Subject: [PATCH 4/5] nfs41: New NFS4CLNT_RECLAIM_COMPLETE_PENDING state

nfs4_state_end_reclaim_reboot() can also be invoked as a
result of error processing, so it is not a safe place to
invoke RECLAIM_COMPLETE. Instead, create a new state
flag that tracks the fact that a RECLAIM_COMPLETE needs
to be issued when all state has been reclaimed, or when
we're done establishing the session for the first time.

If an error occurs in the main state manager loop, just clear the
flag. No sense in checking if the flag is set in order to clear it.
We're not going to issue the RECLAIM_COMPLETE since there's a high
probability that we had some kind of communication or session problem
which is s how we ended up in the error case.

Signed-off-by: Ricardo Labiaga <[email protected]>
---
fs/nfs/nfs4_fs.h | 1 +
fs/nfs/nfs4state.c | 12 +++++++++---
2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 19f04cc..9962e28 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -46,6 +46,7 @@ enum nfs4_client_state {
NFS4CLNT_DELEGRETURN,
NFS4CLNT_SESSION_RESET,
NFS4CLNT_SESSION_DRAINING,
+ NFS4CLNT_RECLAIM_COMPLETE_PENDING,
};

/*
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index d741ec6..3bb43df 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1096,9 +1096,6 @@ static void nfs4_state_end_reclaim_reboot(struct nfs_client *clp)
struct rb_node *pos;
struct nfs4_state *state;

- nfs4_reclaim_complete(clp,
- nfs4_reboot_recovery_ops[clp->cl_minorversion]);
-
if (!test_and_clear_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state))
return;

@@ -1335,6 +1332,8 @@ static void nfs4_state_manager(struct nfs_client *clp)
goto out_error;
}
clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
+ set_bit(NFS4CLNT_RECLAIM_COMPLETE_PENDING,
+ &clp->cl_state);
}

if (test_and_clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state)) {
@@ -1386,6 +1385,12 @@ static void nfs4_state_manager(struct nfs_client *clp)
continue;
}

+ if (test_and_clear_bit(NFS4CLNT_RECLAIM_COMPLETE_PENDING,
+ &clp->cl_state)) {
+ nfs4_reclaim_complete(clp,
+ nfs4_reboot_recovery_ops[clp->cl_minorversion]);
+ }
+
nfs4_clear_state_manager_bit(clp);
/* Did we race with an attempt to give us more work? */
if (clp->cl_state == 0)
@@ -1397,6 +1402,7 @@ static void nfs4_state_manager(struct nfs_client *clp)
out_error:
printk(KERN_WARNING "Error: state manager failed on NFSv4 server %s"
" with error %d\n", clp->cl_hostname, -status);
+ clear_bit(NFS4CLNT_RECLAIM_COMPLETE_PENDING, &clp->cl_state);
nfs4_clear_state_manager_bit(clp);
}

--
1.5.4.3

2009-12-07 08:27:45

by Labiaga, Ricardo

[permalink] [raw]

Subject: [PATCH 5/5] nfs41: Handle NFSv4.1 session errors in the delegation recall code

Signed-off-by: Ricardo Labiaga <[email protected]>
---
fs/nfs/nfs4proc.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 53b0a7d..a286d70 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4191,6 +4191,11 @@ int nfs4_lock_delegation_recall(struct nfs4_state *state, struct file_lock *fl)
case -NFS4ERR_EXPIRED:
case -NFS4ERR_STALE_CLIENTID:
case -NFS4ERR_STALE_STATEID:
+ case -NFS4ERR_BADSESSION:
+ case -NFS4ERR_BADSLOT:
+ case -NFS4ERR_BAD_HIGH_SLOT:
+ case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
+ case -NFS4ERR_DEADSESSION:
nfs4_schedule_state_recovery(server->nfs_client);
goto out;
case -ERESTARTSYS:
--
1.5.4.3

2009-12-07 08:27:44

by Labiaga, Ricardo

[permalink] [raw]

Subject: [PATCH 3/5] nfs41: Retry delegation return if it failed with session error

Update nfs4_delegreturn_done() to retry the operation after setting the
NFS4CLNT_SESSION_SETUP bit to indicate the need to reset the session.

Signed-off-by: Ricardo Labiaga <[email protected]>
---
fs/nfs/nfs4proc.c | 15 +++++++++++++--
1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6a8861c..53b0a7d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3490,9 +3490,20 @@ static void nfs4_delegreturn_done(struct rpc_task *task, void *calldata)
nfs4_sequence_done(data->res.server, &data->res.seq_res,
task->tk_status);

- data->rpc_status = task->tk_status;
- if (data->rpc_status == 0)
+ switch (task->tk_status) {
+ case -NFS4ERR_STALE_STATEID:
+ case -NFS4ERR_EXPIRED:
+ case 0:
renew_lease(data->res.server, data->timestamp);
+ break;
+ default:
+ if (nfs4_async_handle_error(task, data->res.server, NULL) ==
+ -EAGAIN) {
+ nfs4_restart_rpc(task, data->res.server->nfs_client);
+ return;
+ }
+ }
+ data->rpc_status = task->tk_status;
}

static void nfs4_delegreturn_release(void *calldata)
--
1.5.4.3