2010-10-01 11:30:48

by Sachin Prabhu

Subject: NFS4 clients cannot reclaim locks

NFS4 clients appear to have problems reclaiming locks after a server reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a Fedora system.

The problem appears to happen when, after a server reboot, a WRITE call is made just before the RENEW call. In that case, NFS4ERR_STALE_STATEID is returned for the WRITE call, which results in NFS_STATE_RECLAIM_REBOOT being set in the state flags. However, the NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is handled by
nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE and clearing the NFS_STATE_RECLAIM_REBOOT in nfs4_state_mark_reclaim_nograce().
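
The flag sequence described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: the `model_*` names and the flag values are hypothetical, and the functions reproduce only the behaviour described in this message, not the actual kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for NFS_STATE_RECLAIM_REBOOT / NFS_STATE_RECLAIM_NOGRACE. */
enum model_flag {
    MODEL_RECLAIM_REBOOT  = 1u << 0,
    MODEL_RECLAIM_NOGRACE = 1u << 1,
};

struct model_state {
    unsigned int flags;
};

/* WRITE got NFS4ERR_STALE_STATEID: ask for a reclaim within the grace period. */
static void model_mark_reclaim_reboot(struct model_state *st)
{
    assert(st != NULL);
    st->flags |= MODEL_RECLAIM_REBOOT;
}

/* Described effect of nfs4_state_mark_reclaim_nograce(): swap the two flags. */
static void model_mark_reclaim_nograce(struct model_state *st)
{
    assert(st != NULL);
    st->flags |= MODEL_RECLAIM_NOGRACE;
    st->flags &= ~(unsigned int)MODEL_RECLAIM_REBOOT;
}

/*
 * RENEW got NFS4ERR_STALE_CLIENTID: the reboot reclaim is ended, and every
 * state still waiting for a reboot reclaim is downgraded to "no grace".
 */
static void model_end_reclaim_reboot(struct model_state *states, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (states[i].flags & MODEL_RECLAIM_REBOOT)
            model_mark_reclaim_nograce(&states[i]);
    }
}
```

Calling `model_mark_reclaim_reboot()` and then `model_end_reclaim_reboot()` on the same state leaves only the "no grace" flag set, which is the outcome reported here: the state was never reclaimed inside the grace period.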

The process of reclaiming the locks then seems to hit another roadblock in nfs4_open_expired(), where it fails to open the file and reset the state. It ends up calling nfs4_reclaim_locks() in a loop with the old stateid in nfs4_reclaim_open_state().

By commenting out the call to nfs4_state_end_reclaim_reboot(clp) in nfs4_recovery_handle_error(), the client was able to handle this particular scenario properly.

Has anyone else seen this issue?

Sachin Prabhu


2010-10-05 15:28:24

by Timo Aaltonen

Subject: Re: NFS4 clients cannot reclaim locks

On Fri, 1 Oct 2010, Sachin Prabhu wrote:

> NFS4 clients appear to have problems reclaiming locks after a server reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a Fedora system.
>
> The problem appears to happen in cases where after a reboot, a WRITE call is made just before the RENEW call. In that case, the NFS4ERR_STALE_STATEID is returned for the WRITE call which results in NFS_STATE_RECLAIM_REBOOT being set in the state flags. However the NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is handled by
> nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
> which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE and clearing the NFS_STATE_RECLAIM_REBOOT in nfs4_state_mark_reclaim_nograce().
>
> The process of reclaiming the locks then seem to hit another roadblock in nfs4_open_expired() where it fails to open the file and reset the state. It ends up calling nfs4_reclaim_locks() in a loop with the old stateid in nfs4_reclaim_open_state().
>
> By commenting out the call to nfs4_state_end_reclaim_reboot(clp) in nfs4_recovery_handle_error(), the client was able to handle this particular scenario properly.
>
> Has any one else seen this issue?

Could this be related to the bug I was seeing with NFSv4 (I am now using v3
with success)?

https://bugzilla.kernel.org/show_bug.cgi?id=15973

In my case, though, the error returned by the server is BAD_STATEID.


--
Timo Aaltonen
Systems Specialist, Aalto IT

2010-10-01 20:46:42

by Myklebust, Trond

Subject: Re: NFS4 clients cannot reclaim locks

On Fri, 2010-10-01 at 07:30 -0400, Sachin Prabhu wrote:
> NFS4 clients appear to have problems reclaiming locks after a server reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a Fedora system.
>
> The problem appears to happen in cases where after a reboot, a WRITE call is made just before the RENEW call. In that case, the NFS4ERR_STALE_STATEID is returned for the WRITE call which results in NFS_STATE_RECLAIM_REBOOT being set in the state flags. However the NFS4ERR_STALE_CLIENTID returned for the subsequent RENEW call is handled by
> nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
> which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE and clearing the NFS_STATE_RECLAIM_REBOOT in nfs4_state_mark_reclaim_nograce().

Yup. I don't think we should call nfs4_state_mark_reclaim_reboot() here.

> The process of reclaiming the locks then seem to hit another roadblock in nfs4_open_expired() where it fails to open the file and reset the state. It ends up calling nfs4_reclaim_locks() in a loop with the old stateid in nfs4_reclaim_open_state().

Any idea how nfs4_open_expired() is failing? It seems that if it does,
we should see an error, which would cause the lock reclaim to fail.

Also, why is the call to nfs4_reclaim_locks() looping? That too should
exit in case of an error.

> By commenting out the call to nfs4_state_end_reclaim_reboot(clp) in nfs4_recovery_handle_error(), the client was able to handle this particular scenario properly.

We do need to keep the nfs4_state_end_reclaim_reboot() there. Otherwise,
we have a problem if the server reboots again while we're in the middle
of reclaiming state.

Cheers
Trond

2010-10-06 15:59:46

by Sachin Prabhu

Subject: Re: NFS4 clients cannot reclaim locks


----- "Trond Myklebust" <[email protected]> wrote:

> Yup. That makes sense. Does the following patch help?
>
> Cheers
> Trond
> --------------------------------------------------------------------------------------------------------
> NFSv4: Fix open recovery
>
> From: Trond Myklebust <[email protected]>
>
> NFSv4 open recovery is currently broken: since we do not clear the
> state->flags states before attempting recovery, we end up with the
> 'can_open_cached()' function triggering. This again leads to no OPEN
> call being put on the wire.
>
> Reported-by: Sachin Prabhu <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> fs/nfs/nfs4proc.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 089da5b..01b4817 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -1120,6 +1120,7 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
>  	clear_bit(NFS_DELEGATED_STATE, &state->flags);
>  	smp_rmb();
>  	if (state->n_rdwr != 0) {
> +		clear_bit(NFS_O_RDWR_STATE, &state->flags);
>  		ret = nfs4_open_recover_helper(opendata, FMODE_READ|FMODE_WRITE, &newstate);
>  		if (ret != 0)
>  			return ret;
> @@ -1127,6 +1128,7 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
>  			return -ESTALE;
>  	}
>  	if (state->n_wronly != 0) {
> +		clear_bit(NFS_O_WRONLY_STATE, &state->flags);
>  		ret = nfs4_open_recover_helper(opendata, FMODE_WRITE, &newstate);
>  		if (ret != 0)
>  			return ret;
> @@ -1134,6 +1136,7 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
>  			return -ESTALE;
>  	}
>  	if (state->n_rdonly != 0) {
> +		clear_bit(NFS_O_RDONLY_STATE, &state->flags);
>  		ret = nfs4_open_recover_helper(opendata, FMODE_READ, &newstate);
>  		if (ret != 0)
>  			return ret;
>
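
The effect described in the patch changelog can be illustrated with a simplified user-space model. It is a sketch under assumptions, not the kernel code: the `model_*` names are hypothetical, only the read/write open mode is modelled, and the cached-open check is reduced to "flag set and open count non-zero".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_O_RDWR_STATE (1u << 0)   /* hypothetical stand-in for NFS_O_RDWR_STATE */

struct model_open_state {
    unsigned int flags;   /* which open modes the client believes are valid */
    int n_rdwr;           /* number of read/write opens held on this state */
    int wire_opens;       /* OPEN calls actually sent to the server */
};

/* Simplified cached-open check: a stale flag is enough to make it succeed. */
static bool model_can_open_cached(const struct model_open_state *s)
{
    assert(s != NULL);
    return (s->flags & MODEL_O_RDWR_STATE) != 0 && s->n_rdwr != 0;
}

/* Sends an OPEN only when the cached check does not short-circuit. */
static void model_open_recover_helper(struct model_open_state *s)
{
    if (model_can_open_cached(s))
        return;                       /* nothing goes on the wire */
    s->wire_opens++;
    s->flags |= MODEL_O_RDWR_STATE;   /* server granted a fresh open */
}

/* clear_first == true corresponds to the behaviour the patch adds. */
static int model_open_recover(struct model_open_state *s, bool clear_first)
{
    if (s->n_rdwr != 0) {
        if (clear_first)
            s->flags &= ~MODEL_O_RDWR_STATE;
        model_open_recover_helper(s);
    }
    return s->wire_opens;
}
```

With the flag left over from before the reboot, `model_open_recover(&s, false)` sends no OPEN at all, while `model_open_recover(&s, true)` sends exactly one.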


Yes. The patch works.

As expected, repeated OPEN calls are made with the claim type set to NULL. For each of these calls, the server returns NFS4ERR_GRACE for as long as it is in its grace period. Once the grace period has ended, the OPEN call succeeds, a new stateid is set, and the write operation continues.
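
The retry behaviour observed above can be sketched as a toy client/server loop. Everything here is assumed for illustration: the `model_*` names, the tick-based grace period, and the attempt limit are hypothetical, and a real client would wait between attempts rather than retry immediately.

```c
#include <assert.h>
#include <stddef.h>

enum model_status {
    MODEL_OK = 0,
    MODEL_ERR_GRACE,      /* stand-in for NFS4ERR_GRACE */
};

struct model_server {
    int grace_ticks_left;       /* remaining OPENs that will be refused */
    unsigned int next_stateid;  /* stateid handed out on the next success */
};

/* A non-reclaim OPEN is refused while the server is still in its grace period. */
static enum model_status model_server_open(struct model_server *srv,
                                           unsigned int *stateid)
{
    assert(srv != NULL && stateid != NULL);
    if (srv->grace_ticks_left > 0) {
        srv->grace_ticks_left--;
        return MODEL_ERR_GRACE;
    }
    *stateid = srv->next_stateid++;
    return MODEL_OK;
}

/*
 * Retry the OPEN until it is granted. Returns the number of attempts used,
 * or -1 if max_attempts was reached; *stateid is only updated on success.
 */
static int model_open_until_granted(struct model_server *srv,
                                    unsigned int *stateid, int max_attempts)
{
    int attempt;

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        if (model_server_open(srv, stateid) == MODEL_OK)
            return attempt;
        /* a real client would sleep here before retrying */
    }
    return -1;
}
```

In this model the old stateid stays in place for every refused attempt and is replaced only by the first OPEN issued after the grace period ends.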

Thank You
Sachin Prabhu

2010-11-22 16:14:55

by Timo Aaltonen

Subject: Re: NFS4 clients cannot reclaim locks

On Tue, 5 Oct 2010, Timo Aaltonen wrote:

> On Fri, 1 Oct 2010, Sachin Prabhu wrote:
>
>> NFS4 clients appear to have problems reclaiming locks after a server
>> reboot. I can recreate the issue on 2.6.34.7-56.fc13.x86_64 on a Fedora
>> system.
>>
>> The problem appears to happen in cases where after a reboot, a WRITE call
>> is made just before the RENEW call. In that case, the NFS4ERR_STALE_STATEID
>> is returned for the WRITE call which results in NFS_STATE_RECLAIM_REBOOT
>> being set in the state flags. However the NFS4ERR_STALE_CLIENTID returned
>> for the subsequent RENEW call is handled by
>> nfs4_recovery_handle_error() -> nfs4_state_end_reclaim_reboot(clp);
>> which ends up setting the state flag to NFS_STATE_RECLAIM_NOGRACE and
>> clearing the NFS_STATE_RECLAIM_REBOOT in nfs4_state_mark_reclaim_nograce().
>>
>> The process of reclaiming the locks then seem to hit another roadblock in
>> nfs4_open_expired() where it fails to open the file and reset the state. It
>> ends up calling nfs4_reclaim_locks() in a loop with the old stateid in
>> nfs4_reclaim_open_state().
>>
>> By commenting out the call to nfs4_state_end_reclaim_reboot(clp) in
>> nfs4_recovery_handle_error(), the client was able to handle this particular
>> scenario properly.
>>
>> Has any one else seen this issue?
>
> could this be related to the bug I was seeing with nfsv4 (now using v3 with
> success):
>
> https://bugzilla.kernel.org/show_bug.cgi?id=15973
>
> though the error returned by the server is BAD_STATEID..

At least, testing 2.6.37-rc2 has been positive so far, which suggests that the
bug is fixed there.


--
Timo Aaltonen
Systems Specialist, Aalto IT