2023-07-01 12:13:23

by Benjamin Coddington

Subject: [PATCH v3] NFSv4: Fix dropped lock for racing OPEN and delegation return

Commit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation
return") attempted to solve this problem by using nfs4's generic async error
handling, but introduced a regression where v4.0 lock recovery would hang.
The additional complexity introduced by overloading that error handling is
not necessary for this case. This patch expects that commit to be
reverted.

The problem as originally explained in the above commit is:

There's a small window where a LOCK sent during a delegation return can
race with another OPEN on client, but the open stateid has not yet been
updated. In this case, the client doesn't handle the OLD_STATEID error
from the server and will lose this lock, emitting:
"NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".

Fix this by using the old_stateid refresh helpers if the server replies
with OLD_STATEID.

Suggested-by: Trond Myklebust <[email protected]>
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/nfs/nfs4proc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6bb14f6cfbc0..f350f41e1967 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7180,8 +7180,15 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
 		} else if (!nfs4_update_lock_stateid(lsp, &data->res.stateid))
 			goto out_restart;
 		break;
-	case -NFS4ERR_BAD_STATEID:
 	case -NFS4ERR_OLD_STATEID:
+		if (data->arg.new_lock_owner != 0 &&
+			nfs4_refresh_open_old_stateid(&data->arg.open_stateid,
+						lsp->ls_state))
+			goto out_restart;
+		else if (nfs4_refresh_lock_old_stateid(&data->arg.lock_stateid, lsp))
+			goto out_restart;
+		fallthrough;
+	case -NFS4ERR_BAD_STATEID:
 	case -NFS4ERR_STALE_STATEID:
 	case -NFS4ERR_EXPIRED:
 		if (data->arg.new_lock_owner != 0) {
--
2.40.1
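
For readers unfamiliar with the refresh-and-retry idea in the hunk above, here is a minimal userspace sketch of it. It is an illustration only, not the kernel implementation: struct sketch_stateid and the sketch_* functions are hypothetical names invented for this example. The kernel helpers named in the patch also deal with locking and with the case where no newer seqid has been cached yet, and the sketch leaves both out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of a stateid: a sequence number plus an opaque
 * identifier. This is an assumption for illustration, not the kernel's
 * nfs4_stateid type.
 */
struct sketch_stateid {
	uint32_t seqid;
	unsigned char other[12];
};

/* True when both stateids name the same state, ignoring the seqid. */
static bool sketch_match_other(const struct sketch_stateid *a,
			       const struct sketch_stateid *b)
{
	return memcmp(a->other, b->other, sizeof(a->other)) == 0;
}

/*
 * True when a's seqid is newer than b's. Comparing the signed
 * difference keeps the result correct across seqid wraparound.
 */
static bool sketch_is_newer(const struct sketch_stateid *a,
			    const struct sketch_stateid *b)
{
	return (int32_t)(a->seqid - b->seqid) > 0;
}

/*
 * Called after the server answers OLD_STATEID. If the locally cached
 * stateid names the same state and carries a newer seqid, copy that
 * seqid into the request and return true so the caller can resend.
 * Return false to let the caller fall through to its error handling.
 */
static bool sketch_refresh_old_stateid(struct sketch_stateid *sent,
				       const struct sketch_stateid *cached)
{
	if (!sketch_match_other(sent, cached))
		return false;	/* a different state entirely */
	if (!sketch_is_newer(cached, sent))
		return false;	/* nothing newer to retry with */
	sent->seqid = cached->seqid;
	return true;
}
```

In this model, a true return corresponds to the "goto out_restart" branches in the patch, and a false return corresponds to falling through to the BAD_STATEID handling.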



2023-10-04 18:56:04

by Olga Kornievskaia

Subject: Re: [PATCH v3] NFSv4: Fix dropped lock for racing OPEN and delegation return

Hi Trond/Ben,

Did this ever go to stable? I don't know if I missed a mail from Greg
that it was picked up or it never got picked up because it wasn't
marked for stable?

Thank you.

On Sat, Jul 1, 2023 at 8:13 AM Benjamin Coddington <[email protected]> wrote:
>
> Commit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation
> return") attempted to solve this problem by using nfs4's generic async error
> handling, but introduced a regression where v4.0 lock recovery would hang.
> The additional complexity introduced by overloading that error handling is
> not necessary for this case. This patch expects that commit to be
> reverted.
>
> The problem as originally explained in the above commit is:
>
> There's a small window where a LOCK sent during a delegation return can
> race with another OPEN on client, but the open stateid has not yet been
> updated. In this case, the client doesn't handle the OLD_STATEID error
> from the server and will lose this lock, emitting:
> "NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".
>
> Fix this by using the old_stateid refresh helpers if the server replies
> with OLD_STATEID.
>
> Suggested-by: Trond Myklebust <[email protected]>
> Signed-off-by: Benjamin Coddington <[email protected]>
> ---
> fs/nfs/nfs4proc.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6bb14f6cfbc0..f350f41e1967 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -7180,8 +7180,15 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
> } else if (!nfs4_update_lock_stateid(lsp, &data->res.stateid))
> goto out_restart;
> break;
> - case -NFS4ERR_BAD_STATEID:
> case -NFS4ERR_OLD_STATEID:
> + if (data->arg.new_lock_owner != 0 &&
> + nfs4_refresh_open_old_stateid(&data->arg.open_stateid,
> + lsp->ls_state))
> + goto out_restart;
> + else if (nfs4_refresh_lock_old_stateid(&data->arg.lock_stateid, lsp))
> + goto out_restart;
> + fallthrough;
> + case -NFS4ERR_BAD_STATEID:
> case -NFS4ERR_STALE_STATEID:
> case -NFS4ERR_EXPIRED:
> if (data->arg.new_lock_owner != 0) {
> --
> 2.40.1
>

2023-10-04 19:02:48

by Olga Kornievskaia

Subject: Re: [PATCH v3] NFSv4: Fix dropped lock for racing OPEN and delegation return

Sorry, I didn't mean this patch. I meant the revert patch.

On Wed, Oct 4, 2023 at 2:53 PM Olga Kornievskaia <[email protected]> wrote:
>
> Hi Trond/Ben,
>
> Did this ever go to stable? I don't know if I missed a mail from Greg
> that it was picked up or it never got picked up because it wasn't
> marked for stable?
>
> Thank you.
>
> On Sat, Jul 1, 2023 at 8:13 AM Benjamin Coddington <[email protected]> wrote:
> >
> > Commit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation
> > return") attempted to solve this problem by using nfs4's generic async error
> > handling, but introduced a regression where v4.0 lock recovery would hang.
> > The additional complexity introduced by overloading that error handling is
> > not necessary for this case. This patch expects that commit to be
> > reverted.
> >
> > The problem as originally explained in the above commit is:
> >
> > There's a small window where a LOCK sent during a delegation return can
> > race with another OPEN on client, but the open stateid has not yet been
> > updated. In this case, the client doesn't handle the OLD_STATEID error
> > from the server and will lose this lock, emitting:
> > "NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".
> >
> > Fix this by using the old_stateid refresh helpers if the server replies
> > with OLD_STATEID.
> >
> > Suggested-by: Trond Myklebust <[email protected]>
> > Signed-off-by: Benjamin Coddington <[email protected]>
> > ---
> > fs/nfs/nfs4proc.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 6bb14f6cfbc0..f350f41e1967 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -7180,8 +7180,15 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
> > } else if (!nfs4_update_lock_stateid(lsp, &data->res.stateid))
> > goto out_restart;
> > break;
> > - case -NFS4ERR_BAD_STATEID:
> > case -NFS4ERR_OLD_STATEID:
> > + if (data->arg.new_lock_owner != 0 &&
> > + nfs4_refresh_open_old_stateid(&data->arg.open_stateid,
> > + lsp->ls_state))
> > + goto out_restart;
> > + else if (nfs4_refresh_lock_old_stateid(&data->arg.lock_stateid, lsp))
> > + goto out_restart;
> > + fallthrough;
> > + case -NFS4ERR_BAD_STATEID:
> > case -NFS4ERR_STALE_STATEID:
> > case -NFS4ERR_EXPIRED:
> > if (data->arg.new_lock_owner != 0) {
> > --
> > 2.40.1
> >

2023-10-05 18:17:45

by Anna Schumaker

Subject: Re: [PATCH v3] NFSv4: Fix dropped lock for racing OPEN and delegation return

Hi Olga,

On Wed, Oct 4, 2023 at 2:55 PM Olga Kornievskaia <[email protected]> wrote:
>
> Sorry, I didn't mean this patch. I meant the revert patch.
>
> On Wed, Oct 4, 2023 at 2:53 PM Olga Kornievskaia <[email protected]> wrote:
> >
> > Hi Trond/Ben,
> >
> > Did this ever go to stable? I don't know if I missed a mail from Greg
> > that it was picked up or it never got picked up because it wasn't
> > marked for stable?

Looks like the revert went into 6.5 as commit 5b4a82a0724a. It's not
marked for stable, so it probably wasn't picked up.

Anna

> >
> > Thank you.
> >
> > On Sat, Jul 1, 2023 at 8:13 AM Benjamin Coddington <[email protected]> wrote:
> > >
> > > Commit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation
> > > return") attempted to solve this problem by using nfs4's generic async error
> > > handling, but introduced a regression where v4.0 lock recovery would hang.
> > > The additional complexity introduced by overloading that error handling is
> > > not necessary for this case. This patch expects that commit to be
> > > reverted.
> > >
> > > The problem as originally explained in the above commit is:
> > >
> > > There's a small window where a LOCK sent during a delegation return can
> > > race with another OPEN on client, but the open stateid has not yet been
> > > updated. In this case, the client doesn't handle the OLD_STATEID error
> > > from the server and will lose this lock, emitting:
> > > "NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".
> > >
> > > Fix this by using the old_stateid refresh helpers if the server replies
> > > with OLD_STATEID.
> > >
> > > Suggested-by: Trond Myklebust <[email protected]>
> > > Signed-off-by: Benjamin Coddington <[email protected]>
> > > ---
> > > fs/nfs/nfs4proc.c | 9 ++++++++-
> > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > index 6bb14f6cfbc0..f350f41e1967 100644
> > > --- a/fs/nfs/nfs4proc.c
> > > +++ b/fs/nfs/nfs4proc.c
> > > @@ -7180,8 +7180,15 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
> > > } else if (!nfs4_update_lock_stateid(lsp, &data->res.stateid))
> > > goto out_restart;
> > > break;
> > > - case -NFS4ERR_BAD_STATEID:
> > > case -NFS4ERR_OLD_STATEID:
> > > + if (data->arg.new_lock_owner != 0 &&
> > > + nfs4_refresh_open_old_stateid(&data->arg.open_stateid,
> > > + lsp->ls_state))
> > > + goto out_restart;
> > > + else if (nfs4_refresh_lock_old_stateid(&data->arg.lock_stateid, lsp))
> > > + goto out_restart;
> > > + fallthrough;
> > > + case -NFS4ERR_BAD_STATEID:
> > > case -NFS4ERR_STALE_STATEID:
> > > case -NFS4ERR_EXPIRED:
> > > if (data->arg.new_lock_owner != 0) {
> > > --
> > > 2.40.1
> > >

2023-10-09 16:30:43

by Benjamin Coddington

Subject: Re: [PATCH v3] NFSv4: Fix dropped lock for racing OPEN and delegation return

On 5 Oct 2023, at 14:16, Anna Schumaker wrote:

> Hi Olga,
>
> On Wed, Oct 4, 2023 at 2:55 PM Olga Kornievskaia <[email protected]> wrote:
>>
>> Sorry, I didn't mean this patch. I meant the revert patch.
>>
>> On Wed, Oct 4, 2023 at 2:53 PM Olga Kornievskaia <[email protected]> wrote:
>>>
>>> Hi Trond/Ben,
>>>
>>> Did this ever go to stable? I don't know if I missed a mail from Greg
>>> that it was picked up or it never got picked up because it wasn't
>>> marked for stable?
>
> Looks like the revert went into 6.5 as commit 5b4a82a0724a. It's not
> marked for stable, so it probably wasn't picked up.

Gah, thanks for taking this on and fixing it.

Ben