2018-05-09 12:03:22

by Andrew W Elble

[permalink] [raw]
Subject: [PATCH v2] nfsd: fix error handling in nfs4_set_delegation()

I noticed a memory corruption crash in nfsd in
4.17-rc1. This patch corrects the issue.

Fix to return error if the delegation couldn't be hashed or there was
a recall in progress. Use the existing error path instead of
destroy_unhashed_delegation() for readability. Set the fields of the
delegation to indicate that it does not need to be recalled.

Signed-off-by: Andrew Elble <[email protected]>
Fixes: 353601e7d323c ("nfsd: create a separate lease for each delegation")
---
v2: typo in changelog, set delegation recall-suppression
fs/nfsd/nfs4state.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 71b87738c015..20463944cd61 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4372,12 +4372,26 @@ static struct file_lock *nfs4_alloc_init_lease(struct nfs4_delegation *dp,
status = -EAGAIN;
else
status = hash_delegation_locked(dp, fp);
+ /*
+ * This delegation is doomed, tell the recall logic
+ * that it's being destroyed here.
+ */
+
+ if (status) {
+ dp->dl_time++;
+ list_del_init(&dp->dl_recall_lru);
+ dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
+ }
spin_unlock(&fp->fi_lock);
spin_unlock(&state_lock);

if (status)
- destroy_unhashed_deleg(dp);
+ goto out_unlock;
+
return dp;
+
+out_unlock:
+ vfs_setlease(fp->fi_deleg_file, F_UNLCK, NULL, (void **)&dp);
out_clnt_odstate:
put_clnt_odstate(dp->dl_clnt_odstate);
out_stid:
--
1.8.3.1



2018-05-11 21:30:06

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v2] nfsd: fix error handling in nfs4_set_delegation()

On Wed, May 09, 2018 at 08:02:49AM -0400, Andrew Elble wrote:
> I noticed a memory corruption crash in nfsd in
> 4.17-rc1. This patch corrects the issue.
>
> Fix to return error if the delegation couldn't be hashed or there was
> a recall in progress. Use the existing error path instead of
> destroy_unhashed_delegation() for readability. Set the fields of the
> delegation to indicate that it does not need to be recalled.
>
> Signed-off-by: Andrew Elble <[email protected]>
> Fixes: 353601e7d323c ("nfsd: create a separate lease for each delegation")
> ---
> v2: typo in changelog, set delegation recall-suppression
> fs/nfsd/nfs4state.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 71b87738c015..20463944cd61 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4372,12 +4372,26 @@ static struct file_lock *nfs4_alloc_init_lease(struct nfs4_delegation *dp,
> status = -EAGAIN;
> else
> status = hash_delegation_locked(dp, fp);
> + /*
> + * This delegation is doomed, tell the recall logic
> + * that it's being destroyed here.
> + */
> +
> + if (status) {
> + dp->dl_time++;
> + list_del_init(&dp->dl_recall_lru);
> + dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
> + }

I'm trying to figure out if this fixes an actual bug. The code should
be able to deal with a callback on an already unhashed delegation, so I
think you're right it would at worst just be an unnecessary recall.
This won't catch every such case (could be that nfsd4_cb_recall_prepare
already ran and we're too late), so I wonder if this is worth it.

More interesting to me is what exactly it would take to hit this
case.... Another thread would have to have succesfully hashed a
delegation for this client and file to make our hash_delegation_locked
fail. So there would be two leases for the same file and client, but
with different delegation pointers as the fl_owner. I *think* we handle
that OK. But it was likely problematic previously when we were still
using the file pointer as the fl_owner.

--b.

> spin_unlock(&fp->fi_lock);
> spin_unlock(&state_lock);
>
> if (status)
> - destroy_unhashed_deleg(dp);
> + goto out_unlock;
> +
> return dp;
> +
> +out_unlock:
> + vfs_setlease(fp->fi_deleg_file, F_UNLCK, NULL, (void **)&dp);
> out_clnt_odstate:
> put_clnt_odstate(dp->dl_clnt_odstate);
> out_stid:
> --
> 1.8.3.1

2018-05-14 11:31:53

by Andrew W Elble

[permalink] [raw]
Subject: Re: [PATCH v2] nfsd: fix error handling in nfs4_set_delegation()


>> + /*
>> + * This delegation is doomed, tell the recall logic
>> + * that it's being destroyed here.
>> + */
>> +
>> + if (status) {
>> + dp->dl_time++;
>> + list_del_init(&dp->dl_recall_lru);
>> + dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
>> + }
>
> I'm trying to figure out if this fixes an actual bug. The code should
> be able to deal with a callback on an already unhashed delegation, so I
> think you're right it would at worst just be an unnecessary recall.

But an 'normal' unhashed delegation would have a persistent refcount,
this one would not. If the recall code gets a hold of it, it will
place it on nn->del_recall_lru, and then free it in nfsd4_cb_recall_release()?

> This won't catch every such case (could be that nfsd4_cb_recall_prepare
> already ran and we're too late), so I wonder if this is worth it.
>
> More interesting to me is what exactly it would take to hit this
> case.... Another thread would have to have succesfully hashed a
> delegation for this client and file to make our hash_delegation_locked
> fail. So there would be two leases for the same file and client, but
> with different delegation pointers as the fl_owner. I *think* we handle
> that OK. But it was likely problematic previously when we were still
> using the file pointer as the fl_owner.

I'm thinking this is more easily hit via fp->fi_had_conflict, if a lease
break comes in at the right time?

Thanks,

Andy

--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2018-05-14 15:45:51

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v2] nfsd: fix error handling in nfs4_set_delegation()

On Mon, May 14, 2018 at 07:31:53AM -0400, Andrew W Elble wrote:
>
> >> + /*
> >> + * This delegation is doomed, tell the recall logic
> >> + * that it's being destroyed here.
> >> + */
> >> +
> >> + if (status) {
> >> + dp->dl_time++;
> >> + list_del_init(&dp->dl_recall_lru);
> >> + dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
> >> + }
> >
> > I'm trying to figure out if this fixes an actual bug. The code should
> > be able to deal with a callback on an already unhashed delegation, so I
> > think you're right it would at worst just be an unnecessary recall.
>
> But an 'normal' unhashed delegation would have a persistent refcount,
> this one would not. If the recall code gets a hold of it, it will
> place it on nn->del_recall_lru, and then free it in nfsd4_cb_recall_release()?

Sounds right. Do you see any bug there?

> > This won't catch every such case (could be that nfsd4_cb_recall_prepare
> > already ran and we're too late), so I wonder if this is worth it.
> >
> > More interesting to me is what exactly it would take to hit this
> > case.... Another thread would have to have succesfully hashed a
> > delegation for this client and file to make our hash_delegation_locked
> > fail. So there would be two leases for the same file and client, but
> > with different delegation pointers as the fl_owner. I *think* we handle
> > that OK. But it was likely problematic previously when we were still
> > using the file pointer as the fl_owner.
>
> I'm thinking this is more easily hit via fp->fi_had_conflict, if a lease
> break comes in at the right time?

Could be.

--b.

2018-05-23 12:31:07

by Andrew W Elble

[permalink] [raw]
Subject: Re: [PATCH v2] nfsd: fix error handling in nfs4_set_delegation()


>> But an 'normal' unhashed delegation would have a persistent refcount,
>> this one would not. If the recall code gets a hold of it, it will
>> place it on nn->del_recall_lru, and then free it in nfsd4_cb_recall_release()?
>
> Sounds right. Do you see any bug there?

I'd expect it to crash in the laundromat. Unfortunately the larger test rig
that I'd normally use to try to hit this is occupied doing other things
- I may have to come back to this later.

Thanks,

Andy