2018-11-01 17:39:51

by Benjamin Coddington

[permalink] [raw]
Subject: [PATCH] lockd: Show pid of lockd for remote locks

Commit 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid
for remote locks") specified that the l_pid returned for F_GETLK on a local
file that has a remote lock should be the pid of the lock manager process.
That commit, while updating other filesystems, failed to update lockd, such
that locks created by lockd had their fl_pid set to that of the remote
process holding the lock. Fix that here to be the pid of lockd.

Also, fix the client case so that the returned lock pid is negative, which
indicates a remote lock on a remote file.

Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific...")
Cc: [email protected]

Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/lockd/clntproc.c | 2 +-
fs/lockd/xdr.c | 4 ++--
fs/lockd/xdr4.c | 4 ++--
3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index d20b92f271c2..0a67dd4250e9 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -442,7 +442,7 @@ nlmclnt_test(struct nlm_rqst *req, struct file_lock *fl)
fl->fl_start = req->a_res.lock.fl.fl_start;
fl->fl_end = req->a_res.lock.fl.fl_end;
fl->fl_type = req->a_res.lock.fl.fl_type;
- fl->fl_pid = 0;
+ fl->fl_pid = -req->a_res.lock.fl.fl_pid;
break;
default:
status = nlm_stat_to_errno(req->a_res.status);
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c
index 7147e4aebecc..9846f7e95282 100644
--- a/fs/lockd/xdr.c
+++ b/fs/lockd/xdr.c
@@ -127,7 +127,7 @@ nlm_decode_lock(__be32 *p, struct nlm_lock *lock)

locks_init_lock(fl);
fl->fl_owner = current->files;
- fl->fl_pid = (pid_t)lock->svid;
+ fl->fl_pid = current->tgid;
fl->fl_flags = FL_POSIX;
fl->fl_type = F_RDLCK; /* as good as anything else */
start = ntohl(*p++);
@@ -269,7 +269,7 @@ nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p)
memset(lock, 0, sizeof(*lock));
locks_init_lock(&lock->fl);
lock->svid = ~(u32) 0;
- lock->fl.fl_pid = (pid_t)lock->svid;
+ lock->fl.fl_pid = current->tgid;

if (!(p = nlm_decode_cookie(p, &argp->cookie))
|| !(p = xdr_decode_string_inplace(p, &lock->caller,
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c
index 7ed9edf9aed4..70154f376695 100644
--- a/fs/lockd/xdr4.c
+++ b/fs/lockd/xdr4.c
@@ -119,7 +119,7 @@ nlm4_decode_lock(__be32 *p, struct nlm_lock *lock)

locks_init_lock(fl);
fl->fl_owner = current->files;
- fl->fl_pid = (pid_t)lock->svid;
+ fl->fl_pid = current->tgid;
fl->fl_flags = FL_POSIX;
fl->fl_type = F_RDLCK; /* as good as anything else */
p = xdr_decode_hyper(p, &start);
@@ -266,7 +266,7 @@ nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p)
memset(lock, 0, sizeof(*lock));
locks_init_lock(&lock->fl);
lock->svid = ~(u32) 0;
- lock->fl.fl_pid = (pid_t)lock->svid;
+ lock->fl.fl_pid = current->tgid;

if (!(p = nlm4_decode_cookie(p, &argp->cookie))
|| !(p = xdr_decode_string_inplace(p, &lock->caller,
--
2.14.3



2018-11-02 18:45:18

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] lockd: Show pid of lockd for remote locks

On Thu, Nov 01, 2018 at 01:39:49PM -0400, Benjamin Coddington wrote:
> Commit 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid
> for remote locks") specified that the l_pid returned for F_GETLK on a local
> file that has a remote lock should be the pid of the lock manager process.
> That commit, while updating other filesystems, failed to update lockd, such
> that locks created by lockd had their fl_pid set to that of the remote
> process holding the lock. Fix that here to be the pid of lockd.
>
> Also, fix the client case so that the returned lock pid is negative, which
> indicates a remote lock on a remote file.

ACK.

Uh, I guess I'll take this if nobody else speaks up.

--b.

>
> Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific...")
> Cc: [email protected]
>
> Signed-off-by: Benjamin Coddington <[email protected]>
> ---
> fs/lockd/clntproc.c | 2 +-
> fs/lockd/xdr.c | 4 ++--
> fs/lockd/xdr4.c | 4 ++--
> 3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> index d20b92f271c2..0a67dd4250e9 100644
> --- a/fs/lockd/clntproc.c
> +++ b/fs/lockd/clntproc.c
> @@ -442,7 +442,7 @@ nlmclnt_test(struct nlm_rqst *req, struct file_lock *fl)
> fl->fl_start = req->a_res.lock.fl.fl_start;
> fl->fl_end = req->a_res.lock.fl.fl_end;
> fl->fl_type = req->a_res.lock.fl.fl_type;
> - fl->fl_pid = 0;
> + fl->fl_pid = -req->a_res.lock.fl.fl_pid;
> break;
> default:
> status = nlm_stat_to_errno(req->a_res.status);
> diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c
> index 7147e4aebecc..9846f7e95282 100644
> --- a/fs/lockd/xdr.c
> +++ b/fs/lockd/xdr.c
> @@ -127,7 +127,7 @@ nlm_decode_lock(__be32 *p, struct nlm_lock *lock)
>
> locks_init_lock(fl);
> fl->fl_owner = current->files;
> - fl->fl_pid = (pid_t)lock->svid;
> + fl->fl_pid = current->tgid;
> fl->fl_flags = FL_POSIX;
> fl->fl_type = F_RDLCK; /* as good as anything else */
> start = ntohl(*p++);
> @@ -269,7 +269,7 @@ nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p)
> memset(lock, 0, sizeof(*lock));
> locks_init_lock(&lock->fl);
> lock->svid = ~(u32) 0;
> - lock->fl.fl_pid = (pid_t)lock->svid;
> + lock->fl.fl_pid = current->tgid;
>
> if (!(p = nlm_decode_cookie(p, &argp->cookie))
> || !(p = xdr_decode_string_inplace(p, &lock->caller,
> diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c
> index 7ed9edf9aed4..70154f376695 100644
> --- a/fs/lockd/xdr4.c
> +++ b/fs/lockd/xdr4.c
> @@ -119,7 +119,7 @@ nlm4_decode_lock(__be32 *p, struct nlm_lock *lock)
>
> locks_init_lock(fl);
> fl->fl_owner = current->files;
> - fl->fl_pid = (pid_t)lock->svid;
> + fl->fl_pid = current->tgid;
> fl->fl_flags = FL_POSIX;
> fl->fl_type = F_RDLCK; /* as good as anything else */
> p = xdr_decode_hyper(p, &start);
> @@ -266,7 +266,7 @@ nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p)
> memset(lock, 0, sizeof(*lock));
> locks_init_lock(&lock->fl);
> lock->svid = ~(u32) 0;
> - lock->fl.fl_pid = (pid_t)lock->svid;
> + lock->fl.fl_pid = current->tgid;
>
> if (!(p = nlm4_decode_cookie(p, &argp->cookie))
> || !(p = xdr_decode_string_inplace(p, &lock->caller,
> --
> 2.14.3

2018-12-14 17:50:55

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] lockd: Show pid of lockd for remote locks

On Fri, Nov 02, 2018 at 02:45:16PM -0400, J. Bruce Fields wrote:
> On Thu, Nov 01, 2018 at 01:39:49PM -0400, Benjamin Coddington wrote:
> > Commit 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid
> > for remote locks") specified that the l_pid returned for F_GETLK on a local
> > file that has a remote lock should be the pid of the lock manager process.
> > That commit, while updating other filesystems, failed to update lockd, such
> > that locks created by lockd had their fl_pid set to that of the remote
> > process holding the lock. Fix that here to be the pid of lockd.
> >
> > Also, fix the client case so that the returned lock pid is negative, which
> > indicates a remote lock on a remote file.
>
> ACK.
>
> Uh, I guess I'll take this if nobody else speaks up.

Applied for 4.21.--b.

>
> --b.
>
> >
> > Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific...")
> > Cc: [email protected]
> >
> > Signed-off-by: Benjamin Coddington <[email protected]>
> > ---
> > fs/lockd/clntproc.c | 2 +-
> > fs/lockd/xdr.c | 4 ++--
> > fs/lockd/xdr4.c | 4 ++--
> > 3 files changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > index d20b92f271c2..0a67dd4250e9 100644
> > --- a/fs/lockd/clntproc.c
> > +++ b/fs/lockd/clntproc.c
> > @@ -442,7 +442,7 @@ nlmclnt_test(struct nlm_rqst *req, struct file_lock *fl)
> > fl->fl_start = req->a_res.lock.fl.fl_start;
> > fl->fl_end = req->a_res.lock.fl.fl_end;
> > fl->fl_type = req->a_res.lock.fl.fl_type;
> > - fl->fl_pid = 0;
> > + fl->fl_pid = -req->a_res.lock.fl.fl_pid;
> > break;
> > default:
> > status = nlm_stat_to_errno(req->a_res.status);
> > diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c
> > index 7147e4aebecc..9846f7e95282 100644
> > --- a/fs/lockd/xdr.c
> > +++ b/fs/lockd/xdr.c
> > @@ -127,7 +127,7 @@ nlm_decode_lock(__be32 *p, struct nlm_lock *lock)
> >
> > locks_init_lock(fl);
> > fl->fl_owner = current->files;
> > - fl->fl_pid = (pid_t)lock->svid;
> > + fl->fl_pid = current->tgid;
> > fl->fl_flags = FL_POSIX;
> > fl->fl_type = F_RDLCK; /* as good as anything else */
> > start = ntohl(*p++);
> > @@ -269,7 +269,7 @@ nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p)
> > memset(lock, 0, sizeof(*lock));
> > locks_init_lock(&lock->fl);
> > lock->svid = ~(u32) 0;
> > - lock->fl.fl_pid = (pid_t)lock->svid;
> > + lock->fl.fl_pid = current->tgid;
> >
> > if (!(p = nlm_decode_cookie(p, &argp->cookie))
> > || !(p = xdr_decode_string_inplace(p, &lock->caller,
> > diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c
> > index 7ed9edf9aed4..70154f376695 100644
> > --- a/fs/lockd/xdr4.c
> > +++ b/fs/lockd/xdr4.c
> > @@ -119,7 +119,7 @@ nlm4_decode_lock(__be32 *p, struct nlm_lock *lock)
> >
> > locks_init_lock(fl);
> > fl->fl_owner = current->files;
> > - fl->fl_pid = (pid_t)lock->svid;
> > + fl->fl_pid = current->tgid;
> > fl->fl_flags = FL_POSIX;
> > fl->fl_type = F_RDLCK; /* as good as anything else */
> > p = xdr_decode_hyper(p, &start);
> > @@ -266,7 +266,7 @@ nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p)
> > memset(lock, 0, sizeof(*lock));
> > locks_init_lock(&lock->fl);
> > lock->svid = ~(u32) 0;
> > - lock->fl.fl_pid = (pid_t)lock->svid;
> > + lock->fl.fl_pid = current->tgid;
> >
> > if (!(p = nlm4_decode_cookie(p, &argp->cookie))
> > || !(p = xdr_decode_string_inplace(p, &lock->caller,
> > --
> > 2.14.3

2019-05-17 21:46:04

by Xuewei Zhang

[permalink] [raw]
Subject: Re: Re: [PATCH] lockd: Show pid of lockd for remote locks

> On Fri, Nov 02, 2018 at 02:45:16PM -0400, J. Bruce Fields wrote:
> > On Thu, Nov 01, 2018 at 01:39:49PM -0400, Benjamin Coddington wrote:
> > > Commit 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid
> > > for remote locks") specified that the l_pid returned for F_GETLK on a local
> > > file that has a remote lock should be the pid of the lock manager process.
> > > That commit, while updating other filesystems, failed to update lockd, such
> > > that locks created by lockd had their fl_pid set to that of the remote
> > > process holding the lock. Fix that here to be the pid of lockd.
> > >
> > > Also, fix the client case so that the returned lock pid is negative, which
> > > indicates a remote lock on a remote file.

Seems this patch introduced a bug in how lock protocol handles
GRANTED_MSG in nfs.

For example under this scenario:
1. Client1 takes an exclusive lock on a file and holds it
(client1 sends LOCK request, and server replies with success)
2. Client2 attempts to take an exclusive lock on the same file and gets blocked
(client2 sends LOCK request, and server replies with NLM_BLOCKED)
3. Client1 release the lock
(client1 sends UNLOCK request, and server replies with success)
4. Server grant the lock to Client
(server sends GRANTED_MSG to clients)
5. Client2 looks at all currently blocked locks and see if the
fl_blocked->fl_u.nfs_fl.owner->pid matches lock->svid [1]
6. The owner->pid and lock->svid *should* match, and Client2 will then
take the lock.

Looking closely of how the matching at step 5 is implemented:
https://github.com/torvalds/linux/blob/e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd/fs/lockd/clntlock.c#L186
The comment says that the owner->pid should be the lockowner's PID, rather
than lock manager process(lockd)'s PID.

This patch b8eee0e90f97 ("lockd: Show pid of lockd for remote locks") changed
the behavior of lockd so that it sends the PID of lockd instead of lockowner's
PID at lock->svid, which results in the bug in the lock protocol.

The bug can be reproduced by opening two nfs clients on /nfs. And do this:
1. On client1: sudo flock /nfs/lock -c read
2. On client2: sudo flock /nfs/lock -c date
3. Now quit client 1 by ctrl-D.

You will see that client2 will take ~30 seconds before taking the lock (this is
because the GRANTED_MSG got discarded by client2 since svid doesn't match.
And then client2 retries after 30 seconds). The expected behavior is client2
should take the lock immediately.

On a kernel built with this b8eee0e90f97 ("lockd: Show pid of lockd for remote
locks") reverted, it shows the expected behavior.

I suspect 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for
remote locks") probably chose to not update lockd for this reason.

Should we consider maybe reverting or fixing b8eee0e90f97 ("lockd:
Show pid of lockd
for remote locks")?

Thanks!
Xuewei

[1] https://github.com/torvalds/linux/blob/e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd/fs/lockd/clntlock.c#L186

> >
> > ACK.
> >
> > Uh, I guess I'll take this if nobody else speaks up.
>
> Applied for 4.21.--b.
>
> >
> > --b.
> >
> > >
> > > Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific...")
> > > Cc: [email protected]
> > >
> > > Signed-off-by: Benjamin Coddington <[email protected]>
> > > ---
> > > fs/lockd/clntproc.c | 2 +-
> > > fs/lockd/xdr.c | 4 ++--
> > > fs/lockd/xdr4.c | 4 ++--
> > > 3 files changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > > index d20b92f271c2..0a67dd4250e9 100644
> > > --- a/fs/lockd/clntproc.c
> > > +++ b/fs/lockd/clntproc.c
> > > @@ -442,7 +442,7 @@ nlmclnt_test(struct nlm_rqst *req, struct file_lock *fl)
> > > fl->fl_start = req->a_res.lock.fl.fl_start;
> > > fl->fl_end = req->a_res.lock.fl.fl_end;
> > > fl->fl_type = req->a_res.lock.fl.fl_type;
> > > - fl->fl_pid = 0;
> > > + fl->fl_pid = -req->a_res.lock.fl.fl_pid;
> > > break;
> > > default:
> > > status = nlm_stat_to_errno(req->a_res.status);
> > > diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c
> > > index 7147e4aebecc..9846f7e95282 100644
> > > --- a/fs/lockd/xdr.c
> > > +++ b/fs/lockd/xdr.c
> > > @@ -127,7 +127,7 @@ nlm_decode_lock(__be32 *p, struct nlm_lock *lock)
> > >
> > > locks_init_lock(fl);
> > > fl->fl_owner = current->files;
> > > - fl->fl_pid = (pid_t)lock->svid;
> > > + fl->fl_pid = current->tgid;
> > > fl->fl_flags = FL_POSIX;
> > > fl->fl_type = F_RDLCK; /* as good as anything else */
> > > start = ntohl(*p++);
> > > @@ -269,7 +269,7 @@ nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p)
> > > memset(lock, 0, sizeof(*lock));
> > > locks_init_lock(&lock->fl);
> > > lock->svid = ~(u32) 0;
> > > - lock->fl.fl_pid = (pid_t)lock->svid;
> > > + lock->fl.fl_pid = current->tgid;
> > >
> > > if (!(p = nlm_decode_cookie(p, &argp->cookie))
> > > || !(p = xdr_decode_string_inplace(p, &lock->caller,
> > > diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c
> > > index 7ed9edf9aed4..70154f376695 100644
> > > --- a/fs/lockd/xdr4.c
> > > +++ b/fs/lockd/xdr4.c
> > > @@ -119,7 +119,7 @@ nlm4_decode_lock(__be32 *p, struct nlm_lock *lock)
> > >
> > > locks_init_lock(fl);
> > > fl->fl_owner = current->files;
> > > - fl->fl_pid = (pid_t)lock->svid;
> > > + fl->fl_pid = current->tgid;
> > > fl->fl_flags = FL_POSIX;
> > > fl->fl_type = F_RDLCK; /* as good as anything else */
> > > p = xdr_decode_hyper(p, &start);
> > > @@ -266,7 +266,7 @@ nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p)
> > > memset(lock, 0, sizeof(*lock));
> > > locks_init_lock(&lock->fl);
> > > lock->svid = ~(u32) 0;
> > > - lock->fl.fl_pid = (pid_t)lock->svid;
> > > + lock->fl.fl_pid = current->tgid;
> > >
> > > if (!(p = nlm4_decode_cookie(p, &argp->cookie))
> > > || !(p = xdr_decode_string_inplace(p, &lock->caller,
> > > --
> > > 2.14.3

2019-05-18 14:21:14

by Benjamin Coddington

[permalink] [raw]
Subject: Re: [PATCH] lockd: Show pid of lockd for remote locks

On 17 May 2019, at 17:45, Xuewei Zhang wrote:
> Seems this patch introduced a bug in how lock protocol handles
> GRANTED_MSG in nfs.

Yes, you're right: it's broken, and broken badly because we find conflicting
locks based on lockd's fl_pid and lockd's fl_owner, which is current->files.
That means that clients are not differentiated, and that means that v3 locks
are broken.

I'd really like to see the fl_pid value make sense on the server when we
show it to userspace, so I think that we should stuff the svid in fl_owner.

Clearly I need to be more careful making changes here, so I am going to take
my time fixing this, and I won't get to it until Monday. A revert would get
us back to safe behavior.

Ben

2019-05-19 02:17:45

by Xuewei Zhang

[permalink] [raw]
Subject: Re: [PATCH] lockd: Show pid of lockd for remote locks

On Sat, May 18, 2019 at 5:09 AM Benjamin Coddington <[email protected]> wrote:
>
> On 17 May 2019, at 17:45, Xuewei Zhang wrote:
> > Seems this patch introduced a bug in how lock protocol handles
> > GRANTED_MSG in nfs.
>
> Yes, you're right: it's broken, and broken badly because we find conflicting
> locks based on lockd's fl_pid and lockd's fl_owner, which is current->files.
> That means that clients are not differentiated, and that means that v3 locks
> are broken.

Thanks a lot for the quick response and confirming the problem!

>
> I'd really like to see the fl_pid value make sense on the server when we
> show it to userspace, so I think that we should stuff the svid in fl_owner.
>
> Clearly I need to be more careful making changes here, so I am going to take
> my time fixing this, and I won't get to it until Monday. A revert would get
> us back to safe behavior.

From my limited understanding, b8eee0e90f97 ("lockd: Show pid of lockd
for remote locks")
exists only for fixing lockd in 9d5b86ac13c5 ("fs/locks: Remove
fl_nspid and use fs-specific...").

But I don't see anything wrong in 9d5b86ac13c5 ("fs/locks: Remove
fl_nspid and use fs-specific..."). Could you let me know what's the
problem? Thanks a lot!

If 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific...")
is correct, we
probably don't need to add another fixing patch. Perhaps reverting b8eee0e90f97
("lockd: Show pid of lockd for remote locks") would be the best way then.

>
> Ben

Best regards,
Xuewei

2019-05-20 18:00:42

by Benjamin Coddington

[permalink] [raw]
Subject: Re: [PATCH] lockd: Show pid of lockd for remote locks

On 18 May 2019, at 22:15, Xuewei Zhang wrote:

> On Sat, May 18, 2019 at 5:09 AM Benjamin Coddington
> <[email protected]> wrote:
>>
>> On 17 May 2019, at 17:45, Xuewei Zhang wrote:
>>> Seems this patch introduced a bug in how lock protocol handles
>>> GRANTED_MSG in nfs.
>>
>> Yes, you're right: it's broken, and broken badly because we find
>> conflicting
>> locks based on lockd's fl_pid and lockd's fl_owner, which is
>> current->files.
>> That means that clients are not differentiated, and that means that
>> v3 locks
>> are broken.
>
> Thanks a lot for the quick response and confirming the problem!
>
>>
>> I'd really like to see the fl_pid value make sense on the server when
>> we
>> show it to userspace, so I think that we should stuff the svid in
>> fl_owner.
>>
>> Clearly I need to be more careful making changes here, so I am going
>> to take
>> my time fixing this, and I won't get to it until Monday. A revert
>> would get
>> us back to safe behavior.
>
> From my limited understanding, b8eee0e90f97 ("lockd: Show pid of lockd
> for remote locks")
> exists only for fixing lockd in 9d5b86ac13c5 ("fs/locks: Remove
> fl_nspid and use fs-specific...").
>
> But I don't see anything wrong in 9d5b86ac13c5 ("fs/locks: Remove
> fl_nspid and use fs-specific..."). Could you let me know what's the
> problem? Thanks a lot!
>
> If 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific...")
> is correct, we
> probably don't need to add another fixing patch. Perhaps reverting
> b8eee0e90f97
> ("lockd: Show pid of lockd for remote locks") would be the best way
> then.

I think we have an existing problem: the NLM server is setting fl_owner
to
current->files and (before the bad patch) fl_pid to svid.

That means that we can't tell the difference between locks from
different
clients that may have the same svid. The bad patch just made the
problem
far more likely to occur, that's what you're now noticing.

What needs to happen is that we generate our own fl_owner_t based on the
nlm_host and svid, so that we end up with unique fl_owner for each
client/svid pair, the same way that nlmclnt does. That way the nlm
server
can let the kernel do the lock matching based on unique fl_owner for
each
client/svid.

The mech in clntproc.c for all this can probably be shared, so I'll try
to
make that common code.

Jeff, any words of wisdom?

Ben

2019-05-20 18:08:35

by Benjamin Coddington

[permalink] [raw]
Subject: Re: [PATCH] lockd: Show pid of lockd for remote locks

On 20 May 2019, at 9:12, Benjamin Coddington wrote:

> On 18 May 2019, at 22:15, Xuewei Zhang wrote:
>
>> On Sat, May 18, 2019 at 5:09 AM Benjamin Coddington
>> <[email protected]> wrote:
>>>
>>> On 17 May 2019, at 17:45, Xuewei Zhang wrote:
>>>> Seems this patch introduced a bug in how lock protocol handles
>>>> GRANTED_MSG in nfs.
>>>
>>> Yes, you're right: it's broken, and broken badly because we find
>>> conflicting
>>> locks based on lockd's fl_pid and lockd's fl_owner, which is
>>> current->files.
>>> That means that clients are not differentiated, and that means that
>>> v3 locks
>>> are broken.
>>
>> Thanks a lot for the quick response and confirming the problem!
>>
>>>
>>> I'd really like to see the fl_pid value make sense on the server
>>> when we
>>> show it to userspace, so I think that we should stuff the svid in
>>> fl_owner.
>>>
>>> Clearly I need to be more careful making changes here, so I am going
>>> to take
>>> my time fixing this, and I won't get to it until Monday. A revert
>>> would get
>>> us back to safe behavior.
>>
>> From my limited understanding, b8eee0e90f97 ("lockd: Show pid of
>> lockd
>> for remote locks")
>> exists only for fixing lockd in 9d5b86ac13c5 ("fs/locks: Remove
>> fl_nspid and use fs-specific...").
>>
>> But I don't see anything wrong in 9d5b86ac13c5 ("fs/locks: Remove
>> fl_nspid and use fs-specific..."). Could you let me know what's the
>> problem? Thanks a lot!
>>
>> If 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific...")
>> is correct, we
>> probably don't need to add another fixing patch. Perhaps reverting
>> b8eee0e90f97
>> ("lockd: Show pid of lockd for remote locks") would be the best way
>> then.
>
> I think we have an existing problem: the NLM server is setting
> fl_owner to
> current->files and (before the bad patch) fl_pid to svid.
>
> That means that we can't tell the difference between locks from
> different
> clients that may have the same svid. The bad patch just made the
> problem
> far more likely to occur, that's what you're now noticing.

Ok, I just noticed that we set fl_owner to the nlm_host in
nlm4svc_retrieve_args, so things are not as dire as I thought. What
would
be nice is a sane set of tests for NLM..

Since we already were placing the nlm_host in fl_owner, I think
reverting
9d5b86ac13c5 at this point is the proper thing to do.

Ben

2019-05-20 20:51:35

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] lockd: Show pid of lockd for remote locks

On Mon, May 20, 2019 at 10:22:00AM -0400, Benjamin Coddington wrote:
> Ok, I just noticed that we set fl_owner to the nlm_host in
> nlm4svc_retrieve_args, so things are not as dire as I thought. What
> would be nice is a sane set of tests for NLM..

What would we have needed to catch this? Sounds like it turns
multi-client testing wouldn't have been required? (Not that that would
be a bad idea.)

--b.

2019-05-21 11:19:34

by Benjamin Coddington

[permalink] [raw]
Subject: Re: [PATCH] lockd: Show pid of lockd for remote locks

On 20 May 2019, at 16:51, J. Bruce Fields wrote:

> On Mon, May 20, 2019 at 10:22:00AM -0400, Benjamin Coddington wrote:
>> Ok, I just noticed that we set fl_owner to the nlm_host in
>> nlm4svc_retrieve_args, so things are not as dire as I thought. What
>> would be nice is a sane set of tests for NLM..
>
> What would we have needed to catch this? Sounds like it turns
> multi-client testing wouldn't have been required? (Not that that
> would
> be a bad idea.)

Two NLM clients would be ideal to exercise the full range of expected
lock behavior. I suspect that's something I can do with what's in pynfs
today, but I haven't looked yet. I suppose if there's a test for NLM I
should make one for v4 too..

Ben

2019-05-21 14:50:57

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] lockd: Show pid of lockd for remote locks

On Tue, May 21, 2019 at 07:18:57AM -0400, Benjamin Coddington wrote:
> On 20 May 2019, at 16:51, J. Bruce Fields wrote:
>
> >On Mon, May 20, 2019 at 10:22:00AM -0400, Benjamin Coddington wrote:
> >>Ok, I just noticed that we set fl_owner to the nlm_host in
> >>nlm4svc_retrieve_args, so things are not as dire as I thought. What
> >>would be nice is a sane set of tests for NLM..
> >
> >What would we have needed to catch this? Sounds like it turns
> >multi-client testing wouldn't have been required? (Not that that
> >would
> >be a bad idea.)
>
> Two NLM clients would be ideal to exercise the full range of
> expected lock behavior. I suspect that's something I can do with
> what's in pynfs today, but I haven't looked yet. I suppose if
> there's a test for NLM I should make one for v4 too..

There isn't any pynfs NLM code. Some isilon folks did NLM/NSM/NFSv2/v3
pynfs tests:

https://github.com/sthaber/pynfs

I just never got a chance to incorporate them and try them. It's been a
while, and I think there were one or two odd things about it, but maybe
it'd be a good starting point.

--b.