LinuxLists.cc - [PATCH RFC] xprtrdma: Fix an LOCK/OPEN race when unlinking an open file

2016-03-28 15:51:20

Subject: [PATCH RFC] xprtrdma: Fix an LOCK/OPEN race when unlinking an open file

At Connectathon 2016, we found that recent upstream Linux clients
would occasionally send a LOCK operation with a zero stateid. This
appeared to happen in close proximity to another thread returning
a delegation before unlinking the same file while it remained open.

Earlier, the client received a write delegation on this file and
returned the open stateid. Now, as it is getting ready to unlink the
file, it returns the write delegation. But there is still an open
file descriptor on that file, so the client must OPEN the file
again before it returns the delegation.

Since commit 24311f884189 ('NFSv4: Recovery of recalled read
delegations is broken'), nfs_open_delegation_recall() clears the
NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
racing LOCK on the same inode to be put on the wire before the OPEN
operation has returned a valid open stateid.

After the OPEN(CLAIM_DELEG_CUR_FH) returns, the client holds both
a write delegation and a valid open stateid. It is safe to clear
NFS_DELEGATED_STATE at that point, allowing fresh lock requests
to go on the wire using the newly acquired open stateid.

I'm not certain of this fix. nfs4_handle_delegation_recall_error()
is called from both nfs_open_delegation_recall() and
nfs_lock_delegation_recall(). Is it safe and correct to clear
NFS_DELEGATED_STATE after success in both of these code paths?

Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ...')
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfs/nfs4proc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 01bef06..3ccdc76 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1794,10 +1794,12 @@ static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct
default:
printk(KERN_ERR "NFS: %s: unhandled error "
"%d.\n", __func__, err);
+ break;
case 0:
case -ENOENT:
case -EAGAIN:
case -ESTALE:
+ clear_bit(NFS_DELEGATED_STATE, &state->flags);
break;
case -NFS4ERR_BADSESSION:
case -NFS4ERR_BADSLOT:
@@ -1857,7 +1859,6 @@ int nfs4_open_delegation_recall(struct nfs_open_context *ctx,
write_seqlock(&state->seqlock);
nfs4_stateid_copy(&state->stateid, &state->open_stateid);
write_sequnlock(&state->seqlock);
- clear_bit(NFS_DELEGATED_STATE, &state->flags);
switch (type & (FMODE_READ|FMODE_WRITE)) {
case FMODE_READ|FMODE_WRITE:
case FMODE_WRITE:

2016-03-28 16:30:57

by Trond Myklebust

[permalink] [raw]

Subject: Re: [PATCH RFC] xprtrdma: Fix an LOCK/OPEN race when unlinking an open file

Hi Chuck,

On Mon, Mar 28, 2016 at 11:51 AM, Chuck Lever <[email protected]> wrote:
> At Connectathon 2016, we found that recent upstream Linux clients
> would occasionally send a LOCK operation with a zero stateid. This
> appeared to happen in close proximity to another thread returning
> a delegation before unlinking the same file while it remained open.
>
> Earlier, the client received a write delegation on this file and
> returned the open stateid. Now, as it is getting ready to unlink the
> file, it returns the write delegation. But there is still an open
> file descriptor on that file, so the client must OPEN the file
> again before it returns the delegation.
>
> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
> delegations is broken'), nfs_open_delegation_recall() clears the
> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
> racing LOCK on the same inode to be put on the wire before the OPEN
> operation has returned a valid open stateid.
>
> After the OPEN(CLAIM_DELEG_CUR_FH) returns, the client holds both
> a write delegation and a valid open stateid. It is safe to clear
> NFS_DELEGATED_STATE at that point, allowing fresh lock requests
> to go on the wire using the newly acquired open stateid.
>
> I'm not certain of this fix. nfs4_handle_delegation_recall_error()
> is called from both nfs_open_delegation_recall() and
> nfs_lock_delegation_recall(). Is it safe and correct to clear
> NFS_DELEGATED_STATE after success in both of these code paths?
>

I'm not seeing why the subject line is tagged as describing an
xprtrdma issue. Was that intentional?
Secondly, would it perhaps make more sense to have the locking code
simply wait for the outstanding delegation return recovery? Otherwise,
I worry that we are exchanging one timing-specific problem for
another.

Thanks
Trond

2016-03-28 17:00:47

by Chuck Lever III

[permalink] [raw]

Subject: Re: [PATCH RFC] xprtrdma: Fix an LOCK/OPEN race when unlinking an open file

> On Mar 28, 2016, at 12:30 PM, Trond Myklebust <[email protected]> wrote:
>
> Hi Chuck,
>
> On Mon, Mar 28, 2016 at 11:51 AM, Chuck Lever <[email protected]> wrote:
>> At Connectathon 2016, we found that recent upstream Linux clients
>> would occasionally send a LOCK operation with a zero stateid. This
>> appeared to happen in close proximity to another thread returning
>> a delegation before unlinking the same file while it remained open.
>>
>> Earlier, the client received a write delegation on this file and
>> returned the open stateid. Now, as it is getting ready to unlink the
>> file, it returns the write delegation. But there is still an open
>> file descriptor on that file, so the client must OPEN the file
>> again before it returns the delegation.
>>
>> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
>> delegations is broken'), nfs_open_delegation_recall() clears the
>> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
>> racing LOCK on the same inode to be put on the wire before the OPEN
>> operation has returned a valid open stateid.
>>
>> After the OPEN(CLAIM_DELEG_CUR_FH) returns, the client holds both
>> a write delegation and a valid open stateid. It is safe to clear
>> NFS_DELEGATED_STATE at that point, allowing fresh lock requests
>> to go on the wire using the newly acquired open stateid.
>>
>> I'm not certain of this fix. nfs4_handle_delegation_recall_error()
>> is called from both nfs_open_delegation_recall() and
>> nfs_lock_delegation_recall(). Is it safe and correct to clear
>> NFS_DELEGATED_STATE after success in both of these code paths?
>>
>
> I'm not seeing why the subject line is tagged as describing an
> xprtrdma issue. Was that intentional?

No, it was just habit. I'll remove it.

> Secondly, would it perhaps make more sense to have the locking code
> simply wait for the outstanding delegation return recovery? Otherwise,
> I worry that we are exchanging one timing-specific problem for
> another.

Sure, that's a doubt I have as well.

My understanding of this is not sophisticated, but
it seems safe to switch to using on-the-wire locks
once the client holds a delegation and an open
stateid.

Then after that, flush any remaining cached locks.
New application lock requests can go to the server
concurrently. The VFS layer should sort out conflicts
(copying Jeff for his thoughts).

Finally, return the delegation, allowing other
clients to open/lock the file.

That seems to be what that code path is doing already.

Is there another window in there where an application
might drive open or lock system calls that could result
in an indeterminant result?

Serializing locking and delegation return would have a
negative performance impact on common locking operations
(even when they are cached), I would think, as it would
have to be a mutex or something even more heavyweight.

--
Chuck Lever