At Connectathon 2016, we found that recent upstream Linux clients
would occasionally send a LOCK operation with a zero stateid. This
appeared to happen in close proximity to another thread returning
a delegation before unlinking the same file while it remained open.
Earlier, the client received a write delegation on this file and
returned the open stateid. Now, as it is getting ready to unlink the
file, it returns the write delegation. But there is still an open
file descriptor on that file, so the client must OPEN the file
again before it returns the delegation.
Since commit 24311f884189 ('NFSv4: Recovery of recalled read
delegations is broken'), nfs_open_delegation_recall() clears the
NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
racing LOCK on the same inode to be put on the wire before the OPEN
operation has returned a valid open stateid.
To eliminate this race, serialize delegation return with the
acquisition of a file lock on the same file. Adopt the same approach
as is used in the unlock path.
Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
Signed-off-by: Chuck Lever <[email protected]>
---
Hi-
This fix appears to be both safe and effective. Please consider
it for v4.7 and for stable. Thanks!
fs/nfs/nfs4proc.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 01bef06..c40f1b6 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6054,6 +6054,7 @@ static int nfs41_lock_expired(struct nfs4_state *state, struct file_lock *reques
static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
{
struct nfs_inode *nfsi = NFS_I(state->inode);
+ struct nfs4_state_owner *sp = state->owner;
unsigned char fl_flags = request->fl_flags;
int status = -ENOLCK;
@@ -6068,6 +6069,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
status = do_vfs_lock(state->inode, request);
if (status < 0)
goto out;
+ mutex_lock(&sp->so_delegreturn_mutex);
down_read(&nfsi->rwsem);
if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
/* Yes: cache locks! */
@@ -6075,9 +6077,11 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
request->fl_flags = fl_flags & ~FL_SLEEP;
status = do_vfs_lock(state->inode, request);
up_read(&nfsi->rwsem);
+ mutex_unlock(&sp->so_delegreturn_mutex);
goto out;
}
up_read(&nfsi->rwsem);
+ mutex_unlock(&sp->so_delegreturn_mutex);
status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
out:
request->fl_flags = fl_flags;
Hello,
On Mon, Apr 11, 2016 at 10:20 PM, Chuck Lever <[email protected]> wrote:
> At Connectathon 2016, we found that recent upstream Linux clients
> would occasionally send a LOCK operation with a zero stateid. This
> appeared to happen in close proximity to another thread returning
> a delegation before unlinking the same file while it remained open.
>
> Earlier, the client received a write delegation on this file and
> returned the open stateid. Now, as it is getting ready to unlink the
> file, it returns the write delegation. But there is still an open
> file descriptor on that file, so the client must OPEN the file
> again before it returns the delegation.
>
> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
> delegations is broken'), nfs_open_delegation_recall() clears the
> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
> racing LOCK on the same inode to be put on the wire before the OPEN
> operation has returned a valid open stateid.
>
> To eliminate this race, serialize delegation return with the
> acquisition of a file lock on the same file. Adopt the same approach
> as is used in the unlock path.
>
> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
> Signed-off-by: Chuck Lever <[email protected]>
Is it something which is going to be merged in v4.7?
It also might be a good candidate for stable?
Best regards,
--
William
> On Apr 25, 2016, at 10:34 AM, William Dauchy <[email protected]> wrote:
>
> Hello,
>
> On Mon, Apr 11, 2016 at 10:20 PM, Chuck Lever <[email protected]> wrote:
>> At Connectathon 2016, we found that recent upstream Linux clients
>> would occasionally send a LOCK operation with a zero stateid. This
>> appeared to happen in close proximity to another thread returning
>> a delegation before unlinking the same file while it remained open.
>>
>> Earlier, the client received a write delegation on this file and
>> returned the open stateid. Now, as it is getting ready to unlink the
>> file, it returns the write delegation. But there is still an open
>> file descriptor on that file, so the client must OPEN the file
>> again before it returns the delegation.
>>
>> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
>> delegations is broken'), nfs_open_delegation_recall() clears the
>> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
>> racing LOCK on the same inode to be put on the wire before the OPEN
>> operation has returned a valid open stateid.
>>
>> To eliminate this race, serialize delegation return with the
>> acquisition of a file lock on the same file. Adopt the same approach
>> as is used in the unlock path.
>>
>> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
>> Signed-off-by: Chuck Lever <[email protected]>
>
>
> Is it something which is going to be merged in v4.7?
> It also might be a good candidate for stable?
William, only the NFS maintainers can answer these questions.
I really don't know.
--
Chuck Lever
I believe this patch also helps with a race between a LOCK and
CB_RECALL. Application does a lock as the delegation is being
recalled. The lock thread sees the delegated state and acquires a
local lock. At the same time delegation doesn't see it the lock yet
and returns the delegation. Application proceeds to do IO. It ends up
using an open stateid for the IO (as there is no delegation stateid or
lock stateid). The server is unaware of the lock so it can give that
lock to somebody else. Yet this client thinks it has a local lock. It
leads to inconsistent data between clients.
On Mon, Apr 11, 2016 at 4:20 PM, Chuck Lever <[email protected]> wrote:
> At Connectathon 2016, we found that recent upstream Linux clients
> would occasionally send a LOCK operation with a zero stateid. This
> appeared to happen in close proximity to another thread returning
> a delegation before unlinking the same file while it remained open.
>
> Earlier, the client received a write delegation on this file and
> returned the open stateid. Now, as it is getting ready to unlink the
> file, it returns the write delegation. But there is still an open
> file descriptor on that file, so the client must OPEN the file
> again before it returns the delegation.
>
> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
> delegations is broken'), nfs_open_delegation_recall() clears the
> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
> racing LOCK on the same inode to be put on the wire before the OPEN
> operation has returned a valid open stateid.
>
> To eliminate this race, serialize delegation return with the
> acquisition of a file lock on the same file. Adopt the same approach
> as is used in the unlock path.
>
> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
> Signed-off-by: Chuck Lever <[email protected]>
> ---
> Hi-
>
> This fix appears to be both safe and effective. Please consider
> it for v4.7 and for stable. Thanks!
>
>
> fs/nfs/nfs4proc.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 01bef06..c40f1b6 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -6054,6 +6054,7 @@ static int nfs41_lock_expired(struct nfs4_state *state, struct file_lock *reques
> static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
> {
> struct nfs_inode *nfsi = NFS_I(state->inode);
> + struct nfs4_state_owner *sp = state->owner;
> unsigned char fl_flags = request->fl_flags;
> int status = -ENOLCK;
>
> @@ -6068,6 +6069,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
> status = do_vfs_lock(state->inode, request);
> if (status < 0)
> goto out;
> + mutex_lock(&sp->so_delegreturn_mutex);
> down_read(&nfsi->rwsem);
> if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
> /* Yes: cache locks! */
> @@ -6075,9 +6077,11 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
> request->fl_flags = fl_flags & ~FL_SLEEP;
> status = do_vfs_lock(state->inode, request);
> up_read(&nfsi->rwsem);
> + mutex_unlock(&sp->so_delegreturn_mutex);
> goto out;
> }
> up_read(&nfsi->rwsem);
> + mutex_unlock(&sp->so_delegreturn_mutex);
> status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
> out:
> request->fl_flags = fl_flags;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello Anna,
Could you have a look at this one please?
Thanks,
William
On Tue, Apr 26, 2016 at 4:24 PM, Olga Kornievskaia <[email protected]> wrote:
> I believe this patch also helps with a race between a LOCK and
> CB_RECALL. Application does a lock as the delegation is being
> recalled. The lock thread sees the delegated state and acquires a
> local lock. At the same time delegation doesn't see it the lock yet
> and returns the delegation. Application proceeds to do IO. It ends up
> using an open stateid for the IO (as there is no delegation stateid or
> lock stateid). The server is unaware of the lock so it can give that
> lock to somebody else. Yet this client thinks it has a local lock. It
> leads to inconsistent data between clients.
>
> On Mon, Apr 11, 2016 at 4:20 PM, Chuck Lever <[email protected]> wrote:
>> At Connectathon 2016, we found that recent upstream Linux clients
>> would occasionally send a LOCK operation with a zero stateid. This
>> appeared to happen in close proximity to another thread returning
>> a delegation before unlinking the same file while it remained open.
>>
>> Earlier, the client received a write delegation on this file and
>> returned the open stateid. Now, as it is getting ready to unlink the
>> file, it returns the write delegation. But there is still an open
>> file descriptor on that file, so the client must OPEN the file
>> again before it returns the delegation.
>>
>> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
>> delegations is broken'), nfs_open_delegation_recall() clears the
>> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
>> racing LOCK on the same inode to be put on the wire before the OPEN
>> operation has returned a valid open stateid.
>>
>> To eliminate this race, serialize delegation return with the
>> acquisition of a file lock on the same file. Adopt the same approach
>> as is used in the unlock path.
>>
>> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>> Hi-
>>
>> This fix appears to be both safe and effective. Please consider
>> it for v4.7 and for stable. Thanks!
>>
>>
>> fs/nfs/nfs4proc.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 01bef06..c40f1b6 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -6054,6 +6054,7 @@ static int nfs41_lock_expired(struct nfs4_state *state, struct file_lock *reques
>> static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
>> {
>> struct nfs_inode *nfsi = NFS_I(state->inode);
>> + struct nfs4_state_owner *sp = state->owner;
>> unsigned char fl_flags = request->fl_flags;
>> int status = -ENOLCK;
>>
>> @@ -6068,6 +6069,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>> status = do_vfs_lock(state->inode, request);
>> if (status < 0)
>> goto out;
>> + mutex_lock(&sp->so_delegreturn_mutex);
>> down_read(&nfsi->rwsem);
>> if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
>> /* Yes: cache locks! */
>> @@ -6075,9 +6077,11 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>> request->fl_flags = fl_flags & ~FL_SLEEP;
>> status = do_vfs_lock(state->inode, request);
>> up_read(&nfsi->rwsem);
>> + mutex_unlock(&sp->so_delegreturn_mutex);
>> goto out;
>> }
>> up_read(&nfsi->rwsem);
>> + mutex_unlock(&sp->so_delegreturn_mutex);
>> status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
>> out:
>> request->fl_flags = fl_flags;
>>
--
William
The patch looks pretty straightforward to me, and it sounds like it fixes a few problems that people are seeing. One question (below):
On 04/28/2016 08:43 AM, William Dauchy wrote:
> Hello Anna,
>
> Could you have a look at this one please?
>
> Thanks,
>
> William
>
> On Tue, Apr 26, 2016 at 4:24 PM, Olga Kornievskaia <[email protected]> wrote:
>> I believe this patch also helps with a race between a LOCK and
>> CB_RECALL. Application does a lock as the delegation is being
>> recalled. The lock thread sees the delegated state and acquires a
>> local lock. At the same time delegation doesn't see it the lock yet
>> and returns the delegation. Application proceeds to do IO. It ends up
>> using an open stateid for the IO (as there is no delegation stateid or
>> lock stateid). The server is unaware of the lock so it can give that
>> lock to somebody else. Yet this client thinks it has a local lock. It
>> leads to inconsistent data between clients.
>>
>> On Mon, Apr 11, 2016 at 4:20 PM, Chuck Lever <[email protected]> wrote:
>>> At Connectathon 2016, we found that recent upstream Linux clients
>>> would occasionally send a LOCK operation with a zero stateid. This
>>> appeared to happen in close proximity to another thread returning
>>> a delegation before unlinking the same file while it remained open.
>>>
>>> Earlier, the client received a write delegation on this file and
>>> returned the open stateid. Now, as it is getting ready to unlink the
>>> file, it returns the write delegation. But there is still an open
>>> file descriptor on that file, so the client must OPEN the file
>>> again before it returns the delegation.
>>>
>>> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
>>> delegations is broken'), nfs_open_delegation_recall() clears the
>>> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
>>> racing LOCK on the same inode to be put on the wire before the OPEN
>>> operation has returned a valid open stateid.
>>>
>>> To eliminate this race, serialize delegation return with the
>>> acquisition of a file lock on the same file. Adopt the same approach
>>> as is used in the unlock path.
>>>
>>> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
>>> Signed-off-by: Chuck Lever <[email protected]>
>>> ---
>>> Hi-
>>>
>>> This fix appears to be both safe and effective. Please consider
>>> it for v4.7 and for stable. Thanks!
>>>
>>>
>>> fs/nfs/nfs4proc.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>> index 01bef06..c40f1b6 100644
>>> --- a/fs/nfs/nfs4proc.c
>>> +++ b/fs/nfs/nfs4proc.c
>>> @@ -6054,6 +6054,7 @@ static int nfs41_lock_expired(struct nfs4_state *state, struct file_lock *reques
>>> static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
>>> {
>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>> + struct nfs4_state_owner *sp = state->owner;
>>> unsigned char fl_flags = request->fl_flags;
>>> int status = -ENOLCK;
>>>
>>> @@ -6068,6 +6069,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>> status = do_vfs_lock(state->inode, request);
>>> if (status < 0)
>>> goto out;
>>> + mutex_lock(&sp->so_delegreturn_mutex);
>From what I can tell, the first call to do_vfs_lock() in this function is used to test if we can take the lock locally. Do we need to worry about this racing with delegreturn, too?
Thanks,
Anna
>>> down_read(&nfsi->rwsem);
>>> if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
>>> /* Yes: cache locks! */
>>> @@ -6075,9 +6077,11 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>> request->fl_flags = fl_flags & ~FL_SLEEP;
>>> status = do_vfs_lock(state->inode, request);
>>> up_read(&nfsi->rwsem);
>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>> goto out;
>>> }
>>> up_read(&nfsi->rwsem);
>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>> status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
>>> out:
>>> request->fl_flags = fl_flags;
>>>
>
> On Apr 28, 2016, at 9:13 AM, Anna Schumaker <[email protected]> wrote:
>
> The patch looks pretty straightforward to me, and it sounds like it fixes a few problems that people are seeing. One question (below):
>
> On 04/28/2016 08:43 AM, William Dauchy wrote:
>> Hello Anna,
>>
>> Could you have a look at this one please?
>>
>> Thanks,
>>
>> William
>>
>> On Tue, Apr 26, 2016 at 4:24 PM, Olga Kornievskaia <[email protected]> wrote:
>>> I believe this patch also helps with a race between a LOCK and
>>> CB_RECALL. Application does a lock as the delegation is being
>>> recalled. The lock thread sees the delegated state and acquires a
>>> local lock. At the same time delegation doesn't see it the lock yet
>>> and returns the delegation. Application proceeds to do IO. It ends up
>>> using an open stateid for the IO (as there is no delegation stateid or
>>> lock stateid). The server is unaware of the lock so it can give that
>>> lock to somebody else. Yet this client thinks it has a local lock. It
>>> leads to inconsistent data between clients.
>>>
>>> On Mon, Apr 11, 2016 at 4:20 PM, Chuck Lever <[email protected]> wrote:
>>>> At Connectathon 2016, we found that recent upstream Linux clients
>>>> would occasionally send a LOCK operation with a zero stateid. This
>>>> appeared to happen in close proximity to another thread returning
>>>> a delegation before unlinking the same file while it remained open.
>>>>
>>>> Earlier, the client received a write delegation on this file and
>>>> returned the open stateid. Now, as it is getting ready to unlink the
>>>> file, it returns the write delegation. But there is still an open
>>>> file descriptor on that file, so the client must OPEN the file
>>>> again before it returns the delegation.
>>>>
>>>> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
>>>> delegations is broken'), nfs_open_delegation_recall() clears the
>>>> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
>>>> racing LOCK on the same inode to be put on the wire before the OPEN
>>>> operation has returned a valid open stateid.
>>>>
>>>> To eliminate this race, serialize delegation return with the
>>>> acquisition of a file lock on the same file. Adopt the same approach
>>>> as is used in the unlock path.
>>>>
>>>> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>> ---
>>>> Hi-
>>>>
>>>> This fix appears to be both safe and effective. Please consider
>>>> it for v4.7 and for stable. Thanks!
>>>>
>>>>
>>>> fs/nfs/nfs4proc.c | 4 ++++
>>>> 1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>> index 01bef06..c40f1b6 100644
>>>> --- a/fs/nfs/nfs4proc.c
>>>> +++ b/fs/nfs/nfs4proc.c
>>>> @@ -6054,6 +6054,7 @@ static int nfs41_lock_expired(struct nfs4_state *state, struct file_lock *reques
>>>> static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
>>>> {
>>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>>> + struct nfs4_state_owner *sp = state->owner;
>>>> unsigned char fl_flags = request->fl_flags;
>>>> int status = -ENOLCK;
>>>>
>>>> @@ -6068,6 +6069,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>> status = do_vfs_lock(state->inode, request);
>>>> if (status < 0)
>>>> goto out;
>>>> + mutex_lock(&sp->so_delegreturn_mutex);
>
> From what I can tell, the first call to do_vfs_lock() in this function is used to test if we can take the lock locally. Do we need to worry about this racing with delegreturn, too?
When I included that call in the critical section,
cthon04 locking tests deadlocked.
> Thanks,
> Anna
>
>>>> down_read(&nfsi->rwsem);
>>>> if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
>>>> /* Yes: cache locks! */
>>>> @@ -6075,9 +6077,11 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>> request->fl_flags = fl_flags & ~FL_SLEEP;
>>>> status = do_vfs_lock(state->inode, request);
>>>> up_read(&nfsi->rwsem);
>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>> goto out;
>>>> }
>>>> up_read(&nfsi->rwsem);
>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>> status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
>>>> out:
>>>> request->fl_flags = fl_flags;
--
Chuck Lever
On 04/28/2016 10:06 AM, Chuck Lever wrote:
>
>> On Apr 28, 2016, at 9:13 AM, Anna Schumaker <[email protected]> wrote:
>>
>> The patch looks pretty straightforward to me, and it sounds like it fixes a few problems that people are seeing. One question (below):
>>
>> On 04/28/2016 08:43 AM, William Dauchy wrote:
>>> Hello Anna,
>>>
>>> Could you have a look at this one please?
>>>
>>> Thanks,
>>>
>>> William
>>>
>>> On Tue, Apr 26, 2016 at 4:24 PM, Olga Kornievskaia <[email protected]> wrote:
>>>> I believe this patch also helps with a race between a LOCK and
>>>> CB_RECALL. Application does a lock as the delegation is being
>>>> recalled. The lock thread sees the delegated state and acquires a
>>>> local lock. At the same time delegation doesn't see it the lock yet
>>>> and returns the delegation. Application proceeds to do IO. It ends up
>>>> using an open stateid for the IO (as there is no delegation stateid or
>>>> lock stateid). The server is unaware of the lock so it can give that
>>>> lock to somebody else. Yet this client thinks it has a local lock. It
>>>> leads to inconsistent data between clients.
>>>>
>>>> On Mon, Apr 11, 2016 at 4:20 PM, Chuck Lever <[email protected]> wrote:
>>>>> At Connectathon 2016, we found that recent upstream Linux clients
>>>>> would occasionally send a LOCK operation with a zero stateid. This
>>>>> appeared to happen in close proximity to another thread returning
>>>>> a delegation before unlinking the same file while it remained open.
>>>>>
>>>>> Earlier, the client received a write delegation on this file and
>>>>> returned the open stateid. Now, as it is getting ready to unlink the
>>>>> file, it returns the write delegation. But there is still an open
>>>>> file descriptor on that file, so the client must OPEN the file
>>>>> again before it returns the delegation.
>>>>>
>>>>> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
>>>>> delegations is broken'), nfs_open_delegation_recall() clears the
>>>>> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
>>>>> racing LOCK on the same inode to be put on the wire before the OPEN
>>>>> operation has returned a valid open stateid.
>>>>>
>>>>> To eliminate this race, serialize delegation return with the
>>>>> acquisition of a file lock on the same file. Adopt the same approach
>>>>> as is used in the unlock path.
>>>>>
>>>>> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
>>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>>> ---
>>>>> Hi-
>>>>>
>>>>> This fix appears to be both safe and effective. Please consider
>>>>> it for v4.7 and for stable. Thanks!
>>>>>
>>>>>
>>>>> fs/nfs/nfs4proc.c | 4 ++++
>>>>> 1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>>> index 01bef06..c40f1b6 100644
>>>>> --- a/fs/nfs/nfs4proc.c
>>>>> +++ b/fs/nfs/nfs4proc.c
>>>>> @@ -6054,6 +6054,7 @@ static int nfs41_lock_expired(struct nfs4_state *state, struct file_lock *reques
>>>>> static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
>>>>> {
>>>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>>>> + struct nfs4_state_owner *sp = state->owner;
>>>>> unsigned char fl_flags = request->fl_flags;
>>>>> int status = -ENOLCK;
>>>>>
>>>>> @@ -6068,6 +6069,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>> status = do_vfs_lock(state->inode, request);
>>>>> if (status < 0)
>>>>> goto out;
>>>>> + mutex_lock(&sp->so_delegreturn_mutex);
>>
>> From what I can tell, the first call to do_vfs_lock() in this function is used to test if we can take the lock locally. Do we need to worry about this racing with delegreturn, too?
>
> When I included that call in the critical section,
> cthon04 locking tests deadlocked.
Got it. Thanks for checking!
Anna
>
>
>> Thanks,
>> Anna
>>
>>>>> down_read(&nfsi->rwsem);
>>>>> if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
>>>>> /* Yes: cache locks! */
>>>>> @@ -6075,9 +6077,11 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>> request->fl_flags = fl_flags & ~FL_SLEEP;
>>>>> status = do_vfs_lock(state->inode, request);
>>>>> up_read(&nfsi->rwsem);
>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>> goto out;
>>>>> }
>>>>> up_read(&nfsi->rwsem);
>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>> status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
>>>>> out:
>>>>> request->fl_flags = fl_flags;
>
> --
> Chuck Lever
>
>
>
Chuck or Anna,
If the patch is accepted, do you mind expanding the commit message to
include the wording about the LOCK and CB_RECALL race (so that it's
documented to look back into it).
On Thu, Apr 28, 2016 at 10:09 AM, Anna Schumaker
<[email protected]> wrote:
> On 04/28/2016 10:06 AM, Chuck Lever wrote:
>>
>>> On Apr 28, 2016, at 9:13 AM, Anna Schumaker <[email protected]> wrote:
>>>
>>> The patch looks pretty straightforward to me, and it sounds like it fixes a few problems that people are seeing. One question (below):
>>>
>>> On 04/28/2016 08:43 AM, William Dauchy wrote:
>>>> Hello Anna,
>>>>
>>>> Could you have a look at this one please?
>>>>
>>>> Thanks,
>>>>
>>>> William
>>>>
>>>> On Tue, Apr 26, 2016 at 4:24 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>> I believe this patch also helps with a race between a LOCK and
>>>>> CB_RECALL. Application does a lock as the delegation is being
>>>>> recalled. The lock thread sees the delegated state and acquires a
>>>>> local lock. At the same time delegation doesn't see it the lock yet
>>>>> and returns the delegation. Application proceeds to do IO. It ends up
>>>>> using an open stateid for the IO (as there is no delegation stateid or
>>>>> lock stateid). The server is unaware of the lock so it can give that
>>>>> lock to somebody else. Yet this client thinks it has a local lock. It
>>>>> leads to inconsistent data between clients.
>>>>>
>>>>> On Mon, Apr 11, 2016 at 4:20 PM, Chuck Lever <[email protected]> wrote:
>>>>>> At Connectathon 2016, we found that recent upstream Linux clients
>>>>>> would occasionally send a LOCK operation with a zero stateid. This
>>>>>> appeared to happen in close proximity to another thread returning
>>>>>> a delegation before unlinking the same file while it remained open.
>>>>>>
>>>>>> Earlier, the client received a write delegation on this file and
>>>>>> returned the open stateid. Now, as it is getting ready to unlink the
>>>>>> file, it returns the write delegation. But there is still an open
>>>>>> file descriptor on that file, so the client must OPEN the file
>>>>>> again before it returns the delegation.
>>>>>>
>>>>>> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
>>>>>> delegations is broken'), nfs_open_delegation_recall() clears the
>>>>>> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
>>>>>> racing LOCK on the same inode to be put on the wire before the OPEN
>>>>>> operation has returned a valid open stateid.
>>>>>>
>>>>>> To eliminate this race, serialize delegation return with the
>>>>>> acquisition of a file lock on the same file. Adopt the same approach
>>>>>> as is used in the unlock path.
>>>>>>
>>>>>> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
>>>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>>>> ---
>>>>>> Hi-
>>>>>>
>>>>>> This fix appears to be both safe and effective. Please consider
>>>>>> it for v4.7 and for stable. Thanks!
>>>>>>
>>>>>>
>>>>>> fs/nfs/nfs4proc.c | 4 ++++
>>>>>> 1 file changed, 4 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>>>> index 01bef06..c40f1b6 100644
>>>>>> --- a/fs/nfs/nfs4proc.c
>>>>>> +++ b/fs/nfs/nfs4proc.c
>>>>>> @@ -6054,6 +6054,7 @@ static int nfs41_lock_expired(struct nfs4_state *state, struct file_lock *reques
>>>>>> static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
>>>>>> {
>>>>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>>>>> + struct nfs4_state_owner *sp = state->owner;
>>>>>> unsigned char fl_flags = request->fl_flags;
>>>>>> int status = -ENOLCK;
>>>>>>
>>>>>> @@ -6068,6 +6069,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>>> status = do_vfs_lock(state->inode, request);
>>>>>> if (status < 0)
>>>>>> goto out;
>>>>>> + mutex_lock(&sp->so_delegreturn_mutex);
>>>
>>> From what I can tell, the first call to do_vfs_lock() in this function is used to test if we can take the lock locally. Do we need to worry about this racing with delegreturn, too?
>>
>> When I included that call in the critical section,
>> cthon04 locking tests deadlocked.
>
> Got it. Thanks for checking!
>
> Anna
>
>>
>>
>>> Thanks,
>>> Anna
>>>
>>>>>> down_read(&nfsi->rwsem);
>>>>>> if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
>>>>>> /* Yes: cache locks! */
>>>>>> @@ -6075,9 +6077,11 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>>> request->fl_flags = fl_flags & ~FL_SLEEP;
>>>>>> status = do_vfs_lock(state->inode, request);
>>>>>> up_read(&nfsi->rwsem);
>>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>>> goto out;
>>>>>> }
>>>>>> up_read(&nfsi->rwsem);
>>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>>> status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
>>>>>> out:
>>>>>> request->fl_flags = fl_flags;
>>
>> --
>> Chuck Lever
>>
>>
>>
>
> On Apr 28, 2016, at 11:56 AM, Olga Kornievskaia <[email protected]> wrote:
>
> Chuck or Anna,
>
> If the patch is accepted, do you mind expanding the commit message to
> include the wording about the LOCK and CB_RECALL race (so that it's
> documented to look back into it).
Anna's choice.
> On Thu, Apr 28, 2016 at 10:09 AM, Anna Schumaker
> <[email protected]> wrote:
>> On 04/28/2016 10:06 AM, Chuck Lever wrote:
>>>
>>>> On Apr 28, 2016, at 9:13 AM, Anna Schumaker <[email protected]> wrote:
>>>>
>>>> The patch looks pretty straightforward to me, and it sounds like it fixes a few problems that people are seeing. One question (below):
>>>>
>>>> On 04/28/2016 08:43 AM, William Dauchy wrote:
>>>>> Hello Anna,
>>>>>
>>>>> Could you have a look at this one please?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> William
>>>>>
>>>>> On Tue, Apr 26, 2016 at 4:24 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>> I believe this patch also helps with a race between a LOCK and
>>>>>> CB_RECALL. Application does a lock as the delegation is being
>>>>>> recalled. The lock thread sees the delegated state and acquires a
>>>>>> local lock. At the same time delegation doesn't see it the lock yet
>>>>>> and returns the delegation. Application proceeds to do IO. It ends up
>>>>>> using an open stateid for the IO (as there is no delegation stateid or
>>>>>> lock stateid). The server is unaware of the lock so it can give that
>>>>>> lock to somebody else. Yet this client thinks it has a local lock. It
>>>>>> leads to inconsistent data between clients.
>>>>>>
>>>>>> On Mon, Apr 11, 2016 at 4:20 PM, Chuck Lever <[email protected]> wrote:
>>>>>>> At Connectathon 2016, we found that recent upstream Linux clients
>>>>>>> would occasionally send a LOCK operation with a zero stateid. This
>>>>>>> appeared to happen in close proximity to another thread returning
>>>>>>> a delegation before unlinking the same file while it remained open.
>>>>>>>
>>>>>>> Earlier, the client received a write delegation on this file and
>>>>>>> returned the open stateid. Now, as it is getting ready to unlink the
>>>>>>> file, it returns the write delegation. But there is still an open
>>>>>>> file descriptor on that file, so the client must OPEN the file
>>>>>>> again before it returns the delegation.
>>>>>>>
>>>>>>> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
>>>>>>> delegations is broken'), nfs_open_delegation_recall() clears the
>>>>>>> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
>>>>>>> racing LOCK on the same inode to be put on the wire before the OPEN
>>>>>>> operation has returned a valid open stateid.
>>>>>>>
>>>>>>> To eliminate this race, serialize delegation return with the
>>>>>>> acquisition of a file lock on the same file. Adopt the same approach
>>>>>>> as is used in the unlock path.
>>>>>>>
>>>>>>> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
>>>>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>>>>> ---
>>>>>>> Hi-
>>>>>>>
>>>>>>> This fix appears to be both safe and effective. Please consider
>>>>>>> it for v4.7 and for stable. Thanks!
>>>>>>>
>>>>>>>
>>>>>>> fs/nfs/nfs4proc.c | 4 ++++
>>>>>>> 1 file changed, 4 insertions(+)
>>>>>>>
>>>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>>>>> index 01bef06..c40f1b6 100644
>>>>>>> --- a/fs/nfs/nfs4proc.c
>>>>>>> +++ b/fs/nfs/nfs4proc.c
>>>>>>> @@ -6054,6 +6054,7 @@ static int nfs41_lock_expired(struct nfs4_state *state, struct file_lock *reques
>>>>>>> static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
>>>>>>> {
>>>>>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>>>>>> + struct nfs4_state_owner *sp = state->owner;
>>>>>>> unsigned char fl_flags = request->fl_flags;
>>>>>>> int status = -ENOLCK;
>>>>>>>
>>>>>>> @@ -6068,6 +6069,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>>>> status = do_vfs_lock(state->inode, request);
>>>>>>> if (status < 0)
>>>>>>> goto out;
>>>>>>> + mutex_lock(&sp->so_delegreturn_mutex);
>>>>
>>>> From what I can tell, the first call to do_vfs_lock() in this function is used to test if we can take the lock locally. Do we need to worry about this racing with delegreturn, too?
>>>
>>> When I included that call in the critical section,
>>> cthon04 locking tests deadlocked.
>>
>> Got it. Thanks for checking!
>>
>> Anna
>>
>>>
>>>
>>>> Thanks,
>>>> Anna
>>>>
>>>>>>> down_read(&nfsi->rwsem);
>>>>>>> if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
>>>>>>> /* Yes: cache locks! */
>>>>>>> @@ -6075,9 +6077,11 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>>>> request->fl_flags = fl_flags & ~FL_SLEEP;
>>>>>>> status = do_vfs_lock(state->inode, request);
>>>>>>> up_read(&nfsi->rwsem);
>>>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>>>> goto out;
>>>>>>> }
>>>>>>> up_read(&nfsi->rwsem);
>>>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>>>> status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
>>>>>>> out:
>>>>>>> request->fl_flags = fl_flags;
>>>
>>> --
>>> Chuck Lever
>>>
>>>
>>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Chuck Lever
On 04/28/2016 12:05 PM, Chuck Lever wrote:
>
>> On Apr 28, 2016, at 11:56 AM, Olga Kornievskaia <[email protected]> wrote:
>>
>> Chuck or Anna,
>>
>> If the patch is accepted, do you mind expanding the commit message to
>> include the wording about the LOCK and CB_RECALL race (so that it's
>> documented to look back into it).
>
> Anna's choice.
Sounds like a good idea. Is there any particular wording that you want? If not, then I can try to base something off of your email from Tuesday (4/26).
Anna
>
>
>> On Thu, Apr 28, 2016 at 10:09 AM, Anna Schumaker
>> <[email protected]> wrote:
>>> On 04/28/2016 10:06 AM, Chuck Lever wrote:
>>>>
>>>>> On Apr 28, 2016, at 9:13 AM, Anna Schumaker <[email protected]> wrote:
>>>>>
>>>>> The patch looks pretty straightforward to me, and it sounds like it fixes a few problems that people are seeing. One question (below):
>>>>>
>>>>> On 04/28/2016 08:43 AM, William Dauchy wrote:
>>>>>> Hello Anna,
>>>>>>
>>>>>> Could you have a look at this one please?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> William
>>>>>>
>>>>>> On Tue, Apr 26, 2016 at 4:24 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>>> I believe this patch also helps with a race between a LOCK and
>>>>>>> CB_RECALL. Application does a lock as the delegation is being
>>>>>>> recalled. The lock thread sees the delegated state and acquires a
>>>>>>> local lock. At the same time delegation doesn't see it the lock yet
>>>>>>> and returns the delegation. Application proceeds to do IO. It ends up
>>>>>>> using an open stateid for the IO (as there is no delegation stateid or
>>>>>>> lock stateid). The server is unaware of the lock so it can give that
>>>>>>> lock to somebody else. Yet this client thinks it has a local lock. It
>>>>>>> leads to inconsistent data between clients.
>>>>>>>
>>>>>>> On Mon, Apr 11, 2016 at 4:20 PM, Chuck Lever <[email protected]> wrote:
>>>>>>>> At Connectathon 2016, we found that recent upstream Linux clients
>>>>>>>> would occasionally send a LOCK operation with a zero stateid. This
>>>>>>>> appeared to happen in close proximity to another thread returning
>>>>>>>> a delegation before unlinking the same file while it remained open.
>>>>>>>>
>>>>>>>> Earlier, the client received a write delegation on this file and
>>>>>>>> returned the open stateid. Now, as it is getting ready to unlink the
>>>>>>>> file, it returns the write delegation. But there is still an open
>>>>>>>> file descriptor on that file, so the client must OPEN the file
>>>>>>>> again before it returns the delegation.
>>>>>>>>
>>>>>>>> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
>>>>>>>> delegations is broken'), nfs_open_delegation_recall() clears the
>>>>>>>> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
>>>>>>>> racing LOCK on the same inode to be put on the wire before the OPEN
>>>>>>>> operation has returned a valid open stateid.
>>>>>>>>
>>>>>>>> To eliminate this race, serialize delegation return with the
>>>>>>>> acquisition of a file lock on the same file. Adopt the same approach
>>>>>>>> as is used in the unlock path.
>>>>>>>>
>>>>>>>> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
>>>>>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>>>>>> ---
>>>>>>>> Hi-
>>>>>>>>
>>>>>>>> This fix appears to be both safe and effective. Please consider
>>>>>>>> it for v4.7 and for stable. Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>> fs/nfs/nfs4proc.c | 4 ++++
>>>>>>>> 1 file changed, 4 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>>>>>> index 01bef06..c40f1b6 100644
>>>>>>>> --- a/fs/nfs/nfs4proc.c
>>>>>>>> +++ b/fs/nfs/nfs4proc.c
>>>>>>>> @@ -6054,6 +6054,7 @@ static int nfs41_lock_expired(struct nfs4_state *state, struct file_lock *reques
>>>>>>>> static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
>>>>>>>> {
>>>>>>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>>>>>>> + struct nfs4_state_owner *sp = state->owner;
>>>>>>>> unsigned char fl_flags = request->fl_flags;
>>>>>>>> int status = -ENOLCK;
>>>>>>>>
>>>>>>>> @@ -6068,6 +6069,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>>>>> status = do_vfs_lock(state->inode, request);
>>>>>>>> if (status < 0)
>>>>>>>> goto out;
>>>>>>>> + mutex_lock(&sp->so_delegreturn_mutex);
>>>>>
>>>>> From what I can tell, the first call to do_vfs_lock() in this function is used to test if we can take the lock locally. Do we need to worry about this racing with delegreturn, too?
>>>>
>>>> When I included that call in the critical section,
>>>> cthon04 locking tests deadlocked.
>>>
>>> Got it. Thanks for checking!
>>>
>>> Anna
>>>
>>>>
>>>>
>>>>> Thanks,
>>>>> Anna
>>>>>
>>>>>>>> down_read(&nfsi->rwsem);
>>>>>>>> if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
>>>>>>>> /* Yes: cache locks! */
>>>>>>>> @@ -6075,9 +6077,11 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>>>>> request->fl_flags = fl_flags & ~FL_SLEEP;
>>>>>>>> status = do_vfs_lock(state->inode, request);
>>>>>>>> up_read(&nfsi->rwsem);
>>>>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>>>>> goto out;
>>>>>>>> }
>>>>>>>> up_read(&nfsi->rwsem);
>>>>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>>>>> status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
>>>>>>>> out:
>>>>>>>> request->fl_flags = fl_flags;
>>>>
>>>> --
>>>> Chuck Lever
>>>>
>>>>
>>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Chuck Lever
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On Thu, Apr 28, 2016 at 1:22 PM, Anna Schumaker
<[email protected]> wrote:
> On 04/28/2016 12:05 PM, Chuck Lever wrote:
>>
>>> On Apr 28, 2016, at 11:56 AM, Olga Kornievskaia <[email protected]> wrote:
>>>
>>> Chuck or Anna,
>>>
>>> If the patch is accepted, do you mind expanding the commit message to
>>> include the wording about the LOCK and CB_RECALL race (so that it's
>>> documented to look back into it).
>>
>> Anna's choice.
>
> Sounds like a good idea. Is there any particular wording that you want? If not, then I can try to base something off of your email from Tuesday (4/26).
No particular wording. Could be as little as: "helps with LOCK and
CB_RECALL race" or could include my explanation of what happens from
Tuesday.
>
> Anna
>
>>
>>
>>> On Thu, Apr 28, 2016 at 10:09 AM, Anna Schumaker
>>> <[email protected]> wrote:
>>>> On 04/28/2016 10:06 AM, Chuck Lever wrote:
>>>>>
>>>>>> On Apr 28, 2016, at 9:13 AM, Anna Schumaker <[email protected]> wrote:
>>>>>>
>>>>>> The patch looks pretty straightforward to me, and it sounds like it fixes a few problems that people are seeing. One question (below):
>>>>>>
>>>>>> On 04/28/2016 08:43 AM, William Dauchy wrote:
>>>>>>> Hello Anna,
>>>>>>>
>>>>>>> Could you have a look at this one please?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> William
>>>>>>>
>>>>>>> On Tue, Apr 26, 2016 at 4:24 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>>>> I believe this patch also helps with a race between a LOCK and
>>>>>>>> CB_RECALL. Application does a lock as the delegation is being
>>>>>>>> recalled. The lock thread sees the delegated state and acquires a
>>>>>>>> local lock. At the same time delegation doesn't see it the lock yet
>>>>>>>> and returns the delegation. Application proceeds to do IO. It ends up
>>>>>>>> using an open stateid for the IO (as there is no delegation stateid or
>>>>>>>> lock stateid). The server is unaware of the lock so it can give that
>>>>>>>> lock to somebody else. Yet this client thinks it has a local lock. It
>>>>>>>> leads to inconsistent data between clients.
>>>>>>>>
>>>>>>>> On Mon, Apr 11, 2016 at 4:20 PM, Chuck Lever <[email protected]> wrote:
>>>>>>>>> At Connectathon 2016, we found that recent upstream Linux clients
>>>>>>>>> would occasionally send a LOCK operation with a zero stateid. This
>>>>>>>>> appeared to happen in close proximity to another thread returning
>>>>>>>>> a delegation before unlinking the same file while it remained open.
>>>>>>>>>
>>>>>>>>> Earlier, the client received a write delegation on this file and
>>>>>>>>> returned the open stateid. Now, as it is getting ready to unlink the
>>>>>>>>> file, it returns the write delegation. But there is still an open
>>>>>>>>> file descriptor on that file, so the client must OPEN the file
>>>>>>>>> again before it returns the delegation.
>>>>>>>>>
>>>>>>>>> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
>>>>>>>>> delegations is broken'), nfs_open_delegation_recall() clears the
>>>>>>>>> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
>>>>>>>>> racing LOCK on the same inode to be put on the wire before the OPEN
>>>>>>>>> operation has returned a valid open stateid.
>>>>>>>>>
>>>>>>>>> To eliminate this race, serialize delegation return with the
>>>>>>>>> acquisition of a file lock on the same file. Adopt the same approach
>>>>>>>>> as is used in the unlock path.
>>>>>>>>>
>>>>>>>>> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
>>>>>>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>>>>>>> ---
>>>>>>>>> Hi-
>>>>>>>>>
>>>>>>>>> This fix appears to be both safe and effective. Please consider
>>>>>>>>> it for v4.7 and for stable. Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> fs/nfs/nfs4proc.c | 4 ++++
>>>>>>>>> 1 file changed, 4 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>>>>>>> index 01bef06..c40f1b6 100644
>>>>>>>>> --- a/fs/nfs/nfs4proc.c
>>>>>>>>> +++ b/fs/nfs/nfs4proc.c
>>>>>>>>> @@ -6054,6 +6054,7 @@ static int nfs41_lock_expired(struct nfs4_state *state, struct file_lock *reques
>>>>>>>>> static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
>>>>>>>>> {
>>>>>>>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>>>>>>>> + struct nfs4_state_owner *sp = state->owner;
>>>>>>>>> unsigned char fl_flags = request->fl_flags;
>>>>>>>>> int status = -ENOLCK;
>>>>>>>>>
>>>>>>>>> @@ -6068,6 +6069,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>>>>>> status = do_vfs_lock(state->inode, request);
>>>>>>>>> if (status < 0)
>>>>>>>>> goto out;
>>>>>>>>> + mutex_lock(&sp->so_delegreturn_mutex);
>>>>>>
>>>>>> From what I can tell, the first call to do_vfs_lock() in this function is used to test if we can take the lock locally. Do we need to worry about this racing with delegreturn, too?
>>>>>
>>>>> When I included that call in the critical section,
>>>>> cthon04 locking tests deadlocked.
>>>>
>>>> Got it. Thanks for checking!
>>>>
>>>> Anna
>>>>
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Anna
>>>>>>
>>>>>>>>> down_read(&nfsi->rwsem);
>>>>>>>>> if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
>>>>>>>>> /* Yes: cache locks! */
>>>>>>>>> @@ -6075,9 +6077,11 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>>>>>> request->fl_flags = fl_flags & ~FL_SLEEP;
>>>>>>>>> status = do_vfs_lock(state->inode, request);
>>>>>>>>> up_read(&nfsi->rwsem);
>>>>>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>>>>>> goto out;
>>>>>>>>> }
>>>>>>>>> up_read(&nfsi->rwsem);
>>>>>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>>>>>> status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
>>>>>>>>> out:
>>>>>>>>> request->fl_flags = fl_flags;
>>>>>
>>>>> --
>>>>> Chuck Lever
>>>>>
>>>>>
>>>>>
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> Chuck Lever
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
On 04/28/2016 01:40 PM, Olga Kornievskaia wrote:
> On Thu, Apr 28, 2016 at 1:22 PM, Anna Schumaker
> <[email protected]> wrote:
>> On 04/28/2016 12:05 PM, Chuck Lever wrote:
>>>
>>>> On Apr 28, 2016, at 11:56 AM, Olga Kornievskaia <[email protected]> wrote:
>>>>
>>>> Chuck or Anna,
>>>>
>>>> If the patch is accepted, do you mind expanding the commit message to
>>>> include the wording about the LOCK and CB_RECALL race (so that it's
>>>> documented to look back into it).
>>>
>>> Anna's choice.
>>
>> Sounds like a good idea. Is there any particular wording that you want? If not, then I can try to base something off of your email from Tuesday (4/26).
>
> No particular wording. Could be as little as: "helps with LOCK and
> CB_RECALL race" or could include my explanation of what happens from
> Tuesday.
Okay, how does this look? http://git.linux-nfs.org/?p=anna/linux-nfs.git;a=commit;h=aa56ecf86281edd8dd488484596675813928f140
As a side note, I just put together a [testing] branch with this patch and all the others I could find from the last month or so. Please let me know if it looks like I'm missing anything!
Thanks,
Anna
>
>>
>> Anna
>>
>>>
>>>
>>>> On Thu, Apr 28, 2016 at 10:09 AM, Anna Schumaker
>>>> <[email protected]> wrote:
>>>>> On 04/28/2016 10:06 AM, Chuck Lever wrote:
>>>>>>
>>>>>>> On Apr 28, 2016, at 9:13 AM, Anna Schumaker <[email protected]> wrote:
>>>>>>>
>>>>>>> The patch looks pretty straightforward to me, and it sounds like it fixes a few problems that people are seeing. One question (below):
>>>>>>>
>>>>>>> On 04/28/2016 08:43 AM, William Dauchy wrote:
>>>>>>>> Hello Anna,
>>>>>>>>
>>>>>>>> Could you have a look at this one please?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> William
>>>>>>>>
>>>>>>>> On Tue, Apr 26, 2016 at 4:24 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>>>>> I believe this patch also helps with a race between a LOCK and
>>>>>>>>> CB_RECALL. Application does a lock as the delegation is being
>>>>>>>>> recalled. The lock thread sees the delegated state and acquires a
>>>>>>>>> local lock. At the same time delegation doesn't see it the lock yet
>>>>>>>>> and returns the delegation. Application proceeds to do IO. It ends up
>>>>>>>>> using an open stateid for the IO (as there is no delegation stateid or
>>>>>>>>> lock stateid). The server is unaware of the lock so it can give that
>>>>>>>>> lock to somebody else. Yet this client thinks it has a local lock. It
>>>>>>>>> leads to inconsistent data between clients.
>>>>>>>>>
>>>>>>>>> On Mon, Apr 11, 2016 at 4:20 PM, Chuck Lever <[email protected]> wrote:
>>>>>>>>>> At Connectathon 2016, we found that recent upstream Linux clients
>>>>>>>>>> would occasionally send a LOCK operation with a zero stateid. This
>>>>>>>>>> appeared to happen in close proximity to another thread returning
>>>>>>>>>> a delegation before unlinking the same file while it remained open.
>>>>>>>>>>
>>>>>>>>>> Earlier, the client received a write delegation on this file and
>>>>>>>>>> returned the open stateid. Now, as it is getting ready to unlink the
>>>>>>>>>> file, it returns the write delegation. But there is still an open
>>>>>>>>>> file descriptor on that file, so the client must OPEN the file
>>>>>>>>>> again before it returns the delegation.
>>>>>>>>>>
>>>>>>>>>> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
>>>>>>>>>> delegations is broken'), nfs_open_delegation_recall() clears the
>>>>>>>>>> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
>>>>>>>>>> racing LOCK on the same inode to be put on the wire before the OPEN
>>>>>>>>>> operation has returned a valid open stateid.
>>>>>>>>>>
>>>>>>>>>> To eliminate this race, serialize delegation return with the
>>>>>>>>>> acquisition of a file lock on the same file. Adopt the same approach
>>>>>>>>>> as is used in the unlock path.
>>>>>>>>>>
>>>>>>>>>> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
>>>>>>>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>> Hi-
>>>>>>>>>>
>>>>>>>>>> This fix appears to be both safe and effective. Please consider
>>>>>>>>>> it for v4.7 and for stable. Thanks!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> fs/nfs/nfs4proc.c | 4 ++++
>>>>>>>>>> 1 file changed, 4 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>>>>>>>> index 01bef06..c40f1b6 100644
>>>>>>>>>> --- a/fs/nfs/nfs4proc.c
>>>>>>>>>> +++ b/fs/nfs/nfs4proc.c
>>>>>>>>>> @@ -6054,6 +6054,7 @@ static int nfs41_lock_expired(struct nfs4_state *state, struct file_lock *reques
>>>>>>>>>> static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
>>>>>>>>>> {
>>>>>>>>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>>>>>>>>> + struct nfs4_state_owner *sp = state->owner;
>>>>>>>>>> unsigned char fl_flags = request->fl_flags;
>>>>>>>>>> int status = -ENOLCK;
>>>>>>>>>>
>>>>>>>>>> @@ -6068,6 +6069,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>>>>>>> status = do_vfs_lock(state->inode, request);
>>>>>>>>>> if (status < 0)
>>>>>>>>>> goto out;
>>>>>>>>>> + mutex_lock(&sp->so_delegreturn_mutex);
>>>>>>>
>>>>>>> From what I can tell, the first call to do_vfs_lock() in this function is used to test if we can take the lock locally. Do we need to worry about this racing with delegreturn, too?
>>>>>>
>>>>>> When I included that call in the critical section,
>>>>>> cthon04 locking tests deadlocked.
>>>>>
>>>>> Got it. Thanks for checking!
>>>>>
>>>>> Anna
>>>>>
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Anna
>>>>>>>
>>>>>>>>>> down_read(&nfsi->rwsem);
>>>>>>>>>> if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
>>>>>>>>>> /* Yes: cache locks! */
>>>>>>>>>> @@ -6075,9 +6077,11 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>>>>>>> request->fl_flags = fl_flags & ~FL_SLEEP;
>>>>>>>>>> status = do_vfs_lock(state->inode, request);
>>>>>>>>>> up_read(&nfsi->rwsem);
>>>>>>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>>>>>>> goto out;
>>>>>>>>>> }
>>>>>>>>>> up_read(&nfsi->rwsem);
>>>>>>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>>>>>>> status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
>>>>>>>>>> out:
>>>>>>>>>> request->fl_flags = fl_flags;
>>>>>>
>>>>>> --
>>>>>> Chuck Lever
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>> --
>>> Chuck Lever
>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On Thu, Apr 28, 2016 at 5:11 PM, Anna Schumaker
<[email protected]> wrote:
> On 04/28/2016 01:40 PM, Olga Kornievskaia wrote:
>> On Thu, Apr 28, 2016 at 1:22 PM, Anna Schumaker
>> <[email protected]> wrote:
>>> On 04/28/2016 12:05 PM, Chuck Lever wrote:
>>>>
>>>>> On Apr 28, 2016, at 11:56 AM, Olga Kornievskaia <[email protected]> wrote:
>>>>>
>>>>> Chuck or Anna,
>>>>>
>>>>> If the patch is accepted, do you mind expanding the commit message to
>>>>> include the wording about the LOCK and CB_RECALL race (so that it's
>>>>> documented to look back into it).
>>>>
>>>> Anna's choice.
>>>
>>> Sounds like a good idea. Is there any particular wording that you want? If not, then I can try to base something off of your email from Tuesday (4/26).
>>
>> No particular wording. Could be as little as: "helps with LOCK and
>> CB_RECALL race" or could include my explanation of what happens from
>> Tuesday.
>
> Okay, how does this look? http://git.linux-nfs.org/?p=anna/linux-nfs.git;a=commit;h=aa56ecf86281edd8dd488484596675813928f140
looks good to me.
> As a side note, I just put together a [testing] branch with this patch and all the others I could find from the last month or so. Please let me know if it looks like I'm missing anything!
>
> Thanks,
> Anna
>
>>
>>>
>>> Anna
>>>
>>>>
>>>>
>>>>> On Thu, Apr 28, 2016 at 10:09 AM, Anna Schumaker
>>>>> <[email protected]> wrote:
>>>>>> On 04/28/2016 10:06 AM, Chuck Lever wrote:
>>>>>>>
>>>>>>>> On Apr 28, 2016, at 9:13 AM, Anna Schumaker <[email protected]> wrote:
>>>>>>>>
>>>>>>>> The patch looks pretty straightforward to me, and it sounds like it fixes a few problems that people are seeing. One question (below):
>>>>>>>>
>>>>>>>> On 04/28/2016 08:43 AM, William Dauchy wrote:
>>>>>>>>> Hello Anna,
>>>>>>>>>
>>>>>>>>> Could you have a look at this one please?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> William
>>>>>>>>>
>>>>>>>>> On Tue, Apr 26, 2016 at 4:24 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>>>>>> I believe this patch also helps with a race between a LOCK and
>>>>>>>>>> CB_RECALL. Application does a lock as the delegation is being
>>>>>>>>>> recalled. The lock thread sees the delegated state and acquires a
>>>>>>>>>> local lock. At the same time delegation doesn't see it the lock yet
>>>>>>>>>> and returns the delegation. Application proceeds to do IO. It ends up
>>>>>>>>>> using an open stateid for the IO (as there is no delegation stateid or
>>>>>>>>>> lock stateid). The server is unaware of the lock so it can give that
>>>>>>>>>> lock to somebody else. Yet this client thinks it has a local lock. It
>>>>>>>>>> leads to inconsistent data between clients.
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 11, 2016 at 4:20 PM, Chuck Lever <[email protected]> wrote:
>>>>>>>>>>> At Connectathon 2016, we found that recent upstream Linux clients
>>>>>>>>>>> would occasionally send a LOCK operation with a zero stateid. This
>>>>>>>>>>> appeared to happen in close proximity to another thread returning
>>>>>>>>>>> a delegation before unlinking the same file while it remained open.
>>>>>>>>>>>
>>>>>>>>>>> Earlier, the client received a write delegation on this file and
>>>>>>>>>>> returned the open stateid. Now, as it is getting ready to unlink the
>>>>>>>>>>> file, it returns the write delegation. But there is still an open
>>>>>>>>>>> file descriptor on that file, so the client must OPEN the file
>>>>>>>>>>> again before it returns the delegation.
>>>>>>>>>>>
>>>>>>>>>>> Since commit 24311f884189 ('NFSv4: Recovery of recalled read
>>>>>>>>>>> delegations is broken'), nfs_open_delegation_recall() clears the
>>>>>>>>>>> NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a
>>>>>>>>>>> racing LOCK on the same inode to be put on the wire before the OPEN
>>>>>>>>>>> operation has returned a valid open stateid.
>>>>>>>>>>>
>>>>>>>>>>> To eliminate this race, serialize delegation return with the
>>>>>>>>>>> acquisition of a file lock on the same file. Adopt the same approach
>>>>>>>>>>> as is used in the unlock path.
>>>>>>>>>>>
>>>>>>>>>>> Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ')
>>>>>>>>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>>>>>>>>> ---
>>>>>>>>>>> Hi-
>>>>>>>>>>>
>>>>>>>>>>> This fix appears to be both safe and effective. Please consider
>>>>>>>>>>> it for v4.7 and for stable. Thanks!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> fs/nfs/nfs4proc.c | 4 ++++
>>>>>>>>>>> 1 file changed, 4 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>>>>>>>>> index 01bef06..c40f1b6 100644
>>>>>>>>>>> --- a/fs/nfs/nfs4proc.c
>>>>>>>>>>> +++ b/fs/nfs/nfs4proc.c
>>>>>>>>>>> @@ -6054,6 +6054,7 @@ static int nfs41_lock_expired(struct nfs4_state *state, struct file_lock *reques
>>>>>>>>>>> static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
>>>>>>>>>>> {
>>>>>>>>>>> struct nfs_inode *nfsi = NFS_I(state->inode);
>>>>>>>>>>> + struct nfs4_state_owner *sp = state->owner;
>>>>>>>>>>> unsigned char fl_flags = request->fl_flags;
>>>>>>>>>>> int status = -ENOLCK;
>>>>>>>>>>>
>>>>>>>>>>> @@ -6068,6 +6069,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>>>>>>>> status = do_vfs_lock(state->inode, request);
>>>>>>>>>>> if (status < 0)
>>>>>>>>>>> goto out;
>>>>>>>>>>> + mutex_lock(&sp->so_delegreturn_mutex);
>>>>>>>>
>>>>>>>> From what I can tell, the first call to do_vfs_lock() in this function is used to test if we can take the lock locally. Do we need to worry about this racing with delegreturn, too?
>>>>>>>
>>>>>>> When I included that call in the critical section,
>>>>>>> cthon04 locking tests deadlocked.
>>>>>>
>>>>>> Got it. Thanks for checking!
>>>>>>
>>>>>> Anna
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anna
>>>>>>>>
>>>>>>>>>>> down_read(&nfsi->rwsem);
>>>>>>>>>>> if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
>>>>>>>>>>> /* Yes: cache locks! */
>>>>>>>>>>> @@ -6075,9 +6077,11 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
>>>>>>>>>>> request->fl_flags = fl_flags & ~FL_SLEEP;
>>>>>>>>>>> status = do_vfs_lock(state->inode, request);
>>>>>>>>>>> up_read(&nfsi->rwsem);
>>>>>>>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>>>>>>>> goto out;
>>>>>>>>>>> }
>>>>>>>>>>> up_read(&nfsi->rwsem);
>>>>>>>>>>> + mutex_unlock(&sp->so_delegreturn_mutex);
>>>>>>>>>>> status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
>>>>>>>>>>> out:
>>>>>>>>>>> request->fl_flags = fl_flags;
>>>>>>>
>>>>>>> --
>>>>>>> Chuck Lever
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>> the body of a message to [email protected]
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>> --
>>>> Chuck Lever
>>>>
>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>