There is a race between return of a delegation (due to resource
constraints and inode getting evicted) and an open for the same file
because….
In nfs_inode_return_delegation_noreclaim() we detach that delegation
from the inode first (nfs_inode_detach_delegation()) and then we send
a delegreturn. In between of nfs_inode_detach_delegation() and
nfs_do_return_delegation() an open request comes and it doesn’t see
any delegations for this inode and thus it sends an open request. At
the same time eviction thread is working on doing a delegreturn for
this inode. The server gets request for the open first and returns a
delegation for it (note: server will issue the same delegation stateid
but different seqnum as the one currently being returned by the
client). Then the server processes a delegreturn for this file (from
its perspective delegation stateid is no longer valid). Yet, on the
client side as it processes a reply from the server, it adds a “new”
delegation to the inode and proceeds to use it for an IO request later
which errors in BAD_STATEID.
I don't see how we can make detaching the delegation and the return of
it atomic. Instead I propose the following solution:
I propose to examine the delegation we get on open. If the sequence
number is not 0 and we don't have a delegation already then we must
have experienced this race. Therefore, we should return the delegation
(and ignore if this errors, as the server thinks there is no more
delegation). Something like this. Disclaimer: it doesn't seem to be
the actual fix (as it hangs the machine) so I'm still missing
something ....
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index b76166b..01bba63 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -368,5 +368,12 @@ int nfs_inode_set_delegation(struct inode *inode,
struct rpc_cred *cred, struct
old_delegation, clp);
if (freeme == NULL)
goto out;
+ } else {
+ if (be32_to_cpu(delegation->stateid.seqid) > 0) {
+ nfs_do_return_delegation(inode, delegation, 0);
+ goto out;
+ }
}
list_add_rcu(&delegation->super_list, &server->delegations);
nfsi->delegation_state = delegation->type;
Hi Olga,
On Wed, Jan 21, 2015 at 5:29 PM, Olga Kornievskaia <[email protected]> wrote:
> There is a race between return of a delegation (due to resource
> constraints and inode getting evicted) and an open for the same file
> because….
>
> In nfs_inode_return_delegation_noreclaim() we detach that delegation
> from the inode first (nfs_inode_detach_delegation()) and then we send
> a delegreturn. In between of nfs_inode_detach_delegation() and
> nfs_do_return_delegation() an open request comes and it doesn’t see
> any delegations for this inode and thus it sends an open request. At
> the same time eviction thread is working on doing a delegreturn for
> this inode. The server gets request for the open first and returns a
> delegation for it (note: server will issue the same delegation stateid
> but different seqnum as the one currently being returned by the
> client). Then the server processes a delegreturn for this file (from
> its perspective delegation stateid is no longer valid). Yet, on the
> client side as it processes a reply from the server, it adds a “new”
> delegation to the inode and proceeds to use it for an IO request later
> which errors in BAD_STATEID.
>
> I don't see how we can make detaching the delegation and the return of
> it atomic. Instead I propose the following solution:
> I propose to examine the delegation we get on open. If the sequence
> number is not 0 and we don't have a delegation already then we must
> have experienced this race. Therefore, we should return the delegation
> (and ignore if this errors, as the server thinks there is no more
> delegation). Something like this. Disclaimer: it doesn't seem to be
> the actual fix (as it hangs the machine) so I'm still missing
> something ....
>
>
> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
> index b76166b..01bba63 100644
> --- a/fs/nfs/delegation.c
> +++ b/fs/nfs/delegation.c
> @@ -368,5 +368,12 @@ int nfs_inode_set_delegation(struct inode *inode,
> struct rpc_cred *cred, struct
> old_delegation, clp);
> if (freeme == NULL)
> goto out;
> + } else {
> + if (be32_to_cpu(delegation->stateid.seqid) > 0) {
> + nfs_do_return_delegation(inode, delegation, 0);
> + goto out;
> + }
> }
> list_add_rcu(&delegation->super_list, &server->delegations);
> nfsi->delegation_state = delegation->type;
Shouldn't the seqid always be non-zero? AFAIK, RFC5661 reserves seqid
== 0 for the special case where the client is asking the server to
'please substitute the most recent value' (see section 8.2.2).
Normally, the server is therefore supposed to start with seqid == 1,
and increment so that when the seqid wraps around, it skips over the
value '0'.
Also, please note that nfs_do_return_delegation() may sleep, so you
cannot call it while holding a spinlock.
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
On Wed, Jan 21, 2015 at 5:48 PM, Trond Myklebust
<[email protected]> wrote:
> Hi Olga,
>
> On Wed, Jan 21, 2015 at 5:29 PM, Olga Kornievskaia <[email protected]> wrote:
>> There is a race between return of a delegation (due to resource
>> constraints and inode getting evicted) and an open for the same file
>> because….
>>
>> In nfs_inode_return_delegation_noreclaim() we detach that delegation
>> from the inode first (nfs_inode_detach_delegation()) and then we send
>> a delegreturn. In between of nfs_inode_detach_delegation() and
>> nfs_do_return_delegation() an open request comes and it doesn’t see
>> any delegations for this inode and thus it sends an open request. At
>> the same time eviction thread is working on doing a delegreturn for
>> this inode. The server gets request for the open first and returns a
>> delegation for it (note: server will issue the same delegation stateid
>> but different seqnum as the one currently being returned by the
>> client). Then the server processes a delegreturn for this file (from
>> its perspective delegation stateid is no longer valid). Yet, on the
>> client side as it processes a reply from the server, it adds a “new”
>> delegation to the inode and proceeds to use it for an IO request later
>> which errors in BAD_STATEID.
>>
>> I don't see how we can make detaching the delegation and the return of
>> it atomic. Instead I propose the following solution:
>> I propose to examine the delegation we get on open. If the sequence
>> number is not 0 and we don't have a delegation already then we must
>> have experienced this race. Therefore, we should return the delegation
>> (and ignore if this errors, as the server thinks there is no more
>> delegation). Something like this. Disclaimer: it doesn't seem to be
>> the actual fix (as it hangs the machine) so I'm still missing
>> something ....
>>
>>
>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
>> index b76166b..01bba63 100644
>> --- a/fs/nfs/delegation.c
>> +++ b/fs/nfs/delegation.c
>> @@ -368,5 +368,12 @@ int nfs_inode_set_delegation(struct inode *inode,
>> struct rpc_cred *cred, struct
>> old_delegation, clp);
>> if (freeme == NULL)
>> goto out;
>> + } else {
>> + if (be32_to_cpu(delegation->stateid.seqid) > 0) {
>> + nfs_do_return_delegation(inode, delegation, 0);
>> + goto out;
>> + }
>> }
>> list_add_rcu(&delegation->super_list, &server->delegations);
>> nfsi->delegation_state = delegation->type;
>
> Shouldn't the seqid always be non-zero? AFAIK, RFC5661 reserves seqid
> == 0 for the special case where the client is asking the server to
> 'please substitute the most recent value' (see section 8.2.2).
> Normally, the server is therefore supposed to start with seqid == 1,
> and increment so that when the seqid wraps around, it skips over the
> value '0'.
BTW: We should probably add in a check to protect against servers that
return seqid == 0. I suspect that could cause some really nasty
behaviour if/when the client is trying to close a file or delegreturn
while there is an on-the-wire OPEN.
> Also, please note that nfs_do_return_delegation() may sleep, so you
> cannot call it while holding a spinlock.
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
On Wed, Jan 21, 2015 at 5:48 PM, Trond Myklebust
<[email protected]> wrote:
> Hi Olga,
>
> On Wed, Jan 21, 2015 at 5:29 PM, Olga Kornievskaia <[email protected]> wrote:
>> There is a race between return of a delegation (due to resource
>> constraints and inode getting evicted) and an open for the same file
>> because….
>>
>> In nfs_inode_return_delegation_noreclaim() we detach that delegation
>> from the inode first (nfs_inode_detach_delegation()) and then we send
>> a delegreturn. In between of nfs_inode_detach_delegation() and
>> nfs_do_return_delegation() an open request comes and it doesn’t see
>> any delegations for this inode and thus it sends an open request. At
>> the same time eviction thread is working on doing a delegreturn for
>> this inode. The server gets request for the open first and returns a
>> delegation for it (note: server will issue the same delegation stateid
>> but different seqnum as the one currently being returned by the
>> client). Then the server processes a delegreturn for this file (from
>> its perspective delegation stateid is no longer valid). Yet, on the
>> client side as it processes a reply from the server, it adds a “new”
>> delegation to the inode and proceeds to use it for an IO request later
>> which errors in BAD_STATEID.
>>
>> I don't see how we can make detaching the delegation and the return of
>> it atomic. Instead I propose the following solution:
>> I propose to examine the delegation we get on open. If the sequence
>> number is not 0 and we don't have a delegation already then we must
>> have experienced this race. Therefore, we should return the delegation
>> (and ignore if this errors, as the server thinks there is no more
>> delegation). Something like this. Disclaimer: it doesn't seem to be
>> the actual fix (as it hangs the machine) so I'm still missing
>> something ....
>>
>>
>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
>> index b76166b..01bba63 100644
>> --- a/fs/nfs/delegation.c
>> +++ b/fs/nfs/delegation.c
>> @@ -368,5 +368,12 @@ int nfs_inode_set_delegation(struct inode *inode,
>> struct rpc_cred *cred, struct
>> old_delegation, clp);
>> if (freeme == NULL)
>> goto out;
>> + } else {
>> + if (be32_to_cpu(delegation->stateid.seqid) > 0) {
>> + nfs_do_return_delegation(inode, delegation, 0);
>> + goto out;
>> + }
>> }
>> list_add_rcu(&delegation->super_list, &server->delegations);
>> nfsi->delegation_state = delegation->type;
>
> Shouldn't the seqid always be non-zero? AFAIK, RFC5661 reserves seqid
> == 0 for the special case where the client is asking the server to
> 'please substitute the most recent value' (see section 8.2.2).
> Normally, the server is therefore supposed to start with seqid == 1,
> and increment so that when the seqid wraps around, it skips over the
> value '0'.
I wrongly assumed that first seq num is 0 because I've been seeing
seqid of 1 being returned in this case. So if we assume the server is
acting properly and returning the next up (so patch would check that >
1). What do you think about that solution?
>
> Also, please note that nfs_do_return_delegation() may sleep, so you
> cannot call it while holding a spin lock.
Yes, that's why I couldn't think of solution where we guarantee
atomicity of detaching the delegation and making sure it's returned
before servicing an open for the same file.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
On Wed, Jan 21, 2015 at 6:52 PM, Olga Kornievskaia <[email protected]> wrote:
> On Wed, Jan 21, 2015 at 5:48 PM, Trond Myklebust
> <[email protected]> wrote:
>> Hi Olga,
>>
>> On Wed, Jan 21, 2015 at 5:29 PM, Olga Kornievskaia <[email protected]> wrote:
>>> There is a race between return of a delegation (due to resource
>>> constraints and inode getting evicted) and an open for the same file
>>> because….
>>>
>>> In nfs_inode_return_delegation_noreclaim() we detach that delegation
>>> from the inode first (nfs_inode_detach_delegation()) and then we send
>>> a delegreturn. In between of nfs_inode_detach_delegation() and
>>> nfs_do_return_delegation() an open request comes and it doesn’t see
>>> any delegations for this inode and thus it sends an open request. At
>>> the same time eviction thread is working on doing a delegreturn for
>>> this inode. The server gets request for the open first and returns a
>>> delegation for it (note: server will issue the same delegation stateid
>>> but different seqnum as the one currently being returned by the
>>> client). Then the server processes a delegreturn for this file (from
>>> its perspective delegation stateid is no longer valid). Yet, on the
>>> client side as it processes a reply from the server, it adds a “new”
>>> delegation to the inode and proceeds to use it for an IO request later
>>> which errors in BAD_STATEID.
>>>
>>> I don't see how we can make detaching the delegation and the return of
>>> it atomic. Instead I propose the following solution:
>>> I propose to examine the delegation we get on open. If the sequence
>>> number is not 0 and we don't have a delegation already then we must
>>> have experienced this race. Therefore, we should return the delegation
>>> (and ignore if this errors, as the server thinks there is no more
>>> delegation). Something like this. Disclaimer: it doesn't seem to be
>>> the actual fix (as it hangs the machine) so I'm still missing
>>> something ....
>>>
>>>
>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
>>> index b76166b..01bba63 100644
>>> --- a/fs/nfs/delegation.c
>>> +++ b/fs/nfs/delegation.c
>>> @@ -368,5 +368,12 @@ int nfs_inode_set_delegation(struct inode *inode,
>>> struct rpc_cred *cred, struct
>>> old_delegation, clp);
>>> if (freeme == NULL)
>>> goto out;
>>> + } else {
>>> + if (be32_to_cpu(delegation->stateid.seqid) > 0) {
>>> + nfs_do_return_delegation(inode, delegation, 0);
>>> + goto out;
>>> + }
>>> }
>>> list_add_rcu(&delegation->super_list, &server->delegations);
>>> nfsi->delegation_state = delegation->type;
>>
>> Shouldn't the seqid always be non-zero? AFAIK, RFC5661 reserves seqid
>> == 0 for the special case where the client is asking the server to
>> 'please substitute the most recent value' (see section 8.2.2).
>> Normally, the server is therefore supposed to start with seqid == 1,
>> and increment so that when the seqid wraps around, it skips over the
>> value '0'.
>
> I wrongly assumed that first seq num is 0 because I've been seeing
> seqid of 1 being returned in this case. So if we assume the server is
> acting properly and returning the next up (so patch would check that >
> 1). What do you think about that solution?
If the seqid of the new delegation > seqid of the old delegation, then
I'd expect the in-flight delegreturn for the old delegation to return
NFS4ERR_OLD_STATEID anyway. I don't see why the client needs to throw
away the new delegation.
If, on the other hand, the seqid is the same, and my client is already
holding a delegation, and it sends an OPEN, why does it make sense for
the server to tell it a second time that it is being granted that same
delegation with the same seqid?
Either the client knows that it holds the delegation, and is
deliberately not using it, or it has forgotten (and has forgotten to
send a delegreturn) in which case why would the server want to trust
it with another delegation?
>> Also, please note that nfs_do_return_delegation() may sleep, so you
>> cannot call it while holding a spin lock.
>
> Yes, that's why I couldn't think of solution where we guarantee
> atomicity of detaching the delegation and making sure it's returned
> before servicing an open for the same file.
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
On Thu, Jan 22, 2015 at 8:55 AM, Trond Myklebust
<[email protected]> wrote:
> On Wed, Jan 21, 2015 at 6:52 PM, Olga Kornievskaia <[email protected]> wrote:
>> On Wed, Jan 21, 2015 at 5:48 PM, Trond Myklebust
>> <[email protected]> wrote:
>>> Hi Olga,
>>>
>>> On Wed, Jan 21, 2015 at 5:29 PM, Olga Kornievskaia <[email protected]> wrote:
>>>> There is a race between return of a delegation (due to resource
>>>> constraints and inode getting evicted) and an open for the same file
>>>> because….
>>>>
>>>> In nfs_inode_return_delegation_noreclaim() we detach that delegation
>>>> from the inode first (nfs_inode_detach_delegation()) and then we send
>>>> a delegreturn. In between of nfs_inode_detach_delegation() and
>>>> nfs_do_return_delegation() an open request comes and it doesn’t see
>>>> any delegations for this inode and thus it sends an open request. At
>>>> the same time eviction thread is working on doing a delegreturn for
>>>> this inode. The server gets request for the open first and returns a
>>>> delegation for it (note: server will issue the same delegation stateid
>>>> but different seqnum as the one currently being returned by the
>>>> client). Then the server processes a delegreturn for this file (from
>>>> its perspective delegation stateid is no longer valid). Yet, on the
>>>> client side as it processes a reply from the server, it adds a “new”
>>>> delegation to the inode and proceeds to use it for an IO request later
>>>> which errors in BAD_STATEID.
>>>>
>>>> I don't see how we can make detaching the delegation and the return of
>>>> it atomic. Instead I propose the following solution:
>>>> I propose to examine the delegation we get on open. If the sequence
>>>> number is not 0 and we don't have a delegation already then we must
>>>> have experienced this race. Therefore, we should return the delegation
>>>> (and ignore if this errors, as the server thinks there is no more
>>>> delegation). Something like this. Disclaimer: it doesn't seem to be
>>>> the actual fix (as it hangs the machine) so I'm still missing
>>>> something ....
>>>>
>>>>
>>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
>>>> index b76166b..01bba63 100644
>>>> --- a/fs/nfs/delegation.c
>>>> +++ b/fs/nfs/delegation.c
>>>> @@ -368,5 +368,12 @@ int nfs_inode_set_delegation(struct inode *inode,
>>>> struct rpc_cred *cred, struct
>>>> old_delegation, clp);
>>>> if (freeme == NULL)
>>>> goto out;
>>>> + } else {
>>>> + if (be32_to_cpu(delegation->stateid.seqid) > 0) {
>>>> + nfs_do_return_delegation(inode, delegation, 0);
>>>> + goto out;
>>>> + }
>>>> }
>>>> list_add_rcu(&delegation->super_list, &server->delegations);
>>>> nfsi->delegation_state = delegation->type;
>>>
>>> Shouldn't the seqid always be non-zero? AFAIK, RFC5661 reserves seqid
>>> == 0 for the special case where the client is asking the server to
>>> 'please substitute the most recent value' (see section 8.2.2).
>>> Normally, the server is therefore supposed to start with seqid == 1,
>>> and increment so that when the seqid wraps around, it skips over the
>>> value '0'.
>>
>> I wrongly assumed that first seq num is 0 because I've been seeing
>> seqid of 1 being returned in this case. So if we assume the server is
>> acting properly and returning the next up (so patch would check that >
>> 1). What do you think about that solution?
>
> If the seqid of the new delegation > seqid of the old delegation, then
> I'd expect the in-flight delegreturn for the old delegation to return
> NFS4ERR_OLD_STATEID anyway. I don't see why the client needs to throw
> away the new delegation.
>
> If, on the other hand, the seqid is the same, and my client is already
> holding a delegation, and it sends an OPEN, why does it make sense for
> the server to tell it a second time that it is being granted that same
> delegation with the same seqid?
> Either the client knows that it holds the delegation, and is
> deliberately not using it, or it has forgotten (and has forgotten to
> send a delegreturn) in which case why would the server want to trust
> it with another delegation?
should read: "....it with another delegation for that same file?"
Obviously, you don't want races such as the above to turn off
delegations altogether; it is just about choosing a strategy to reduce
the consequences of such races.
>>> Also, please note that nfs_do_return_delegation() may sleep, so you
>>> cannot call it while holding a spin lock.
>>
>> Yes, that's why I couldn't think of solution where we guarantee
>> atomicity of detaching the delegation and making sure it's returned
>> before servicing an open for the same file.
>
>
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 22, 2015 at 8:55 AM, Trond Myklebust
<[email protected]> wrote:
> On Wed, Jan 21, 2015 at 6:52 PM, Olga Kornievskaia <[email protected]> wrote:
>> On Wed, Jan 21, 2015 at 5:48 PM, Trond Myklebust
>> <[email protected]> wrote:
>>> Hi Olga,
>>>
>>> On Wed, Jan 21, 2015 at 5:29 PM, Olga Kornievskaia <[email protected]> wrote:
>>>> There is a race between return of a delegation (due to resource
>>>> constraints and inode getting evicted) and an open for the same file
>>>> because….
>>>>
>>>> In nfs_inode_return_delegation_noreclaim() we detach that delegation
>>>> from the inode first (nfs_inode_detach_delegation()) and then we send
>>>> a delegreturn. In between of nfs_inode_detach_delegation() and
>>>> nfs_do_return_delegation() an open request comes and it doesn’t see
>>>> any delegations for this inode and thus it sends an open request. At
>>>> the same time eviction thread is working on doing a delegreturn for
>>>> this inode. The server gets request for the open first and returns a
>>>> delegation for it (note: server will issue the same delegation stateid
>>>> but different seqnum as the one currently being returned by the
>>>> client). Then the server processes a delegreturn for this file (from
>>>> its perspective delegation stateid is no longer valid). Yet, on the
>>>> client side as it processes a reply from the server, it adds a “new”
>>>> delegation to the inode and proceeds to use it for an IO request later
>>>> which errors in BAD_STATEID.
>>>>
>>>> I don't see how we can make detaching the delegation and the return of
>>>> it atomic. Instead I propose the following solution:
>>>> I propose to examine the delegation we get on open. If the sequence
>>>> number is not 0 and we don't have a delegation already then we must
>>>> have experienced this race. Therefore, we should return the delegation
>>>> (and ignore if this errors, as the server thinks there is no more
>>>> delegation). Something like this. Disclaimer: it doesn't seem to be
>>>> the actual fix (as it hangs the machine) so I'm still missing
>>>> something ....
>>>>
>>>>
>>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
>>>> index b76166b..01bba63 100644
>>>> --- a/fs/nfs/delegation.c
>>>> +++ b/fs/nfs/delegation.c
>>>> @@ -368,5 +368,12 @@ int nfs_inode_set_delegation(struct inode *inode,
>>>> struct rpc_cred *cred, struct
>>>> old_delegation, clp);
>>>> if (freeme == NULL)
>>>> goto out;
>>>> + } else {
>>>> + if (be32_to_cpu(delegation->stateid.seqid) > 0) {
>>>> + nfs_do_return_delegation(inode, delegation, 0);
>>>> + goto out;
>>>> + }
>>>> }
>>>> list_add_rcu(&delegation->super_list, &server->delegations);
>>>> nfsi->delegation_state = delegation->type;
>>>
>>> Shouldn't the seqid always be non-zero? AFAIK, RFC5661 reserves seqid
>>> == 0 for the special case where the client is asking the server to
>>> 'please substitute the most recent value' (see section 8.2.2).
>>> Normally, the server is therefore supposed to start with seqid == 1,
>>> and increment so that when the seqid wraps around, it skips over the
>>> value '0'.
>>
>> I wrongly assumed that first seq num is 0 because I've been seeing
>> seqid of 1 being returned in this case. So if we assume the server is
>> acting properly and returning the next up (so patch would check that >
>> 1). What do you think about that solution?
>
> If the seqid of the new delegation > seqid of the old delegation, then
> I'd expect the in-flight delegreturn for the old delegation to return
> NFS4ERR_OLD_STATEID anyway. I don't see why the client needs to throw
> away the new delegation.
Thanks for the explanation. It make sense. I believe the server is
erroneously sending the exact "same" delegation back and the racing
delegreturn is then indistinguishable from the "new" delegation. It
then can't distinguish between the "old" delegation that's being
returned in this resource constraining scenario and the "new" one it
gave out.
>
> If, on the other hand, the seqid is the same, and my client is already
> holding a delegation, and it sends an OPEN, why does it make sense for
> the server to tell it a second time that it is being granted that same
> delegation with the same seqid?
> Either the client knows that it holds the delegation, and is
> deliberately not using it, or it has forgotten (and has forgotten to
> send a delegreturn) in which case why would the server want to trust
> it with another delegation?
>>> Also, please note that nfs_do_return_delegation() may sleep, so you
>>> cannot call it while holding a spin lock.
>>
>> Yes, that's why I couldn't think of solution where we guarantee
>> atomicity of detaching the delegation and making sure it's returned
>> before servicing an open for the same file.
>
>
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]