2015-09-14 23:54:37

by Olga Kornievskaia

Subject: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

The test case is as the subject line says:
open(foobar, O_WRONLY);
sleep() --> reboot the server
close(foobar)

The bug is in nfs4_reclaim_open_state() in nfs4state.c: a few lines
before the goto restart, there is
clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).

NFS4CLNT_RECLAIM_NOGRACE is a flag for the client state, not for
open-owner states. Its value is 4, which is also the value of
NFS_O_WRONLY_STATE in nfs4_state->flags. Clearing it therefore wipes
out the write-only open state, so when we go to close the file,
“call_close” doesn’t get set because the state flag is not set, and
CLOSE doesn’t go on the wire.

That line was introduced by commit e8d975e73 to fix an infinite loop
in OPEN recovery upon receiving a BAD_STATEID error. I have tested
injecting a BAD_STATEID error with the patch below applied, and the
code recovers without problems. However, I'm not sure the clearing of
the bit is needed any more: I also tested for the infinite loop with
that commit reverted and didn't hit it.

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index da73bc4..5db3246 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1481,7 +1481,7 @@ restart:
 					spin_unlock(&state->state_lock);
 				}
 				nfs4_put_open_state(state);
-				clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
+				clear_bit(NFS_STATE_RECLAIM_NOGRACE,
 					&state->flags);
 				spin_lock(&sp->so_lock);
 				goto restart;


2015-09-15 13:39:10

by Trond Myklebust

Subject: Re: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <[email protected]> wrote:
> A test case is as the description says:
> open(foobar, O_WRONLY);
> sleep() --> reboot the server
> close(foobar)
>
> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
> line before going to restart, there is
> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>
> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
> out state and when we go to close it, “call_close” doesn’t get set as
> state flag is not set and CLOSE doesn’t go on the wire.
>
> That line was introduced to fix an infinite loop for OPEN recovery
> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
> injecting BAD_STATEID error using the patch below and the code
> recovers without problems. However, I'm not sure the clearing of the
> bit is needed any more. I have tested for infinite loop by reverting
> the patch and didn't hit the infinite loop.
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index da73bc4..5db3246 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1481,7 +1481,7 @@ restart:
> spin_unlock(&state->state_lock);
> }
> nfs4_put_open_state(state);
> - clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
> + clear_bit(NFS_STATE_RECLAIM_NOGRACE,
> &state->flags);
> spin_lock(&sp->so_lock);
> goto restart;

That's an obvious typo. Thanks for spotting it!

As for whether or not the bit clear is needed at all, I think it is
for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
state recovery drain the slot table (just like we've always done for
NFSv4.1) and so I agree that those kernels probably won't be
afflicted.

Cheers
Trond

2015-09-15 14:27:47

by Olga Kornievskaia

Subject: Re: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
<[email protected]> wrote:
> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <[email protected]> wrote:
>> A test case is as the description says:
>> open(foobar, O_WRONLY);
>> sleep() --> reboot the server
>> close(foobar)
>>
>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>> line before going to restart, there is
>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>
>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>> out state and when we go to close it, “call_close” doesn’t get set as
>> state flag is not set and CLOSE doesn’t go on the wire.
>>
>> That line was introduced to fix an infinite loop for OPEN recovery
>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>> injecting BAD_STATEID error using the patch below and the code
>> recovers without problems. However, I'm not sure the clearing of the
>> bit is needed any more. I have tested for infinite loop by reverting
>> the patch and didn't hit the infinite loop.
>>
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index da73bc4..5db3246 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -1481,7 +1481,7 @@ restart:
>> spin_unlock(&state->state_lock);
>> }
>> nfs4_put_open_state(state);
>> - clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>> + clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>> &state->flags);
>> spin_lock(&sp->so_lock);
>> goto restart;
>
> That's an obvious typo. Thanks for spotting it!
>
> As for whether or not the bit clear is needed at all, I think it is
> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
> state recovery drain the slot table (just like we've always done for
> NFSv4.1) and so I agree that those kernels probably won't be
> afflicted.
>

Thanks Trond. Do you need me to resubmit it without the last paragraph
or is the patch ok as is?

> Cheers
> Trond

2015-09-15 15:49:05

by Trond Myklebust

Subject: Re: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

On Tue, Sep 15, 2015 at 10:27 AM, Olga Kornievskaia <[email protected]> wrote:
> On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
> <[email protected]> wrote:
>> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <[email protected]> wrote:
>>> A test case is as the description says:
>>> open(foobar, O_WRONLY);
>>> sleep() --> reboot the server
>>> close(foobar)
>>>
>>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>>> line before going to restart, there is
>>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>>
>>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>>> out state and when we go to close it, “call_close” doesn’t get set as
>>> state flag is not set and CLOSE doesn’t go on the wire.
>>>
>>> That line was introduced to fix an infinite loop for OPEN recovery
>>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>>> injecting BAD_STATEID error using the patch below and the code
>>> recovers without problems. However, I'm not sure the clearing of the
>>> bit is needed any more. I have tested for infinite loop by reverting
>>> the patch and didn't hit the infinite loop.
>>>
>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>> index da73bc4..5db3246 100644
>>> --- a/fs/nfs/nfs4state.c
>>> +++ b/fs/nfs/nfs4state.c
>>> @@ -1481,7 +1481,7 @@ restart:
>>> spin_unlock(&state->state_lock);
>>> }
>>> nfs4_put_open_state(state);
>>> - clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>>> + clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>> &state->flags);
>>> spin_lock(&sp->so_lock);
>>> goto restart;
>>
>> That's an obvious typo. Thanks for spotting it!
>>
>> As for whether or not the bit clear is needed at all, I think it is
>> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
>> state recovery drain the slot table (just like we've always done for
>> NFSv4.1) and so I agree that those kernels probably won't be
>> afflicted.
>>
>
> Thanks Trond. Do you need me to resubmit it without the last paragraph
> or is the patch ok as is?
>

I can easily remove that paragraph when applying the patch, if you
agree that it is superfluous.

Cheers
Trond

2015-09-15 16:52:43

by Olga Kornievskaia

Subject: Re: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

On Tue, Sep 15, 2015 at 11:49 AM, Trond Myklebust
<[email protected]> wrote:
> On Tue, Sep 15, 2015 at 10:27 AM, Olga Kornievskaia <[email protected]> wrote:
>> On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
>> <[email protected]> wrote:
>>> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <[email protected]> wrote:
>>>> A test case is as the description says:
>>>> open(foobar, O_WRONLY);
>>>> sleep() --> reboot the server
>>>> close(foobar)
>>>>
>>>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>>>> line before going to restart, there is
>>>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>>>
>>>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>>>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>>>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>>>> out state and when we go to close it, “call_close” doesn’t get set as
>>>> state flag is not set and CLOSE doesn’t go on the wire.
>>>>
>>>> That line was introduced to fix an infinite loop for OPEN recovery
>>>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>>>> injecting BAD_STATEID error using the patch below and the code
>>>> recovers without problems. However, I'm not sure the clearing of the
>>>> bit is needed any more. I have tested for infinite loop by reverting
>>>> the patch and didn't hit the infinite loop.
>>>>
>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>> index da73bc4..5db3246 100644
>>>> --- a/fs/nfs/nfs4state.c
>>>> +++ b/fs/nfs/nfs4state.c
>>>> @@ -1481,7 +1481,7 @@ restart:
>>>> spin_unlock(&state->state_lock);
>>>> }
>>>> nfs4_put_open_state(state);
>>>> - clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>>>> + clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>>> &state->flags);
>>>> spin_lock(&sp->so_lock);
>>>> goto restart;
>>>
>>> That's an obvious typo. Thanks for spotting it!
>>>
>>> As for whether or not the bit clear is needed at all, I think it is
>>> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
>>> state recovery drain the slot table (just like we've always done for
>>> NFSv4.1) and so I agree that those kernels probably won't be
>>> afflicted.
>>>
>>
>> Thanks Trond. Do you need me to resubmit it without the last paragraph
>> or is the patch ok as is?
>>
>
> I can easily remove that paragraph when applying the patch, if you
> agree that it is superfluous.

Thanks. Works for me.

>
> Cheers
> Trond

2015-09-17 13:34:48

by Trond Myklebust

Subject: Re: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

On Tue, Sep 15, 2015 at 12:52 PM, Olga Kornievskaia <[email protected]> wrote:
> On Tue, Sep 15, 2015 at 11:49 AM, Trond Myklebust
> <[email protected]> wrote:
>> On Tue, Sep 15, 2015 at 10:27 AM, Olga Kornievskaia <[email protected]> wrote:
>>> On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
>>> <[email protected]> wrote:
>>>> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>> A test case is as the description says:
>>>>> open(foobar, O_WRONLY);
>>>>> sleep() --> reboot the server
>>>>> close(foobar)
>>>>>
>>>>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>>>>> line before going to restart, there is
>>>>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>>>>
>>>>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>>>>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>>>>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>>>>> out state and when we go to close it, “call_close” doesn’t get set as
>>>>> state flag is not set and CLOSE doesn’t go on the wire.
>>>>>
>>>>> That line was introduced to fix an infinite loop for OPEN recovery
>>>>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>>>>> injecting BAD_STATEID error using the patch below and the code
>>>>> recovers without problems. However, I'm not sure the clearing of the
>>>>> bit is needed any more. I have tested for infinite loop by reverting
>>>>> the patch and didn't hit the infinite loop.
>>>>>
>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>>> index da73bc4..5db3246 100644
>>>>> --- a/fs/nfs/nfs4state.c
>>>>> +++ b/fs/nfs/nfs4state.c
>>>>> @@ -1481,7 +1481,7 @@ restart:
>>>>> spin_unlock(&state->state_lock);
>>>>> }
>>>>> nfs4_put_open_state(state);
>>>>> - clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>>>>> + clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>>>> &state->flags);
>>>>> spin_lock(&sp->so_lock);
>>>>> goto restart;
>>>>
>>>> That's an obvious typo. Thanks for spotting it!
>>>>
>>>> As for whether or not the bit clear is needed at all, I think it is
>>>> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
>>>> state recovery drain the slot table (just like we've always done for
>>>> NFSv4.1) and so I agree that those kernels probably won't be
>>>> afflicted.
>>>>
>>>
>>> Thanks Trond. Do you need me to resubmit it without the last paragraph
>>> or is the patch ok as is?
>>>
>>
>> I can easily remove that paragraph when applying the patch, if you
>> agree that it is superfluous.
>
> Thanks. Works for me.
>

May I also add a signed-off-by line from you? I can't really apply
this (or any other patches) without it.

Cheers
Trond

2015-09-17 13:36:33

by Olga Kornievskaia

Subject: Re: Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

On Thu, Sep 17, 2015 at 9:34 AM, Trond Myklebust
<[email protected]> wrote:
> On Tue, Sep 15, 2015 at 12:52 PM, Olga Kornievskaia <[email protected]> wrote:
>> On Tue, Sep 15, 2015 at 11:49 AM, Trond Myklebust
>> <[email protected]> wrote:
>>> On Tue, Sep 15, 2015 at 10:27 AM, Olga Kornievskaia <[email protected]> wrote:
>>>> On Tue, Sep 15, 2015 at 9:39 AM, Trond Myklebust
>>>> <[email protected]> wrote:
>>>>> On Mon, Sep 14, 2015 at 7:54 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>> A test case is as the description says:
>>>>>> open(foobar, O_WRONLY);
>>>>>> sleep() --> reboot the server
>>>>>> close(foobar)
>>>>>>
>>>>>> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
>>>>>> line before going to restart, there is
>>>>>> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
>>>>>>
>>>>>> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
>>>>>> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
>>>>>> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
>>>>>> out state and when we go to close it, “call_close” doesn’t get set as
>>>>>> state flag is not set and CLOSE doesn’t go on the wire.
>>>>>>
>>>>>> That line was introduced to fix an infinite loop for OPEN recovery
>>>>>> upon receiving a BAD_STATEID error: commit e8d975e73. I have tested
>>>>>> injecting BAD_STATEID error using the patch below and the code
>>>>>> recovers without problems. However, I'm not sure the clearing of the
>>>>>> bit is needed any more. I have tested for infinite loop by reverting
>>>>>> the patch and didn't hit the infinite loop.
>>>>>>
>>>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>>>> index da73bc4..5db3246 100644
>>>>>> --- a/fs/nfs/nfs4state.c
>>>>>> +++ b/fs/nfs/nfs4state.c
>>>>>> @@ -1481,7 +1481,7 @@ restart:
>>>>>> spin_unlock(&state->state_lock);
>>>>>> }
>>>>>> nfs4_put_open_state(state);
>>>>>> - clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
>>>>>> + clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>>>>>> &state->flags);
>>>>>> spin_lock(&sp->so_lock);
>>>>>> goto restart;
>>>>>
>>>>> That's an obvious typo. Thanks for spotting it!
>>>>>
>>>>> As for whether or not the bit clear is needed at all, I think it is
>>>>> for NFSv4 on older kernels. On newer kernels, we do have the NFSv4
>>>>> state recovery drain the slot table (just like we've always done for
>>>>> NFSv4.1) and so I agree that those kernels probably won't be
>>>>> afflicted.
>>>>>
>>>>
>>>> Thanks Trond. Do you need me to resubmit it without the last paragraph
>>>> or is the patch ok as is?
>>>>
>>>
>>> I can easily remove that paragraph when applying the patch, if you
>>> agree that it is superfluous.
>>
>> Thanks. Works for me.
>>
>
> May I also add a signed-off-by line from you? I can't really apply
> this (or any other patches) without it.

Of course.

>
> Cheers
> Trond