LinuxLists.cc - [PATCH 1/1] Adding issync field to delegreturn

2015-10-12 21:55:08

Subject: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

It'll be nice to know when we return delegations synchronously or not.

Signed-off: Olga Kornievskaia <[email protected]>
---
fs/nfs/nfs4proc.c | 5 ++++-
fs/nfs/nfs4trace.h | 12 ++++++++----
2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index f93b9cd..858c780 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5283,6 +5283,7 @@ struct nfs4_delegreturndata {
struct inode *inode;
bool roc;
u32 roc_barrier;
+ int issync;
};

static void nfs4_delegreturn_done(struct rpc_task *task, void *calldata)
@@ -5292,7 +5293,8 @@ static void nfs4_delegreturn_done(struct
rpc_task *task, void *calldata)
if (!nfs4_sequence_done(task, &data->res.seq_res))
return;

- trace_nfs4_delegreturn_exit(&data->args, &data->res, task->tk_status);
+ trace_nfs4_delegreturn_exit(&data->args, &data->res, task->tk_status,
+ data->issync);
switch (task->tk_status) {
case 0:
renew_lease(data->res.server, data->timestamp);
@@ -5387,6 +5389,7 @@ static int _nfs4_proc_delegreturn(struct inode
*inode, struct rpc_cred *cred, co
data->inode = nfs_igrab_and_active(inode);
if (data->inode)
data->roc = nfs4_roc(inode);
+ data->issync = issync;

task_setup_data.callback_data = data;
msg.rpc_argp = &data->args;
diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
index 671cf68..be67e87 100644
--- a/fs/nfs/nfs4trace.h
+++ b/fs/nfs/nfs4trace.h
@@ -612,29 +612,33 @@ TRACE_EVENT(nfs4_delegreturn_exit,
TP_PROTO(
const struct nfs4_delegreturnargs *args,
const struct nfs4_delegreturnres *res,
- int error
+ int error,
+ int issync
),

- TP_ARGS(args, res, error),
+ TP_ARGS(args, res, error, issync),

TP_STRUCT__entry(
__field(dev_t, dev)
__field(u32, fhandle)
__field(int, error)
+ __field(int, issync)
),

TP_fast_assign(
__entry->dev = res->server->s_dev;
__entry->fhandle = nfs_fhandle_hash(args->fhandle);
__entry->error = error;
+ __entry->issync = issync;
),

TP_printk(
- "error=%d (%s) dev=%02x:%02x fhandle=0x%08x",
+ "error=%d (%s) dev=%02x:%02x fhandle=0x%08x issync=%d",
__entry->error,
show_nfsv4_errors(__entry->error),
MAJOR(__entry->dev), MINOR(__entry->dev),
- __entry->fhandle
+ __entry->fhandle,
+ __entry->issync
)
);

--
1.8.3.1

2015-10-13 03:47:11

by Trond Myklebust

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Mon, Oct 12, 2015 at 5:55 PM, Olga Kornievskaia <[email protected]> wrote:
> It'll be nice to know when we return delegations synchronously or not.

Why? This patch forces us to carry an otherwise completely unnecessary
parameter, so at the very minimum we should have a discussion of what
the real use cases are.

2015-10-13 12:26:44

by Olga Kornievskaia

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Mon, Oct 12, 2015 at 11:47 PM, Trond Myklebust
<[email protected]> wrote:
> On Mon, Oct 12, 2015 at 5:55 PM, Olga Kornievskaia <[email protected]> wrote:
>> It'll be nice to know when we return delegations synchronously or not.
>
> Why? This patch forces us to carry an otherwise completely unnecessary
> parameter, so at the very minimum we should have a discussion of what
> the real use cases are.

I used it to diagnose the race of open and delegreturn. If it's kept
that some delegreturns are synchronous and others are not I think the
information is useful.

Speaking of there is a race between state manager thread returning
used delegations and new open. Previously I thought it was evict
inode...

> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-10-13 13:27:16

by Trond Myklebust

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Tue, Oct 13, 2015 at 8:26 AM, Olga Kornievskaia <[email protected]> wrote:
>
> On Mon, Oct 12, 2015 at 11:47 PM, Trond Myklebust
> <[email protected]> wrote:
> > On Mon, Oct 12, 2015 at 5:55 PM, Olga Kornievskaia <[email protected]> wrote:
> >> It'll be nice to know when we return delegations synchronously or not.
> >
> > Why? This patch forces us to carry an otherwise completely unnecessary
> > parameter, so at the very minimum we should have a discussion of what
> > the real use cases are.
>
> I used it to diagnose the race of open and delegreturn. If it's kept

How were you using it?

> that some delegreturns are synchronous and others are not I think the
> information is useful.

The only difference between synchronous and asynchronous in this case
is whether or not the process that launched the delegreturn actually
waits for it to complete; a signal could easily prevent it from doing
so without interrupting the delegreturn call itself.
IOW: for complete information when debugging races here, you really
need to examine the return value from the wait call.

> Speaking of there is a race between state manager thread returning
> used delegations and new open. Previously I thought it was evict
> inode...

Is this with commit 5e99b532bb95 ("nfs4: reset states to use
open_stateid when returning delegation voluntarily") applied?

2015-10-13 14:13:06

by Olga Kornievskaia

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Tue, Oct 13, 2015 at 9:27 AM, Trond Myklebust
<[email protected]> wrote:
> On Tue, Oct 13, 2015 at 8:26 AM, Olga Kornievskaia <[email protected]> wrote:
>>
>> On Mon, Oct 12, 2015 at 11:47 PM, Trond Myklebust
>> <[email protected]> wrote:
>> > On Mon, Oct 12, 2015 at 5:55 PM, Olga Kornievskaia <[email protected]> wrote:
>> >> It'll be nice to know when we return delegations synchronously or not.
>> >
>> > Why? This patch forces us to carry an otherwise completely unnecessary
>> > parameter, so at the very minimum we should have a discussion of what
>> > the real use cases are.
>>
>> I used it to diagnose the race of open and delegreturn. If it's kept
>
> How were you using it?

I added two more traces points in the beginning of delegreturn and in
nfs4_do_open before sending the rpc. I can see that a given file
handle:
-- delegreturn prepare tracepoint is happening,
-- then the tracepoint of before sending the open is logged,
-- then delegreturn prepare is logged again,
-- then tracepoint for nfs4_open_file which is after receiving reply
to the open from the server
-- then delegreturn_exit tracepoint

kworker/1:0H-14168 [001] .... 576.571636:
nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
issync=0

hammer-13955 [000] .... 576.942632: nfs4_open_file_begin:
flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:0
fhandle=0x00000000 name=00:2a:904/000002CB.ham

hammer-13955 [001] .... 577.043084: nfs4_open_file:
error=0 (OK) flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:7708
fhandle=0x84792ca9 name=00:2a:904/000002CB.ham

kworker/0:1H-431 [000] .... 577.064013:
nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
issync=0

kworker/0:1H-431 [000] .... 577.101076: nfs4_delegreturn_exit:
error=0 (OK) dev=00:2a fhandle=0x84792ca9

kworker/0:1H-431 [000] .... 577.113021: nfs4_read:
error=-10025 (BAD_STATEID) fileid=00:2a:7708 fhandle=0x84792ca9
offset=0 count=64

>
>> that some delegreturns are synchronous and others are not I think the
>> information is useful.
>
> The only difference between synchronous and asynchronous in this case
> is whether or not the process that launched the delegreturn actually
> waits for it to complete; a signal could easily prevent it from doing
> so without interrupting the delegreturn call itself.
> IOW: for complete information when debugging races here, you really
> need to examine the return value from the wait call.
>
>> Speaking of there is a race between state manager thread returning
>> used delegations and new open. Previously I thought it was evict
>> inode...
>
> Is this with commit 5e99b532bb95 ("nfs4: reset states to use
> open_stateid when returning delegation voluntarily") applied?

No I have not. I will try that. Thanks.

2015-10-13 16:24:13

by Olga Kornievskaia

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Tue, Oct 13, 2015 at 10:13 AM, Olga Kornievskaia <[email protected]> wrote:
> On Tue, Oct 13, 2015 at 9:27 AM, Trond Myklebust
> <[email protected]> wrote:
>> On Tue, Oct 13, 2015 at 8:26 AM, Olga Kornievskaia <[email protected]> wrote:
>>>
>>> On Mon, Oct 12, 2015 at 11:47 PM, Trond Myklebust
>>> <[email protected]> wrote:
>>> > On Mon, Oct 12, 2015 at 5:55 PM, Olga Kornievskaia <[email protected]> wrote:
>>> >> It'll be nice to know when we return delegations synchronously or not.
>>> >
>>> > Why? This patch forces us to carry an otherwise completely unnecessary
>>> > parameter, so at the very minimum we should have a discussion of what
>>> > the real use cases are.
>>>
>>> I used it to diagnose the race of open and delegreturn. If it's kept
>>
>> How were you using it?
>
> I added two more traces points in the beginning of delegreturn and in
> nfs4_do_open before sending the rpc. I can see that a given file
> handle:
> -- delegreturn prepare tracepoint is happening,
> -- then the tracepoint of before sending the open is logged,
> -- then delegreturn prepare is logged again,
> -- then tracepoint for nfs4_open_file which is after receiving reply
> to the open from the server
> -- then delegreturn_exit tracepoint
>
> kworker/1:0H-14168 [001] .... 576.571636:
> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
> issync=0
>
> hammer-13955 [000] .... 576.942632: nfs4_open_file_begin:
> flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:0
> fhandle=0x00000000 name=00:2a:904/000002CB.ham
>
> hammer-13955 [001] .... 577.043084: nfs4_open_file:
> error=0 (OK) flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:7708
> fhandle=0x84792ca9 name=00:2a:904/000002CB.ham
>
> kworker/0:1H-431 [000] .... 577.064013:
> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
> issync=0
>
> kworker/0:1H-431 [000] .... 577.101076: nfs4_delegreturn_exit:
> error=0 (OK) dev=00:2a fhandle=0x84792ca9
>
> kworker/0:1H-431 [000] .... 577.113021: nfs4_read:
> error=-10025 (BAD_STATEID) fileid=00:2a:7708 fhandle=0x84792ca9
> offset=0 count=64
>
>
>>
>>> that some delegreturns are synchronous and others are not I think the
>>> information is useful.
>>
>> The only difference between synchronous and asynchronous in this case
>> is whether or not the process that launched the delegreturn actually
>> waits for it to complete; a signal could easily prevent it from doing
>> so without interrupting the delegreturn call itself.
>> IOW: for complete information when debugging races here, you really
>> need to examine the return value from the wait call.
>>
>>> Speaking of there is a race between state manager thread returning
>>> used delegations and new open. Previously I thought it was evict
>>> inode...
>>
>> Is this with commit 5e99b532bb95 ("nfs4: reset states to use
>> open_stateid when returning delegation voluntarily") applied?
>
> No I have not. I will try that. Thanks.

This patch does not help. The race is still present.

2015-10-13 16:56:39

by Trond Myklebust

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Tue, Oct 13, 2015 at 12:24 PM, Olga Kornievskaia <[email protected]> wrote:
> On Tue, Oct 13, 2015 at 10:13 AM, Olga Kornievskaia <[email protected]> wrote:
>> On Tue, Oct 13, 2015 at 9:27 AM, Trond Myklebust
>> <[email protected]> wrote:
>>> On Tue, Oct 13, 2015 at 8:26 AM, Olga Kornievskaia <[email protected]> wrote:
>>>>
>>>> On Mon, Oct 12, 2015 at 11:47 PM, Trond Myklebust
>>>> <[email protected]> wrote:
>>>> > On Mon, Oct 12, 2015 at 5:55 PM, Olga Kornievskaia <[email protected]> wrote:
>>>> >> It'll be nice to know when we return delegations synchronously or not.
>>>> >
>>>> > Why? This patch forces us to carry an otherwise completely unnecessary
>>>> > parameter, so at the very minimum we should have a discussion of what
>>>> > the real use cases are.
>>>>
>>>> I used it to diagnose the race of open and delegreturn. If it's kept
>>>
>>> How were you using it?
>>
>> I added two more traces points in the beginning of delegreturn and in
>> nfs4_do_open before sending the rpc. I can see that a given file
>> handle:
>> -- delegreturn prepare tracepoint is happening,
>> -- then the tracepoint of before sending the open is logged,
>> -- then delegreturn prepare is logged again,
>> -- then tracepoint for nfs4_open_file which is after receiving reply
>> to the open from the server
>> -- then delegreturn_exit tracepoint
>>
>> kworker/1:0H-14168 [001] .... 576.571636:
>> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
>> issync=0
>>
>> hammer-13955 [000] .... 576.942632: nfs4_open_file_begin:
>> flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:0
>> fhandle=0x00000000 name=00:2a:904/000002CB.ham
>>
>> hammer-13955 [001] .... 577.043084: nfs4_open_file:
>> error=0 (OK) flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:7708
>> fhandle=0x84792ca9 name=00:2a:904/000002CB.ham
>>
>> kworker/0:1H-431 [000] .... 577.064013:
>> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
>> issync=0
>>
>> kworker/0:1H-431 [000] .... 577.101076: nfs4_delegreturn_exit:
>> error=0 (OK) dev=00:2a fhandle=0x84792ca9
>>
>> kworker/0:1H-431 [000] .... 577.113021: nfs4_read:
>> error=-10025 (BAD_STATEID) fileid=00:2a:7708 fhandle=0x84792ca9
>> offset=0 count=64
>>
>>
>>>
>>>> that some delegreturns are synchronous and others are not I think the
>>>> information is useful.
>>>
>>> The only difference between synchronous and asynchronous in this case
>>> is whether or not the process that launched the delegreturn actually
>>> waits for it to complete; a signal could easily prevent it from doing
>>> so without interrupting the delegreturn call itself.
>>> IOW: for complete information when debugging races here, you really
>>> need to examine the return value from the wait call.
>>>
>>>> Speaking of there is a race between state manager thread returning
>>>> used delegations and new open. Previously I thought it was evict
>>>> inode...
>>>
>>> Is this with commit 5e99b532bb95 ("nfs4: reset states to use
>>> open_stateid when returning delegation voluntarily") applied?
>>
>> No I have not. I will try that. Thanks.
>
> This patch does not help. The race is still present.

OK. So what are the symptoms? I'm having trouble seeing how a race can
happen, given a correctly coded server.

2015-10-13 17:23:49

by Olga Kornievskaia

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Tue, Oct 13, 2015 at 12:56 PM, Trond Myklebust
<[email protected]> wrote:
> On Tue, Oct 13, 2015 at 12:24 PM, Olga Kornievskaia <[email protected]> wrote:
>> On Tue, Oct 13, 2015 at 10:13 AM, Olga Kornievskaia <[email protected]> wrote:
>>> On Tue, Oct 13, 2015 at 9:27 AM, Trond Myklebust
>>> <[email protected]> wrote:
>>>> On Tue, Oct 13, 2015 at 8:26 AM, Olga Kornievskaia <[email protected]> wrote:
>>>>>
>>>>> On Mon, Oct 12, 2015 at 11:47 PM, Trond Myklebust
>>>>> <[email protected]> wrote:
>>>>> > On Mon, Oct 12, 2015 at 5:55 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>> >> It'll be nice to know when we return delegations synchronously or not.
>>>>> >
>>>>> > Why? This patch forces us to carry an otherwise completely unnecessary
>>>>> > parameter, so at the very minimum we should have a discussion of what
>>>>> > the real use cases are.
>>>>>
>>>>> I used it to diagnose the race of open and delegreturn. If it's kept
>>>>
>>>> How were you using it?
>>>
>>> I added two more traces points in the beginning of delegreturn and in
>>> nfs4_do_open before sending the rpc. I can see that a given file
>>> handle:
>>> -- delegreturn prepare tracepoint is happening,
>>> -- then the tracepoint of before sending the open is logged,
>>> -- then delegreturn prepare is logged again,
>>> -- then tracepoint for nfs4_open_file which is after receiving reply
>>> to the open from the server
>>> -- then delegreturn_exit tracepoint
>>>
>>> kworker/1:0H-14168 [001] .... 576.571636:
>>> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
>>> issync=0
>>>
>>> hammer-13955 [000] .... 576.942632: nfs4_open_file_begin:
>>> flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:0
>>> fhandle=0x00000000 name=00:2a:904/000002CB.ham
>>>
>>> hammer-13955 [001] .... 577.043084: nfs4_open_file:
>>> error=0 (OK) flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:7708
>>> fhandle=0x84792ca9 name=00:2a:904/000002CB.ham
>>>
>>> kworker/0:1H-431 [000] .... 577.064013:
>>> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
>>> issync=0
>>>
>>> kworker/0:1H-431 [000] .... 577.101076: nfs4_delegreturn_exit:
>>> error=0 (OK) dev=00:2a fhandle=0x84792ca9
>>>
>>> kworker/0:1H-431 [000] .... 577.113021: nfs4_read:
>>> error=-10025 (BAD_STATEID) fileid=00:2a:7708 fhandle=0x84792ca9
>>> offset=0 count=64
>>>
>>>
>>>>
>>>>> that some delegreturns are synchronous and others are not I think the
>>>>> information is useful.
>>>>
>>>> The only difference between synchronous and asynchronous in this case
>>>> is whether or not the process that launched the delegreturn actually
>>>> waits for it to complete; a signal could easily prevent it from doing
>>>> so without interrupting the delegreturn call itself.
>>>> IOW: for complete information when debugging races here, you really
>>>> need to examine the return value from the wait call.
>>>>
>>>>> Speaking of there is a race between state manager thread returning
>>>>> used delegations and new open. Previously I thought it was evict
>>>>> inode...
>>>>
>>>> Is this with commit 5e99b532bb95 ("nfs4: reset states to use
>>>> open_stateid when returning delegation voluntarily") applied?
>>>
>>> No I have not. I will try that. Thanks.
>>
>> This patch does not help. The race is still present.
>
> OK. So what are the symptoms? I'm having trouble seeing how a race can
> happen, given a correctly coded server.

Here's what the server sees:
open (foobar) replies back with a delegation
various operations including a close()
some time goes by...
open (foobar) replies back with the same delegation
delegreturn
read (foobar) using delegation

Here's what the client does:
open (foobar) gets a delegation, stores is
various operations
close (file)
state manager thread kicks off and marks delegation to be returned
-- at this point there are various that could have happened here.
one of which is delegation could be removed from the inode but
delegreturn is not yet on the wire. or it could mark the delegations
unreferenced but not yet return them as a new open comes in. a new
open could actually mark the delegation referenced but the state
manager when it regains control will proceed with returning it. there
is no check in nfs_do_return_delegation() that the delegation is
referenced again. but that's not the problem hit here i think.

same time another open comes
-- if delegation is removed from the inode, the open just proceeds.
-- say delegation is not yet removed from the inode but marked
RETURNING, the open won't use it but still proceeds with doing the
operation and it can (and does) goes on the wire before delegreturn.

Delegation->flags is a shared variable accessed by both state manager
thread and any other thread but it's never accessed under a lock. That
just seems wrong.

Shouldn't there be synchronization between returning the delegation
and new opens? In VFS code, evict inode code and new open are
synchronize that way.

2015-10-13 17:28:10

by Trond Myklebust

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Tue, Oct 13, 2015 at 1:23 PM, Olga Kornievskaia <[email protected]> wrote:
>
> On Tue, Oct 13, 2015 at 12:56 PM, Trond Myklebust
> <[email protected]> wrote:
> > On Tue, Oct 13, 2015 at 12:24 PM, Olga Kornievskaia <[email protected]> wrote:
> >> On Tue, Oct 13, 2015 at 10:13 AM, Olga Kornievskaia <[email protected]> wrote:
> >>> On Tue, Oct 13, 2015 at 9:27 AM, Trond Myklebust
> >>> <[email protected]> wrote:
> >>>> On Tue, Oct 13, 2015 at 8:26 AM, Olga Kornievskaia <[email protected]> wrote:
> >>>>>
> >>>>> On Mon, Oct 12, 2015 at 11:47 PM, Trond Myklebust
> >>>>> <[email protected]> wrote:
> >>>>> > On Mon, Oct 12, 2015 at 5:55 PM, Olga Kornievskaia <[email protected]> wrote:
> >>>>> >> It'll be nice to know when we return delegations synchronously or not.
> >>>>> >
> >>>>> > Why? This patch forces us to carry an otherwise completely unnecessary
> >>>>> > parameter, so at the very minimum we should have a discussion of what
> >>>>> > the real use cases are.
> >>>>>
> >>>>> I used it to diagnose the race of open and delegreturn. If it's kept
> >>>>
> >>>> How were you using it?
> >>>
> >>> I added two more traces points in the beginning of delegreturn and in
> >>> nfs4_do_open before sending the rpc. I can see that a given file
> >>> handle:
> >>> -- delegreturn prepare tracepoint is happening,
> >>> -- then the tracepoint of before sending the open is logged,
> >>> -- then delegreturn prepare is logged again,
> >>> -- then tracepoint for nfs4_open_file which is after receiving reply
> >>> to the open from the server
> >>> -- then delegreturn_exit tracepoint
> >>>
> >>> kworker/1:0H-14168 [001] .... 576.571636:
> >>> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
> >>> issync=0
> >>>
> >>> hammer-13955 [000] .... 576.942632: nfs4_open_file_begin:
> >>> flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:0
> >>> fhandle=0x00000000 name=00:2a:904/000002CB.ham
> >>>
> >>> hammer-13955 [001] .... 577.043084: nfs4_open_file:
> >>> error=0 (OK) flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:7708
> >>> fhandle=0x84792ca9 name=00:2a:904/000002CB.ham
> >>>
> >>> kworker/0:1H-431 [000] .... 577.064013:
> >>> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
> >>> issync=0
> >>>
> >>> kworker/0:1H-431 [000] .... 577.101076: nfs4_delegreturn_exit:
> >>> error=0 (OK) dev=00:2a fhandle=0x84792ca9
> >>>
> >>> kworker/0:1H-431 [000] .... 577.113021: nfs4_read:
> >>> error=-10025 (BAD_STATEID) fileid=00:2a:7708 fhandle=0x84792ca9
> >>> offset=0 count=64
> >>>
> >>>
> >>>>
> >>>>> that some delegreturns are synchronous and others are not I think the
> >>>>> information is useful.
> >>>>
> >>>> The only difference between synchronous and asynchronous in this case
> >>>> is whether or not the process that launched the delegreturn actually
> >>>> waits for it to complete; a signal could easily prevent it from doing
> >>>> so without interrupting the delegreturn call itself.
> >>>> IOW: for complete information when debugging races here, you really
> >>>> need to examine the return value from the wait call.
> >>>>
> >>>>> Speaking of there is a race between state manager thread returning
> >>>>> used delegations and new open. Previously I thought it was evict
> >>>>> inode...
> >>>>
> >>>> Is this with commit 5e99b532bb95 ("nfs4: reset states to use
> >>>> open_stateid when returning delegation voluntarily") applied?
> >>>
> >>> No I have not. I will try that. Thanks.
> >>
> >> This patch does not help. The race is still present.
> >
> > OK. So what are the symptoms? I'm having trouble seeing how a race can
> > happen, given a correctly coded server.
>
> Here's what the server sees:
> open (foobar) replies back with a delegation
> various operations including a close()
> some time goes by...
> open (foobar) replies back with the same delegation

Why? Olga, we already had this discussion. That sort of server
behaviour is never going to work without races and is the root cause
of your problem. We simply won't ever support servers that do this.

> delegreturn
> read (foobar) using delegation

2015-10-13 17:54:08

by Olga Kornievskaia

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Tue, Oct 13, 2015 at 1:28 PM, Trond Myklebust
<[email protected]> wrote:
> On Tue, Oct 13, 2015 at 1:23 PM, Olga Kornievskaia <[email protected]> wrote:
>>
>> On Tue, Oct 13, 2015 at 12:56 PM, Trond Myklebust
>> <[email protected]> wrote:
>> > On Tue, Oct 13, 2015 at 12:24 PM, Olga Kornievskaia <[email protected]> wrote:
>> >> On Tue, Oct 13, 2015 at 10:13 AM, Olga Kornievskaia <[email protected]> wrote:
>> >>> On Tue, Oct 13, 2015 at 9:27 AM, Trond Myklebust
>> >>> <[email protected]> wrote:
>> >>>> On Tue, Oct 13, 2015 at 8:26 AM, Olga Kornievskaia <[email protected]> wrote:
>> >>>>>
>> >>>>> On Mon, Oct 12, 2015 at 11:47 PM, Trond Myklebust
>> >>>>> <[email protected]> wrote:
>> >>>>> > On Mon, Oct 12, 2015 at 5:55 PM, Olga Kornievskaia <[email protected]> wrote:
>> >>>>> >> It'll be nice to know when we return delegations synchronously or not.
>> >>>>> >
>> >>>>> > Why? This patch forces us to carry an otherwise completely unnecessary
>> >>>>> > parameter, so at the very minimum we should have a discussion of what
>> >>>>> > the real use cases are.
>> >>>>>
>> >>>>> I used it to diagnose the race of open and delegreturn. If it's kept
>> >>>>
>> >>>> How were you using it?
>> >>>
>> >>> I added two more traces points in the beginning of delegreturn and in
>> >>> nfs4_do_open before sending the rpc. I can see that a given file
>> >>> handle:
>> >>> -- delegreturn prepare tracepoint is happening,
>> >>> -- then the tracepoint of before sending the open is logged,
>> >>> -- then delegreturn prepare is logged again,
>> >>> -- then tracepoint for nfs4_open_file which is after receiving reply
>> >>> to the open from the server
>> >>> -- then delegreturn_exit tracepoint
>> >>>
>> >>> kworker/1:0H-14168 [001] .... 576.571636:
>> >>> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
>> >>> issync=0
>> >>>
>> >>> hammer-13955 [000] .... 576.942632: nfs4_open_file_begin:
>> >>> flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:0
>> >>> fhandle=0x00000000 name=00:2a:904/000002CB.ham
>> >>>
>> >>> hammer-13955 [001] .... 577.043084: nfs4_open_file:
>> >>> error=0 (OK) flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:7708
>> >>> fhandle=0x84792ca9 name=00:2a:904/000002CB.ham
>> >>>
>> >>> kworker/0:1H-431 [000] .... 577.064013:
>> >>> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
>> >>> issync=0
>> >>>
>> >>> kworker/0:1H-431 [000] .... 577.101076: nfs4_delegreturn_exit:
>> >>> error=0 (OK) dev=00:2a fhandle=0x84792ca9
>> >>>
>> >>> kworker/0:1H-431 [000] .... 577.113021: nfs4_read:
>> >>> error=-10025 (BAD_STATEID) fileid=00:2a:7708 fhandle=0x84792ca9
>> >>> offset=0 count=64
>> >>>
>> >>>
>> >>>>
>> >>>>> that some delegreturns are synchronous and others are not I think the
>> >>>>> information is useful.
>> >>>>
>> >>>> The only difference between synchronous and asynchronous in this case
>> >>>> is whether or not the process that launched the delegreturn actually
>> >>>> waits for it to complete; a signal could easily prevent it from doing
>> >>>> so without interrupting the delegreturn call itself.
>> >>>> IOW: for complete information when debugging races here, you really
>> >>>> need to examine the return value from the wait call.
>> >>>>
>> >>>>> Speaking of there is a race between state manager thread returning
>> >>>>> used delegations and new open. Previously I thought it was evict
>> >>>>> inode...
>> >>>>
>> >>>> Is this with commit 5e99b532bb95 ("nfs4: reset states to use
>> >>>> open_stateid when returning delegation voluntarily") applied?
>> >>>
>> >>> No I have not. I will try that. Thanks.
>> >>
>> >> This patch does not help. The race is still present.
>> >
>> > OK. So what are the symptoms? I'm having trouble seeing how a race can
>> > happen, given a correctly coded server.
>>
>> Here's what the server sees:
>> open (foobar) replies back with a delegation
>> various operations including a close()
>> some time goes by...
>> open (foobar) replies back with the same delegation
>
> Why? Olga, we already had this discussion. That sort of server
> behaviour is never going to work without races and is the root cause
> of your problem. We simply won't ever support servers that do this.

Given that there is nothing in the spec about this situation, I think
it's not clear that server is acting incorrectly.

Do you think it is not possible to fix the client such that
open/delegreturn race is happening? Isn't VFS layer achieving the same
thing by synchronizing their evict inode code with the new opens?

>
>> delegreturn
>> read (foobar) using delegation

2015-10-13 17:58:30

by Andrew W Elble

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

Trond Myklebust <[email protected]> writes:

>> > OK. So what are the symptoms? I'm having trouble seeing how a race can
>> > happen, given a correctly coded server.
>>
>> Here's what the server sees:
>> open (foobar) replies back with a delegation
>> various operations including a close()
>> some time goes by...
>> open (foobar) replies back with the same delegation
>
> Why? Olga, we already had this discussion. That sort of server
> behaviour is never going to work without races and is the root cause
> of your problem. We simply won't ever support servers that do this.

Trond,

Specifically, what would the correct behavior be here?

Thanks,

Andy

--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-10-13 18:10:34

by Trond Myklebust

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Tue, Oct 13, 2015 at 1:54 PM, Olga Kornievskaia <[email protected]> wrote:
> On Tue, Oct 13, 2015 at 1:28 PM, Trond Myklebust
> <[email protected]> wrote:
>> On Tue, Oct 13, 2015 at 1:23 PM, Olga Kornievskaia <[email protected]> wrote:
>>>
>>> On Tue, Oct 13, 2015 at 12:56 PM, Trond Myklebust
>>> <[email protected]> wrote:
>>> > On Tue, Oct 13, 2015 at 12:24 PM, Olga Kornievskaia <[email protected]> wrote:
>>> >> On Tue, Oct 13, 2015 at 10:13 AM, Olga Kornievskaia <[email protected]> wrote:
>>> >>> On Tue, Oct 13, 2015 at 9:27 AM, Trond Myklebust
>>> >>> <[email protected]> wrote:
>>> >>>> On Tue, Oct 13, 2015 at 8:26 AM, Olga Kornievskaia <[email protected]> wrote:
>>> >>>>>
>>> >>>>> On Mon, Oct 12, 2015 at 11:47 PM, Trond Myklebust
>>> >>>>> <[email protected]> wrote:
>>> >>>>> > On Mon, Oct 12, 2015 at 5:55 PM, Olga Kornievskaia <[email protected]> wrote:
>>> >>>>> >> It'll be nice to know when we return delegations synchronously or not.
>>> >>>>> >
>>> >>>>> > Why? This patch forces us to carry an otherwise completely unnecessary
>>> >>>>> > parameter, so at the very minimum we should have a discussion of what
>>> >>>>> > the real use cases are.
>>> >>>>>
>>> >>>>> I used it to diagnose the race of open and delegreturn. If it's kept
>>> >>>>
>>> >>>> How were you using it?
>>> >>>
>>> >>> I added two more traces points in the beginning of delegreturn and in
>>> >>> nfs4_do_open before sending the rpc. I can see that a given file
>>> >>> handle:
>>> >>> -- delegreturn prepare tracepoint is happening,
>>> >>> -- then the tracepoint of before sending the open is logged,
>>> >>> -- then delegreturn prepare is logged again,
>>> >>> -- then tracepoint for nfs4_open_file which is after receiving reply
>>> >>> to the open from the server
>>> >>> -- then delegreturn_exit tracepoint
>>> >>>
>>> >>> kworker/1:0H-14168 [001] .... 576.571636:
>>> >>> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
>>> >>> issync=0
>>> >>>
>>> >>> hammer-13955 [000] .... 576.942632: nfs4_open_file_begin:
>>> >>> flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:0
>>> >>> fhandle=0x00000000 name=00:2a:904/000002CB.ham
>>> >>>
>>> >>> hammer-13955 [001] .... 577.043084: nfs4_open_file:
>>> >>> error=0 (OK) flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:7708
>>> >>> fhandle=0x84792ca9 name=00:2a:904/000002CB.ham
>>> >>>
>>> >>> kworker/0:1H-431 [000] .... 577.064013:
>>> >>> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
>>> >>> issync=0
>>> >>>
>>> >>> kworker/0:1H-431 [000] .... 577.101076: nfs4_delegreturn_exit:
>>> >>> error=0 (OK) dev=00:2a fhandle=0x84792ca9
>>> >>>
>>> >>> kworker/0:1H-431 [000] .... 577.113021: nfs4_read:
>>> >>> error=-10025 (BAD_STATEID) fileid=00:2a:7708 fhandle=0x84792ca9
>>> >>> offset=0 count=64
>>> >>>
>>> >>>
>>> >>>>
>>> >>>>> that some delegreturns are synchronous and others are not I think the
>>> >>>>> information is useful.
>>> >>>>
>>> >>>> The only difference between synchronous and asynchronous in this case
>>> >>>> is whether or not the process that launched the delegreturn actually
>>> >>>> waits for it to complete; a signal could easily prevent it from doing
>>> >>>> so without interrupting the delegreturn call itself.
>>> >>>> IOW: for complete information when debugging races here, you really
>>> >>>> need to examine the return value from the wait call.
>>> >>>>
>>> >>>>> Speaking of there is a race between state manager thread returning
>>> >>>>> used delegations and new open. Previously I thought it was evict
>>> >>>>> inode...
>>> >>>>
>>> >>>> Is this with commit 5e99b532bb95 ("nfs4: reset states to use
>>> >>>> open_stateid when returning delegation voluntarily") applied?
>>> >>>
>>> >>> No I have not. I will try that. Thanks.
>>> >>
>>> >> This patch does not help. The race is still present.
>>> >
>>> > OK. So what are the symptoms? I'm having trouble seeing how a race can
>>> > happen, given a correctly coded server.
>>>
>>> Here's what the server sees:
>>> open (foobar) replies back with a delegation
>>> various operations including a close()
>>> some time goes by...
>>> open (foobar) replies back with the same delegation
>>
>> Why? Olga, we already had this discussion. That sort of server
>> behaviour is never going to work without races and is the root cause
>> of your problem. We simply won't ever support servers that do this.
>
> Given that there is nothing in the spec about this situation, I think
> it's not clear that server is acting incorrectly.
>
> Do you think it is not possible to fix the client such that
> open/delegreturn race is happening? Isn't VFS layer achieving the same
> thing by synchronizing their evict inode code with the new opens?

The only 2 ways to prevent this race on the client are:

1) To strictly order _all_ NFSv4 OPENs on the entire filesystem with
_all_ DELEGRETURNs on that same filesystem.
2) Alternatively on NFSv4.1+ only, to replace every single OPEN by
name RPC call with 2 synchronous RPC calls: one doing a LOOKUP and
then a second doing OPEN-by-filehandle after the client has checked
whether or not it holds a delegation so that we make sure we can
serialise the OPEN and the delegation return.

The bottom line is that I'm not going to screw over performance
against all other servers on the planet just in order to save someone
the bother of typing in the 2-3 lines it would take to check that this
delegation was already handed out (and note that the mere existence of
a valid stateid on the server is proof of that).

2015-10-13 19:46:57

by Trond Myklebust

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Tue, Oct 13, 2015 at 1:58 PM, Andrew W Elble <[email protected]> wrote:
>
> Trond Myklebust <[email protected]> writes:
>
> >> > OK. So what are the symptoms? I'm having trouble seeing how a race can
> >> > happen, given a correctly coded server.
> >>
> >> Here's what the server sees:
> >> open (foobar) replies back with a delegation
> >> various operations including a close()
> >> some time goes by...
> >> open (foobar) replies back with the same delegation
> >
> > Why? Olga, we already had this discussion. That sort of server
> > behaviour is never going to work without races and is the root cause
> > of your problem. We simply won't ever support servers that do this.
>
> Trond,
>
> Specifically, what would the correct behavior be here?

The server should honour the open without repeating the delegation.

The client already knows about the delegation due to the first open,
so there is no value whatsoever in repeating it. In addition, as you
see from Olga's example, it causes races with the return of that
delegation.

2015-10-13 19:58:51

by Andrew W Elble

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

> The server should honour the open without repeating the delegation.
>
> The client already knows about the delegation due to the first open,
> so there is no value whatsoever in repeating it. In addition, as you
> see from Olga's example, it causes races with the return of that
> delegation.

Thanks for the clarification, I'll be updating my v2 patch to suppress both
types of delegations ("duplicate" and "repeated").

Thanks,

Andy

--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-10-14 14:58:40

by Olga Kornievskaia

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Tue, Oct 13, 2015 at 3:46 PM, Trond Myklebust
<[email protected]> wrote:
> On Tue, Oct 13, 2015 at 1:58 PM, Andrew W Elble <[email protected]> wrote:
>>
>> Trond Myklebust <[email protected]> writes:
>>
>> >> > OK. So what are the symptoms? I'm having trouble seeing how a race can
>> >> > happen, given a correctly coded server.
>> >>
>> >> Here's what the server sees:
>> >> open (foobar) replies back with a delegation
>> >> various operations including a close()
>> >> some time goes by...
>> >> open (foobar) replies back with the same delegation
>> >
>> > Why? Olga, we already had this discussion. That sort of server
>> > behaviour is never going to work without races and is the root cause
>> > of your problem. We simply won't ever support servers that do this.
>>
>> Trond,
>>
>> Specifically, what would the correct behavior be here?
>
> The server should honour the open without repeating the delegation.
>
> The client already knows about the delegation due to the first open,
> so there is no value whatsoever in repeating it. In addition, as you
> see from Olga's example, it causes races with the return of that
> delegation.

Trond thank you for the explanations. While I haven't hit the last
case of the race I would still like to bring up that we've seen a case
where ACCESS is sent and then DELEGRETURN and then delegation stateid
is used by the IO. Perhaps the commit 5e99b532bb95 helps in that case.
But if not, then this case of the race can't be handled by the server.

2015-10-14 17:14:44

by Trond Myklebust

[permalink] [raw]

Subject: Re: [PATCH 1/1] Adding issync field to delegreturn_exit tracepoint

On Wed, Oct 14, 2015 at 10:58 AM, Olga Kornievskaia <[email protected]> wrote:
> On Tue, Oct 13, 2015 at 3:46 PM, Trond Myklebust
> <[email protected]> wrote:
>> On Tue, Oct 13, 2015 at 1:58 PM, Andrew W Elble <[email protected]> wrote:
>>>
>>> Trond Myklebust <[email protected]> writes:
>>>
>>> >> > OK. So what are the symptoms? I'm having trouble seeing how a race can
>>> >> > happen, given a correctly coded server.
>>> >>
>>> >> Here's what the server sees:
>>> >> open (foobar) replies back with a delegation
>>> >> various operations including a close()
>>> >> some time goes by...
>>> >> open (foobar) replies back with the same delegation
>>> >
>>> > Why? Olga, we already had this discussion. That sort of server
>>> > behaviour is never going to work without races and is the root cause
>>> > of your problem. We simply won't ever support servers that do this.
>>>
>>> Trond,
>>>
>>> Specifically, what would the correct behavior be here?
>>
>> The server should honour the open without repeating the delegation.
>>
>> The client already knows about the delegation due to the first open,
>> so there is no value whatsoever in repeating it. In addition, as you
>> see from Olga's example, it causes races with the return of that
>> delegation.
>
> Trond thank you for the explanations. While I haven't hit the last
> case of the race I would still like to bring up that we've seen a case
> where ACCESS is sent and then DELEGRETURN and then delegation stateid
> is used by the IO. Perhaps the commit 5e99b532bb95 helps in that case.
> But if not, then this case of the race can't be handled by the server.

That's a different bug, and is indeed a client issue. It should be
fixed by commit 24311f884189 + 5e99b532bb95. Please let me know if
that is not the case.