2017-03-16 18:05:50

by Olga Kornievskaia

Subject: question about open_owner sequencing

Hi folks,

I have a question about recovery from the BAD_SEQID and what should happen.

I have the following application that does:

1. open(file1)
2. open(file2)
3. close(file1)
4. open(file3)
5. lock(file2)

If CLOSE gets BAD_SEQID (for whatever reason), I see that LOCK later
fails with BAD_SEQID as well.

step1 OPEN creates open_owner1 seq 0
step2 OPEN uses open_owner1 seq1
step3 CLOSE uses open_owner1 seq2 gets BAD_SEQID
step4 OPEN sends new open_owner2 seq2 and it triggers OPEN_CONFIRM with seq3
step5 sends LOCK with seq4 and open stateid from the reply in step 2.

LOCK gets BAD_SEQID.

Question: is the client sending something incorrect? Or is the server
incorrect? I tested against two different servers (Linux and NetApp) and
both reply the same way, so I'm leaning towards "no" for the server. But I
don't see why "seq4" is not a valid sequence, given that the
open_owner/sequence was just confirmed.

Thanks.


2017-03-17 18:14:28

by Frank Filz

Subject: RE: question about open_owner sequencing

> Hi folks,
>
> I have a question about recovery from the BAD_SEQID and what should
> happen.
>
> I have the following application that does:
>
> 1. open(file1)
> 2. open(file2)
> 3. close(file1)
> 4. open(file3)
> 5. lock(file2)
>
> If CLOSE gets BAD_SEQID (for whatever reason), I see that LOCK later fails
> with BAD_SEQID as well.
>
> step1 OPEN creates open_owner1 seq 0
> step2 OPEN uses open_owner1 seq1
> step3 CLOSE uses open_owner1 seq2 gets BAD_SEQID
> step4 OPEN sends new open_owner2 seq2 and it triggers OPEN_CONFIRM
> with seq3
> step5 sends LOCK with seq4 and open stateid from the reply in step 2.
>
> LOCK gets BAD_SEQID.
>
> Question: is client sending something incorrect? is server not correct? I
> tested against two different servers (Linux and NetApp) and both reply the
> same way so I'm leaning towards "no". But I don't see why "seq4" is not a
> valid sequence given that the open_owner/sequence was just confirmed.

Wait, step 4 is using a new open owner? Each open owner has its own seqid (assuming this is v4.0; owner seqids don't apply to 4.1, since sequencing there is done per session with the SEQUENCE op).

Frank


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus


2017-03-17 20:35:38

by Olga Kornievskaia

Subject: Re: question about open_owner sequencing

On Fri, Mar 17, 2017 at 1:45 PM, Frank Filz <[email protected]> wrote:
> Hi folks,
>>
>> I have a question about recovery from the BAD_SEQID and what should
>> happen.
>>
>> I have the following application that does:
>>
>> 1. open(file1)
>> 2. open(file2)
>> 3. close(file1)
>> 4. open(file3)
>> 5. lock(file2)
>>
>> If CLOSE gets BAD_SEQID (for whatever reason), I see that LOCK later fails
>> with BAD_SEQID as well.
>>
>> step1 OPEN creates open_owner1 seq 0
>> step2 OPEN uses open_owner1 seq1
>> step3 CLOSE uses open_owner1 seq2 gets BAD_SEQID
>> step4 OPEN sends new open_owner2 seq2 and it triggers OPEN_CONFIRM
>> with seq3
>> step5 sends LOCK with seq4 and open stateid from the reply in step 2.
>>
>> LOCK gets BAD_SEQID.
>>
>> Question: is client sending something incorrect? is server not correct? I
>> tested against two different servers (Linux and NetApp) and both reply the
>> same way so I'm leaning towards "no". But I don't see why "seq4" is not a
>> valid sequence given that the open_owner/sequence was just confirmed.
>
> Wait step4 is using a new open owner? Each open owner has its own seqid (assuming this is V4.0, owner seqid doesn't apply to 4.1 since the sequencing is done for the session with the SEQUENCE op).

Yes, this is v4.0. Yes, step4 uses a new open owner, but the seq# doesn't
go back to 0. This is the new behavior of not dropping the open owner, as
per the commit below.

Since LOCK just has the seq# (and not the value of the open_owner), I
thought it'd be for the "valid" (current) open owner, which would be
open_owner2.

So after step4, are there then 2 open owners: one with value open_owner1
(seq2) and one with value open_owner2 (seq3)? And since LOCK is
associated with the OPEN from step1, and thus with open_owner1, should
it send seq2?

Neil, when would the client remove this open owner1 that would have
been removed prior to this patch?

commit 86cfb0418537460baf0de0b5e9253784be27a6f9
Author: NeilBrown <[email protected]>
Date: Mon Dec 19 11:48:23 2016 +1100

NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID

When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
the ->state_owners rbtree so that it will no longer be used.

If any stateids attached to this open-owner are still in use, and if a
request using one gets an NFS4ERR_BAD_STATEID reply, this can go bad.

The state is marked as needing recovery and the nfs4_state_manager()
is scheduled to clean up. nfs4_state_manager() finds states to be
recovered by walking the state_owners rbtree. As the open-owner is
not in the rbtree, the bad state is not found so nfs4_state_manager()
completes having done nothing. The request is then retried, with a
predictable result (indefinite retries).

If the stateid is for a delegation, this open_owner will be used
to open files when the delegation is returned. For that to work,
a new open-owner needs to be presented to the server.

This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
in the rbtree but updates the 'create_time' so it looks like a new
open-owner. With this the indefinite retries no longer happen.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
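
The failure described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel code: `StateOwner`, `Client`, `bad_seqid_old`, `bad_seqid_new`, `recover` and `run` are hypothetical names, and a plain dict stands in for the ->state_owners rbtree.

```python
# Toy model of the failure the commit message describes. All names here
# (StateOwner, Client, recover, ...) are hypothetical; a dict stands in
# for the ->state_owners rbtree.
import itertools

_clock = itertools.count(1)


class StateOwner:
    def __init__(self, uid):
        self.uid = uid
        self.create_time = next(_clock)  # part of the owner's identity
        self.states_needing_recovery = []


class Client:
    def __init__(self):
        self.state_owners = {}  # uid -> StateOwner

    def add_owner(self, uid):
        self.state_owners[uid] = StateOwner(uid)
        return self.state_owners[uid]

    def bad_seqid_old(self, owner):
        # Before the patch: drop the owner from the tree.
        self.state_owners.pop(owner.uid, None)

    def bad_seqid_new(self, owner):
        # After the patch: keep it, but make it look like a new owner.
        owner.create_time = next(_clock)

    def recover(self):
        """Walk the tree and recover flagged states; return how many."""
        recovered = 0
        for owner in self.state_owners.values():
            recovered += len(owner.states_needing_recovery)
            owner.states_needing_recovery.clear()
        return recovered


def run(handler_name):
    client = Client()
    owner = client.add_owner("user1")
    getattr(client, handler_name)(owner)      # BAD_SEQID handling
    owner.states_needing_recovery.append("stateid-file2")  # later BAD_STATEID
    return client.recover()


print(run("bad_seqid_old"), run("bad_seqid_new"))
```

In this model the old handling leaves the flagged state unreachable from the tree, so the walk recovers nothing, while the new handling lets the walk find it.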


>
> Frank

2017-03-17 21:09:32

by Frank Filz

Subject: RE: question about open_owner sequencing

> On Fri, Mar 17, 2017 at 1:45 PM, Frank Filz <[email protected]> wrote:
> > Hi folks,
> >>
> >> I have a question about recovery from the BAD_SEQID and what should
> >> happen.
> >>
> >> I have the following application that does:
> >>
> >> 1. open(file1)
> >> 2. open(file2)
> >> 3. close(file1)
> >> 4. open(file3)
> >> 5. lock(file2)
> >>
> >> If CLOSE gets BAD_SEQID (for whatever reason), I see that LOCK later
> >> fails with BAD_SEQID as well.
> >>
> >> step1 OPEN creates open_owner1 seq 0
> >> step2 OPEN uses open_owner1 seq1
> >> step3 CLOSE uses open_owner1 seq2 gets BAD_SEQID
> >> step4 OPEN sends new open_owner2 seq2 and it triggers
> OPEN_CONFIRM
> >> with seq3
> >> step5 sends LOCK with seq4 and open stateid from the reply in step 2.
> >>
> >> LOCK gets BAD_SEQID.
> >>
> >> Question: is client sending something incorrect? is server not
> >> correct? I tested against two different servers (Linux and NetApp)
> >> and both reply the same way so I'm leaning towards "no". But I don't
> >> see why "seq4" is not a valid sequence given that the
> open_owner/sequence was just confirmed.
> >
> > Wait step4 is using a new open owner? Each open owner has its own seqid
> (assuming this is V4.0, owner seqid doesn't apply to 4.1 since the sequencing
> is done for the session with the SEQUENCE op).
>
> Yes this is v4.0. Yes step4 uses new open owner but seq# doesn't go to 0.
> This is the new behavior to not drop the open owner as per the following
> commit (below).
>
> Since LOCK just has the seq# (and not a value of the open_owner) I thought
> it's be the "valid" (current) open owner which would be open_owner2.

Hmm, so in step5, there is not yet a lock stateid?

So it's using this form of the lock?

    struct open_to_lock_owner4 {
            seqid4          open_seqid;
            stateid4        open_stateid;
            seqid4          lock_seqid;
            lock_owner4     lock_owner;
    };

If so, open_seqid should be 3, lock_seqid can be anything.

At least that's my reading. But I'm not sure how the client is supposed to recover from BAD_SEQID...
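
For reference, here is a minimal sketch of the two LOCK argument forms, written as Python dataclasses rather than XDR. The field names follow the locker4 union in RFC 7530; `build_locker`, the class names, and the use of plain ints and bytes for seqids and stateids are illustrative assumptions, not any client's code. The open_seqid value of 3 is just the number suggested above, not a confirmed answer.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class OpenToLockOwner:      # mirrors open_to_lock_owner4
    open_seqid: int         # checked against the open-owner's sequence
    open_stateid: bytes     # identifies the open (and so its open-owner)
    lock_seqid: int         # starting seqid for the new lock-owner
    lock_owner: bytes


@dataclass
class ExistLockOwner:       # mirrors exist_lock_owner4
    lock_stateid: bytes
    lock_seqid: int


def build_locker(open_seqid: int, open_stateid: bytes, lock_seqid: int,
                 lock_owner: bytes,
                 lock_stateid: Optional[bytes] = None
                 ) -> Union[OpenToLockOwner, ExistLockOwner]:
    """Pick the LOCK form: with no lock stateid yet, the first LOCK for a
    lock-owner goes through the open stateid and open_seqid."""
    if lock_stateid is None:
        return OpenToLockOwner(open_seqid, open_stateid, lock_seqid, lock_owner)
    return ExistLockOwner(lock_stateid, lock_seqid)


# First LOCK by this lock-owner: no lock stateid exists yet.
first = build_locker(3, b"open-stateid-file2", 0, b"lock-owner-A")
# A later LOCK by the same lock-owner uses the lock stateid instead.
later = build_locker(0, b"open-stateid-file2", 1, b"lock-owner-A",
                     lock_stateid=b"lock-stateid-1")
print(type(first).__name__, type(later).__name__)
```

Only the first form carries an open_seqid, which is why the open-owner's sequencing matters for step 5 at all.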

Frank

> So after step4, are the 2 open owners then: one with value open_owner1
> (seq2) and one with value open_owner2 (seq3). And then since LOCK is
> associated with the OPEN from step1 and then open_owner 1, then should it
> send send seq2?
>
> Neil, when would the client remove this open owner1 that would have been
> removed prior to this patch?
>
> commit 86cfb0418537460baf0de0b5e9253784be27a6f9
> Author: NeilBrown <[email protected]>
> Date: Mon Dec 19 11:48:23 2016 +1100
>
> NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
>
> When an NFS4ERR_BAD_SEQID is received the open-owner is removed
> from
> the ->state_owners rbtree so that it will no longer be used.
>
> If any stateids attached to this open-owner are still in use, and if a
> request using one gets an NFS4ERR_BAD_STATEID reply, this can for bad.
>
> The state is marked as needing recovery and the nfs4_state_manager()
> is scheduled to clean up. nfs4_state_manager() finds states to be
> recovered by walking the state_owners rbtree. As the open-owner is
> not in the rbtree, the bad state is not found so nfs4_state_manager()
> completes having done nothing. The request is then retried, with a
> predicatable result (indefinite retries).
>
> If the stateid is for a delegation, this open_owner will be used
> to open files when the delegation is returned. For that to work,
> a new open-owner needs to be presented to the server.
>
> This patch changes NFS4ERR_BAD_SEQID handling to leave the open-
> owner
> in the rbtree but updates the 'create_time' so it looks like a new
> open-owner. With this the indefinite retries no longer happen.
>
> Signed-off-by: NeilBrown <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>
>
>
> >
> > Frank




2017-03-17 21:20:03

by Olga Kornievskaia

Subject: Re: question about open_owner sequencing

On Fri, Mar 17, 2017 at 4:55 PM, Frank Filz <[email protected]> wrote:
>> On Fri, Mar 17, 2017 at 1:45 PM, Frank Filz <[email protected]> wrote:
>> > Hi folks,
>> >>
>> >> I have a question about recovery from the BAD_SEQID and what should
>> >> happen.
>> >>
>> >> I have the following application that does:
>> >>
>> >> 1. open(file1)
>> >> 2. open(file2)
>> >> 3. close(file1)
>> >> 4. open(file3)
>> >> 5. lock(file2)
>> >>
>> >> If CLOSE gets BAD_SEQID (for whatever reason), I see that LOCK later
>> >> fails with BAD_SEQID as well.
>> >>
>> >> step1 OPEN creates open_owner1 seq 0
>> >> step2 OPEN uses open_owner1 seq1
>> >> step3 CLOSE uses open_owner1 seq2 gets BAD_SEQID
>> >> step4 OPEN sends new open_owner2 seq2 and it triggers
>> OPEN_CONFIRM
>> >> with seq3
>> >> step5 sends LOCK with seq4 and open stateid from the reply in step 2.
>> >>
>> >> LOCK gets BAD_SEQID.
>> >>
>> >> Question: is client sending something incorrect? is server not
>> >> correct? I tested against two different servers (Linux and NetApp)
>> >> and both reply the same way so I'm leaning towards "no". But I don't
>> >> see why "seq4" is not a valid sequence given that the
>> open_owner/sequence was just confirmed.
>> >
>> > Wait step4 is using a new open owner? Each open owner has its own seqid
>> (assuming this is V4.0, owner seqid doesn't apply to 4.1 since the sequencing
>> is done for the session with the SEQUENCE op).
>>
>> Yes this is v4.0. Yes step4 uses new open owner but seq# doesn't go to 0.
>> This is the new behavior to not drop the open owner as per the following
>> commit (below).
>>
>> Since LOCK just has the seq# (and not a value of the open_owner) I thought
>> it's be the "valid" (current) open owner which would be open_owner2.
>
> Hmm, so in step5, there is not yet a lock stateid?
>
> So it's using this form of the lock?
>
> struct open_to_lock_owner4 {
> seqid4 open_seqid;
> stateid4 open_stateid;
> seqid4 lock_seqid;
> lock_owner4 lock_owner;
>
> If so, open_seqid should be 3, lock_seqid can be anything.

Why is it 3? As far as I can tell, 3 is not a valid seq_id for either
open_owner1 or open_owner2. open_owner1 is left at seq_id=2 (because
after "using" seq2 on the CLOSE it got BAD_SEQID so seq_id isn't
incremented) and open_owner2 would have seq_id=4 (OPEN_CONFIRM used up
3)?
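
The bookkeeping behind these numbers can be sketched as follows. This is a toy model, not the Linux client or any server: `SeqidTracker`, `apply` and the owner labels are hypothetical names. It assumes the RFC 7530 rule that an operation failing with NFS4ERR_BAD_SEQID does not advance the owner's seqid, and it further assumes that a previously unseen open-owner starts counting from whatever seqid its first request carries.

```python
# Toy model of per-open-owner seqid bookkeeping (NFSv4.0).
# Assumptions (not taken from any real implementation):
#   - a previously unseen owner starts at the seqid of its first request
#   - a request that fails with BAD_SEQID does not advance the counter
class SeqidTracker:
    def __init__(self):
        self.next_seqid = {}  # owner label -> next seqid expected

    def apply(self, owner, seqid, forced_error=None):
        """Record one seqid-bearing operation and return its status."""
        expected = self.next_seqid.setdefault(owner, seqid)
        if forced_error == "BAD_SEQID" or seqid != expected:
            return "BAD_SEQID"  # counter is left untouched
        self.next_seqid[owner] = expected + 1
        return "OK"


tracker = SeqidTracker()
trace = [
    ("OPEN file1", "open_owner1", 0, None),
    ("OPEN file2", "open_owner1", 1, None),
    # The thread does not say why CLOSE failed, so the error is injected.
    ("CLOSE file1", "open_owner1", 2, "BAD_SEQID"),
    ("OPEN file3", "open_owner2", 2, None),
    ("OPEN_CONFIRM", "open_owner2", 3, None),
]
for op, owner, seqid, forced in trace:
    print(op, owner, seqid, tracker.apply(owner, seqid, forced))

print(tracker.next_seqid)
```

Under these assumptions open_owner1 is still waiting for seqid 2 and open_owner2 for seqid 4, so 3 matches neither.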

2017-03-17 21:39:55

by Frank Filz

Subject: RE: question about open_owner sequencing

> On Fri, Mar 17, 2017 at 4:55 PM, Frank Filz <[email protected]> wrote:
> >> On Fri, Mar 17, 2017 at 1:45 PM, Frank Filz <[email protected]>
> wrote:
> >> > Hi folks,
> >> >>
> >> >> I have a question about recovery from the BAD_SEQID and what
> >> >> should happen.
> >> >>
> >> >> I have the following application that does:
> >> >>
> >> >> 1. open(file1)
> >> >> 2. open(file2)
> >> >> 3. close(file1)
> >> >> 4. open(file3)
> >> >> 5. lock(file2)
> >> >>
> >> >> If CLOSE gets BAD_SEQID (for whatever reason), I see that LOCK
> >> >> later fails with BAD_SEQID as well.
> >> >>
> >> >> step1 OPEN creates open_owner1 seq 0
> >> >> step2 OPEN uses open_owner1 seq1
> >> >> step3 CLOSE uses open_owner1 seq2 gets BAD_SEQID
> >> >> step4 OPEN sends new open_owner2 seq2 and it triggers
> >> OPEN_CONFIRM
> >> >> with seq3
> >> >> step5 sends LOCK with seq4 and open stateid from the reply in step 2.
> >> >>
> >> >> LOCK gets BAD_SEQID.
> >> >>
> >> >> Question: is client sending something incorrect? is server not
> >> >> correct? I tested against two different servers (Linux and NetApp)
> >> >> and both reply the same way so I'm leaning towards "no". But I
> >> >> don't see why "seq4" is not a valid sequence given that the
> >> open_owner/sequence was just confirmed.
> >> >
> >> > Wait step4 is using a new open owner? Each open owner has its own
> >> > seqid
> >> (assuming this is V4.0, owner seqid doesn't apply to 4.1 since the
> >> sequencing is done for the session with the SEQUENCE op).
> >>
> >> Yes this is v4.0. Yes step4 uses new open owner but seq# doesn't go to 0.
> >> This is the new behavior to not drop the open owner as per the
> >> following commit (below).
> >>
> >> Since LOCK just has the seq# (and not a value of the open_owner) I
> >> thought it's be the "valid" (current) open owner which would be
> open_owner2.
> >
> > Hmm, so in step5, there is not yet a lock stateid?
> >
> > So it's using this form of the lock?
> >
> > struct open_to_lock_owner4 {
> > seqid4 open_seqid;
> > stateid4 open_stateid;
> > seqid4 lock_seqid;
> > lock_owner4 lock_owner;
> >
> > If so, open_seqid should be 3, lock_seqid can be anything.
>
> Why is it 3? As far as I can tell, 3 is not a valid seq_id for either
> open_owner1 or open_owner2. open_owner1 is left at seq_id=2 (because
> after "using" seq2 on the CLOSE it got BAD_SEQID so seq_id isn't
> incremented) and open_owner2 would have seq_id=4 (OPEN_CONFIRM
> used up 3)?
>
> From 7530 section 16.10.5:
>
> Note that
> although the open-owner is not given explicitly, the open_seqid
> associated with it is used to check for open-owner sequencing
> issues. This case provides a method to use the established state
> of the open_stateid to transition to the use of a lock stateid.

I'd love to understand what caused the BAD_SEQID, because I thought the close SHOULD use seqid 2.

Hmm, if the stateid really is still valid, the lock should use open_seqid 1; the lock doesn't change the state of the open. I think... darn, this stuff is confusing...

I know I bumbled through some of this with Ganesha. To the extent that pynfs has tests for seqid, Ganesha does what pynfs expects...

Use 4.1 :-)

Frank

