LinuxLists.cc - Question about XID use in sunrpc

2016-02-05 15:37:53

Subject: Question about XID use in sunrpc

I have a question regarding sunrpc's use of the XID when the client
receives an AUTH_ERROR. The code (clnt.c line 1933) explicitly
comments that a new XID should be acquired, and it releases the
current rpc task (and gets a new one). Why is that? Since the
operation is "replayed" but with the new credentials, why shouldn't
the same XID be used?
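
To make the behaviour being asked about concrete, here is a minimal
sketch in Python of a client retry loop that keeps the XID on a plain
retransmit but allocates a fresh one after an AUTH_ERROR. It is a toy
model, not the code in clnt.c: `call`, `send`, `next_xid`, the status
strings and the credential names are all hypothetical.

```python
import itertools

# Hypothetical toy client; names are illustrative, not taken from clnt.c.
_xid_counter = itertools.count(0x1000)


def next_xid():
    """Hand out a new transaction ID."""
    return next(_xid_counter)


def call(send, creds):
    """Issue one logical operation.

    send(xid, cred) is assumed to return 'OK', 'TIMEOUT' or 'AUTH_ERROR'.
    creds is a sequence of credentials to try, oldest first.
    """
    creds = iter(creds)
    cred = next(creds)
    xid = next_xid()
    history = []                  # every (xid, cred) pair put on the wire
    while True:
        history.append((xid, cred))
        status = send(xid, cred)
        if status == "TIMEOUT":
            continue              # plain retransmit: same XID, same credential
        if status == "AUTH_ERROR":
            cred = next(creds)    # switch to the refreshed credential
            xid = next_xid()      # treated as a new request, so a new XID
            continue
        return status, history


# Scripted server replies: one lost reply, one AUTH_ERROR, then success.
replies = iter(["TIMEOUT", "AUTH_ERROR", "OK"])
status, history = call(lambda xid, cred: next(replies), ["cred-old", "cred-new"])
print(status, history)
```

In this model the first two transmissions share an XID and the third,
sent after the AUTH_ERROR, does not. The question above is whether that
third transmission could reuse the original XID instead.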

The RPC RFC says that the XID is used by the server to detect
retransmissions. It's not clear whether "retransmission" in the spec
means TCP retransmission. If so, then that explains why the client
uses the same XID.

Thank you.

2016-02-05 16:44:11

by Trond Myklebust

Subject: Re: Question about XID use in sunrpc

On Fri, Feb 5, 2016 at 10:37 AM, Olga Kornievskaia <[email protected]> wrote:
> I have a question regarding the implementation of sunrpc use of XID
> when the client receives an AUTH_ERROR. The code (clnt.c line 1933)
> explicitly comments that a new XID should be acquired and releases the
> currently rpc task (and gets a new one). Why is that? Since the
> operation is "replayed" but with the new credentials, why shouldn't
> the same XID be used?
>
> The RPC RFC says that XID is used by the server to detect
> retransmissions. It's not clear if in the specs means "retransmission"
> == tcp retransmissions. If so then it explains why the client uses the
> same XID.
>

The questions you are asking come under the header "RPC lore" rather
than "RPC law". The use of XIDs as a basis for replay caching is not
speced out in any RFC. The closest thing we have in the form of
documentation is Ric Werme's presentation at the 1996 Connectathon:
http://nfsv4bat.org/Documents/ConnectAThon/1996/werme1.pdf

Basically, those comments are there in the Linux code to denote issues
found when interoperability testing with server implementations that
are probably now long dead, but might still be in use somewhere.

Cheers
Trond

2016-02-05 17:01:22

by Olga Kornievskaia

Subject: Re: Question about XID use in sunrpc

On Fri, Feb 5, 2016 at 11:44 AM, Trond Myklebust
<[email protected]> wrote:
> On Fri, Feb 5, 2016 at 10:37 AM, Olga Kornievskaia <[email protected]> wrote:
>> I have a question regarding the implementation of sunrpc use of XID
>> when the client receives an AUTH_ERROR. The code (clnt.c line 1933)
>> explicitly comments that a new XID should be acquired and releases the
>> currently rpc task (and gets a new one). Why is that? Since the
>> operation is "replayed" but with the new credentials, why shouldn't
>> the same XID be used?
>>
>> The RPC RFC says that XID is used by the server to detect
>> retransmissions. It's not clear if in the specs means "retransmission"
>> == tcp retransmissions. If so then it explains why the client uses the
>> same XID.
>>
>
> The questions you are asking come under the header "RPC lore" rather
> than "RPC law". The use of XIDs as a basis for replay caching is not
> speced out in any RFC. The closest thing we have in the form of
> documentation is Ric Werme's presentation at the 1996 Connectathon:
> http://nfsv4bat.org/Documents/ConnectAThon/1996/werme1.pdf
>
> Basically, those comments are there in the Linux code to denote issues
> found when interoperability testing with server implementations that
> are probably now long dead, but might still be in use somewhere.

Would you consider changing this to use the same XID in case of
redoing the operation due to the AUTH_ERROR?

The issue it causes with (one of the) server implementations is of the
following nature:
1. The client sends an operation to the server. The server processes
the operation, but before replying it has an issue and resets the
connection.
2. The client re-establishes the connection and replays the RPC. The
server now fails it with AUTH_ERROR.
3. The client establishes a new connection and replays the same NFS
operation with a new XID. The server cached the operation, but since
the last operation arrives with the new XID, it won't find the entry
in the cache. This is problematic when the operation is something like
REMOVE.
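
The three steps above can be sketched with a toy replay cache. This is
a simplified model in Python, not any real server's implementation:
the `ToyServer` class, its XID-only cache key and the status strings
are assumptions made for illustration (real duplicate request caches
typically key on more than the XID).

```python
# Toy model of a replay cache keyed only by XID (an assumption).
NFS_OK = "NFS_OK"
NFSERR_NOENT = "NFSERR_NOENT"


class ToyServer:
    """Hypothetical server that caches replies by XID."""

    def __init__(self, files):
        self.files = set(files)
        self.replay_cache = {}    # xid -> cached reply

    def remove(self, xid, name):
        # A retransmission with a known XID gets the cached reply back.
        if xid in self.replay_cache:
            return self.replay_cache[xid]
        # Otherwise the request is treated as new and executed.
        if name in self.files:
            self.files.remove(name)
            reply = NFS_OK
        else:
            reply = NFSERR_NOENT
        self.replay_cache[xid] = reply
        return reply


server = ToyServer({"a.txt"})
first = server.remove(xid=0x1001, name="a.txt")     # executed, reply lost
same_xid = server.remove(xid=0x1001, name="a.txt")  # cache hit
new_xid = server.remove(xid=0x1002, name="a.txt")   # cache miss, re-executed
print(first, same_xid, new_xid)
```

With the original XID the replayed REMOVE gets the cached success
back. With a new XID it misses the cache, is executed again, and fails
because the file is already gone.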

I realize this is why NFSv4.1 sessions were introduced, to solve these
non-idempotency issues, but using the same XID seems like the right
idea since it is the same operation.

If you don't have objections to the change, I can ask on the IETF list
to see if any servers will object to such a change.

>
> Cheers
> Trond

2016-02-05 18:31:24

by Trond Myklebust

Subject: Re: Question about XID use in sunrpc

On Fri, Feb 5, 2016 at 12:01 PM, Olga Kornievskaia <[email protected]> wrote:
> On Fri, Feb 5, 2016 at 11:44 AM, Trond Myklebust
> <[email protected]> wrote:
>> On Fri, Feb 5, 2016 at 10:37 AM, Olga Kornievskaia <[email protected]> wrote:
>>> I have a question regarding the implementation of sunrpc use of XID
>>> when the client receives an AUTH_ERROR. The code (clnt.c line 1933)
>>> explicitly comments that a new XID should be acquired and releases the
>>> currently rpc task (and gets a new one). Why is that? Since the
>>> operation is "replayed" but with the new credentials, why shouldn't
>>> the same XID be used?
>>>
>>> The RPC RFC says that XID is used by the server to detect
>>> retransmissions. It's not clear if in the specs means "retransmission"
>>> == tcp retransmissions. If so then it explains why the client uses the
>>> same XID.
>>>
>>
>> The questions you are asking come under the header "RPC lore" rather
>> than "RPC law". The use of XIDs as a basis for replay caching is not
>> speced out in any RFC. The closest thing we have in the form of
>> documentation is Ric Werme's presentation at the 1996 Connectathon:
>> http://nfsv4bat.org/Documents/ConnectAThon/1996/werme1.pdf
>>
>> Basically, those comments are there in the Linux code to denote issues
>> found when interoperability testing with server implementations that
>> are probably now long dead, but might still be in use somewhere.
>
> Would you consider changing this to use the same XID in case of
> redoing the operation due to the AUTH_ERROR?
>
> The issue it causes (one of the) server's implementation is of the
> following nature:
> 1. client sends an operation to the server. the server process the
> operation but before replying back to the server has an issue and
> resets the connection.
> 2. client re-establishes the connection and replays the RPC. the
> server now fails with the AUTH_ERROR.
> 3. client establishes a new connection and replays the same NFS
> operation over the new XID. The server cached the operation but since
> the last operation arrives with the new XID it won't find the entry in
> the cache. It's problematic when the operation is like REMOVE.
>
> I realize this is why nfs4.1 session were introduce to solve these
> non-idenpotency issues but using the same XID seems like the right
> idea since it is the same operation.
>
> If you don't have objections to the change, I can ask on the IETF list
> to see if any servers will object to such change.

What you describe is a clear and obvious server bug. It is not a
client bug, and is not something that I'd find acceptable as
justification for changing the client code.

The server should not be replying AUTH_ERROR and then processing the
RPC anyway. That's not behaviour that is sanctioned by the RPC spec.

Cheers
Trond

2016-02-05 19:03:26

by Olga Kornievskaia

Subject: Re: Question about XID use in sunrpc

On Fri, Feb 5, 2016 at 1:31 PM, Trond Myklebust
<[email protected]> wrote:
> On Fri, Feb 5, 2016 at 12:01 PM, Olga Kornievskaia <[email protected]> wrote:
>> On Fri, Feb 5, 2016 at 11:44 AM, Trond Myklebust
>> <[email protected]> wrote:
>>> On Fri, Feb 5, 2016 at 10:37 AM, Olga Kornievskaia <[email protected]> wrote:
>>>> I have a question regarding the implementation of sunrpc use of XID
>>>> when the client receives an AUTH_ERROR. The code (clnt.c line 1933)
>>>> explicitly comments that a new XID should be acquired and releases the
>>>> currently rpc task (and gets a new one). Why is that? Since the
>>>> operation is "replayed" but with the new credentials, why shouldn't
>>>> the same XID be used?
>>>>
>>>> The RPC RFC says that XID is used by the server to detect
>>>> retransmissions. It's not clear if in the specs means "retransmission"
>>>> == tcp retransmissions. If so then it explains why the client uses the
>>>> same XID.
>>>>
>>>
>>> The questions you are asking come under the header "RPC lore" rather
>>> than "RPC law". The use of XIDs as a basis for replay caching is not
>>> speced out in any RFC. The closest thing we have in the form of
>>> documentation is Ric Werme's presentation at the 1996 Connectathon:
>>> http://nfsv4bat.org/Documents/ConnectAThon/1996/werme1.pdf
>>>
>>> Basically, those comments are there in the Linux code to denote issues
>>> found when interoperability testing with server implementations that
>>> are probably now long dead, but might still be in use somewhere.
>>
>> Would you consider changing this to use the same XID in case of
>> redoing the operation due to the AUTH_ERROR?
>>
>> The issue it causes (one of the) server's implementation is of the
>> following nature:
>> 1. client sends an operation to the server. the server process the
>> operation but before replying back to the server has an issue and
>> resets the connection.
>> 2. client re-establishes the connection and replays the RPC. the
>> server now fails with the AUTH_ERROR.
>> 3. client establishes a new connection and replays the same NFS
>> operation over the new XID. The server cached the operation but since
>> the last operation arrives with the new XID it won't find the entry in
>> the cache. It's problematic when the operation is like REMOVE.
>>
>> I realize this is why nfs4.1 session were introduce to solve these
>> non-idenpotency issues but using the same XID seems like the right
>> idea since it is the same operation.
>>
>> If you don't have objections to the change, I can ask on the IETF list
>> to see if any servers will object to such change.
>
> What you describe is a clear and obvious server bug. It is not a
> client bug, and is not something that I'd find acceptable as
> justification for changing the client code.
>
> The server should not be replying AUTH_ERROR and then processing the
> RPC anyway. That's not behaviour that is sanctioned by the RPC spec.

Perhaps I wasn't clear; let me try again. In the first step, the
server processes the request and does not reply with an AUTH_ERROR.
Instead it resets the connection, but it has already populated its
replay cache. The client re-establishes the connection and resends
exactly the same bytes, but gets back an AUTH_ERROR (the server does
not process the operation). It's the recovery from this error that's
in question.

>
> Cheers
> Trond

2016-02-05 21:08:16

by Trond Myklebust

Subject: Re: Question about XID use in sunrpc

On Fri, Feb 5, 2016 at 2:03 PM, Olga Kornievskaia <[email protected]> wrote:
> On Fri, Feb 5, 2016 at 1:31 PM, Trond Myklebust
> <[email protected]> wrote:
>> On Fri, Feb 5, 2016 at 12:01 PM, Olga Kornievskaia <[email protected]> wrote:
>>> On Fri, Feb 5, 2016 at 11:44 AM, Trond Myklebust
>>> <[email protected]> wrote:
>>>> On Fri, Feb 5, 2016 at 10:37 AM, Olga Kornievskaia <[email protected]> wrote:
>>>>> I have a question regarding the implementation of sunrpc use of XID
>>>>> when the client receives an AUTH_ERROR. The code (clnt.c line 1933)
>>>>> explicitly comments that a new XID should be acquired and releases the
>>>>> currently rpc task (and gets a new one). Why is that? Since the
>>>>> operation is "replayed" but with the new credentials, why shouldn't
>>>>> the same XID be used?
>>>>>
>>>>> The RPC RFC says that XID is used by the server to detect
>>>>> retransmissions. It's not clear if in the specs means "retransmission"
>>>>> == tcp retransmissions. If so then it explains why the client uses the
>>>>> same XID.
>>>>>
>>>>
>>>> The questions you are asking come under the header "RPC lore" rather
>>>> than "RPC law". The use of XIDs as a basis for replay caching is not
>>>> speced out in any RFC. The closest thing we have in the form of
>>>> documentation is Ric Werme's presentation at the 1996 Connectathon:
>>>> http://nfsv4bat.org/Documents/ConnectAThon/1996/werme1.pdf
>>>>
>>>> Basically, those comments are there in the Linux code to denote issues
>>>> found when interoperability testing with server implementations that
>>>> are probably now long dead, but might still be in use somewhere.
>>>
>>> Would you consider changing this to use the same XID in case of
>>> redoing the operation due to the AUTH_ERROR?
>>>
>>> The issue it causes (one of the) server's implementation is of the
>>> following nature:
>>> 1. client sends an operation to the server. the server process the
>>> operation but before replying back to the server has an issue and
>>> resets the connection.
>>> 2. client re-establishes the connection and replays the RPC. the
>>> server now fails with the AUTH_ERROR.
>>> 3. client establishes a new connection and replays the same NFS
>>> operation over the new XID. The server cached the operation but since
>>> the last operation arrives with the new XID it won't find the entry in
>>> the cache. It's problematic when the operation is like REMOVE.
>>>
>>> I realize this is why nfs4.1 session were introduce to solve these
>>> non-idenpotency issues but using the same XID seems like the right
>>> idea since it is the same operation.
>>>
>>> If you don't have objections to the change, I can ask on the IETF list
>>> to see if any servers will object to such change.
>>
>> What you describe is a clear and obvious server bug. It is not a
>> client bug, and is not something that I'd find acceptable as
>> justification for changing the client code.
>>
>> The server should not be replying AUTH_ERROR and then processing the
>> RPC anyway. That's not behaviour that is sanctioned by the RPC spec.
>
> Perhaps I wasn't clear let me try again. In the first step, the server
> processes request and does not reply with an AUTH_ERROR but instead
> resets a connection but it has already populated it's replay cache.
> Client reestablishes connection resends exactly the same bytes but
> gets back an AUTH_ERROR (server does not process the operation). It's
> the recovery from this error that's in question.
>

Hi Olga,

I understood what you said, but you cannot have multiple replies to
the same RPC call. It doesn't matter if it was a replay: if the server
replies AUTH_ERROR, then it is saying "I'm not executing this".

Cheers
Trond

2016-02-05 21:38:56

by Olga Kornievskaia

Subject: Re: Question about XID use in sunrpc

On Fri, Feb 5, 2016 at 4:08 PM, Trond Myklebust
<[email protected]> wrote:
> On Fri, Feb 5, 2016 at 2:03 PM, Olga Kornievskaia <[email protected]> wrote:
>> On Fri, Feb 5, 2016 at 1:31 PM, Trond Myklebust
>> <[email protected]> wrote:
>>> On Fri, Feb 5, 2016 at 12:01 PM, Olga Kornievskaia <[email protected]> wrote:
>>>> On Fri, Feb 5, 2016 at 11:44 AM, Trond Myklebust
>>>> <[email protected]> wrote:
>>>>> On Fri, Feb 5, 2016 at 10:37 AM, Olga Kornievskaia <[email protected]> wrote:
>>>>>> I have a question regarding the implementation of sunrpc use of XID
>>>>>> when the client receives an AUTH_ERROR. The code (clnt.c line 1933)
>>>>>> explicitly comments that a new XID should be acquired and releases the
>>>>>> currently rpc task (and gets a new one). Why is that? Since the
>>>>>> operation is "replayed" but with the new credentials, why shouldn't
>>>>>> the same XID be used?
>>>>>>
>>>>>> The RPC RFC says that XID is used by the server to detect
>>>>>> retransmissions. It's not clear if in the specs means "retransmission"
>>>>>> == tcp retransmissions. If so then it explains why the client uses the
>>>>>> same XID.
>>>>>>
>>>>>
>>>>> The questions you are asking come under the header "RPC lore" rather
>>>>> than "RPC law". The use of XIDs as a basis for replay caching is not
>>>>> speced out in any RFC. The closest thing we have in the form of
>>>>> documentation is Ric Werme's presentation at the 1996 Connectathon:
>>>>> http://nfsv4bat.org/Documents/ConnectAThon/1996/werme1.pdf
>>>>>
>>>>> Basically, those comments are there in the Linux code to denote issues
>>>>> found when interoperability testing with server implementations that
>>>>> are probably now long dead, but might still be in use somewhere.
>>>>
>>>> Would you consider changing this to use the same XID in case of
>>>> redoing the operation due to the AUTH_ERROR?
>>>>
>>>> The issue it causes (one of the) server's implementation is of the
>>>> following nature:
>>>> 1. client sends an operation to the server. the server process the
>>>> operation but before replying back to the server has an issue and
>>>> resets the connection.
>>>> 2. client re-establishes the connection and replays the RPC. the
>>>> server now fails with the AUTH_ERROR.
>>>> 3. client establishes a new connection and replays the same NFS
>>>> operation over the new XID. The server cached the operation but since
>>>> the last operation arrives with the new XID it won't find the entry in
>>>> the cache. It's problematic when the operation is like REMOVE.
>>>>
>>>> I realize this is why nfs4.1 session were introduce to solve these
>>>> non-idenpotency issues but using the same XID seems like the right
>>>> idea since it is the same operation.
>>>>
>>>> If you don't have objections to the change, I can ask on the IETF list
>>>> to see if any servers will object to such change.
>>>
>>> What you describe is a clear and obvious server bug. It is not a
>>> client bug, and is not something that I'd find acceptable as
>>> justification for changing the client code.
>>>
>>> The server should not be replying AUTH_ERROR and then processing the
>>> RPC anyway. That's not behaviour that is sanctioned by the RPC spec.
>>
>> Perhaps I wasn't clear let me try again. In the first step, the server
>> processes request and does not reply with an AUTH_ERROR but instead
>> resets a connection but it has already populated it's replay cache.
>> Client reestablishes connection resends exactly the same bytes but
>> gets back an AUTH_ERROR (server does not process the operation). It's
>> the recovery from this error that's in question.
>>
>
> Hi Olga,
>
> I understood what you said, but you cannot have multiple replies to
> the same RPC call. It doesn't matter if it was a replay, if the server
> replies AUTH_ERROR, then it is saying "I'm not executing this".

But "this" could have already been executed.

>
> Cheers
> Trond

2016-02-05 22:37:50

by Olga Kornievskaia

Subject: Re: Question about XID use in sunrpc

On Fri, Feb 5, 2016 at 4:38 PM, Olga Kornievskaia <[email protected]> wrote:
> On Fri, Feb 5, 2016 at 4:08 PM, Trond Myklebust
> <[email protected]> wrote:
>> On Fri, Feb 5, 2016 at 2:03 PM, Olga Kornievskaia <[email protected]> wrote:
>>> On Fri, Feb 5, 2016 at 1:31 PM, Trond Myklebust
>>> <[email protected]> wrote:
>>>> On Fri, Feb 5, 2016 at 12:01 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>> On Fri, Feb 5, 2016 at 11:44 AM, Trond Myklebust
>>>>> <[email protected]> wrote:
>>>>>> On Fri, Feb 5, 2016 at 10:37 AM, Olga Kornievskaia <[email protected]> wrote:
>>>>>>> I have a question regarding the implementation of sunrpc use of XID
>>>>>>> when the client receives an AUTH_ERROR. The code (clnt.c line 1933)
>>>>>>> explicitly comments that a new XID should be acquired and releases the
>>>>>>> currently rpc task (and gets a new one). Why is that? Since the
>>>>>>> operation is "replayed" but with the new credentials, why shouldn't
>>>>>>> the same XID be used?
>>>>>>>
>>>>>>> The RPC RFC says that XID is used by the server to detect
>>>>>>> retransmissions. It's not clear if in the specs means "retransmission"
>>>>>>> == tcp retransmissions. If so then it explains why the client uses the
>>>>>>> same XID.
>>>>>>>
>>>>>>
>>>>>> The questions you are asking come under the header "RPC lore" rather
>>>>>> than "RPC law". The use of XIDs as a basis for replay caching is not
>>>>>> speced out in any RFC. The closest thing we have in the form of
>>>>>> documentation is Ric Werme's presentation at the 1996 Connectathon:
>>>>>> http://nfsv4bat.org/Documents/ConnectAThon/1996/werme1.pdf
>>>>>>
>>>>>> Basically, those comments are there in the Linux code to denote issues
>>>>>> found when interoperability testing with server implementations that
>>>>>> are probably now long dead, but might still be in use somewhere.
>>>>>
>>>>> Would you consider changing this to use the same XID in case of
>>>>> redoing the operation due to the AUTH_ERROR?
>>>>>
>>>>> The issue it causes (one of the) server's implementation is of the
>>>>> following nature:
>>>>> 1. client sends an operation to the server. the server process the
>>>>> operation but before replying back to the server has an issue and
>>>>> resets the connection.
>>>>> 2. client re-establishes the connection and replays the RPC. the
>>>>> server now fails with the AUTH_ERROR.
>>>>> 3. client establishes a new connection and replays the same NFS
>>>>> operation over the new XID. The server cached the operation but since
>>>>> the last operation arrives with the new XID it won't find the entry in
>>>>> the cache. It's problematic when the operation is like REMOVE.
>>>>>
>>>>> I realize this is why nfs4.1 session were introduce to solve these
>>>>> non-idenpotency issues but using the same XID seems like the right
>>>>> idea since it is the same operation.
>>>>>
>>>>> If you don't have objections to the change, I can ask on the IETF list
>>>>> to see if any servers will object to such change.
>>>>
>>>> What you describe is a clear and obvious server bug. It is not a
>>>> client bug, and is not something that I'd find acceptable as
>>>> justification for changing the client code.
>>>>
>>>> The server should not be replying AUTH_ERROR and then processing the
>>>> RPC anyway. That's not behaviour that is sanctioned by the RPC spec.
>>>
>>> Perhaps I wasn't clear let me try again. In the first step, the server
>>> processes request and does not reply with an AUTH_ERROR but instead
>>> resets a connection but it has already populated it's replay cache.
>>> Client reestablishes connection resends exactly the same bytes but
>>> gets back an AUTH_ERROR (server does not process the operation). It's
>>> the recovery from this error that's in question.
>>>
>>
>> Hi Olga,
>>
>> I understood what you said, but you cannot have multiple replies to
>> the same RPC call. It doesn't matter if it was a replay, if the server
>> replies AUTH_ERROR, then it is saying "I'm not executing this".
>
> But "this" could have already been executed.

Are you saying that receiving a retransmitted XID should trump
authentication problems with the message? I haven't seen anything
like that in the spec. I would think the authentication error should
be generated first. I'd like to understand where your objection about
"multiple replies to the same RPC call" comes from with respect to the
client. The client can still match the calls to the replies as the
spec suggests, and the client doesn't use the XID for anything else.
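
The ordering question can be made concrete with a small sketch in
Python. It compares two hypothetical server dispatch orders: checking
the credential before the replay cache, and checking the replay cache
first. The `handle` and `run` functions, the `auth_first` flag and the
premise that the replayed credential is rejected are assumptions of
this illustration. It does not claim that either order is what the
spec or any real server prescribes.

```python
AUTH_ERROR = "AUTH_ERROR"


def handle(xid, cred_ok, cache, execute, auth_first):
    """Hypothetical server dispatch; auth_first selects the check order."""
    if auth_first and not cred_ok:
        return AUTH_ERROR          # credential rejected before any lookup
    if xid in cache:
        return cache[xid]          # replay: return the cached reply
    if not cred_ok:
        return AUTH_ERROR
    cache[xid] = execute()         # new request: execute and remember reply
    return cache[xid]


def run(auth_first):
    """Execute once with a good credential, then replay with a rejected one."""
    cache, executed = {}, []

    def execute():
        executed.append("REMOVE")
        return "OK"

    handle(7, True, cache, execute, auth_first)            # executed, reply lost
    replay = handle(7, False, cache, execute, auth_first)  # same XID replayed
    return replay, len(executed)


auth_first_result = run(auth_first=True)
cache_first_result = run(auth_first=False)
print(auth_first_result, cache_first_result)
```

In both orders the operation is executed only once. The difference is
what the replay sees: AUTH_ERROR when the credential is checked first,
and the cached reply when the cache is consulted first.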

>
>>
>> Cheers
>> Trond

2016-02-05 23:19:07

by Trond Myklebust

Subject: Re: Question about XID use in sunrpc

On Fri, Feb 5, 2016 at 5:37 PM, Olga Kornievskaia <[email protected]> wrote:
> On Fri, Feb 5, 2016 at 4:38 PM, Olga Kornievskaia <[email protected]> wrote:
>> On Fri, Feb 5, 2016 at 4:08 PM, Trond Myklebust
>> <[email protected]> wrote:
>>> On Fri, Feb 5, 2016 at 2:03 PM, Olga Kornievskaia <[email protected]> wrote:
>>>> On Fri, Feb 5, 2016 at 1:31 PM, Trond Myklebust
>>>> <[email protected]> wrote:
>>>>> On Fri, Feb 5, 2016 at 12:01 PM, Olga Kornievskaia <[email protected]> wrote:
>>>>>> On Fri, Feb 5, 2016 at 11:44 AM, Trond Myklebust
>>>>>> <[email protected]> wrote:
>>>>>>> On Fri, Feb 5, 2016 at 10:37 AM, Olga Kornievskaia <[email protected]> wrote:
>>>>>>>> I have a question regarding the implementation of sunrpc use of XID
>>>>>>>> when the client receives an AUTH_ERROR. The code (clnt.c line 1933)
>>>>>>>> explicitly comments that a new XID should be acquired and releases the
>>>>>>>> currently rpc task (and gets a new one). Why is that? Since the
>>>>>>>> operation is "replayed" but with the new credentials, why shouldn't
>>>>>>>> the same XID be used?
>>>>>>>>
>>>>>>>> The RPC RFC says that XID is used by the server to detect
>>>>>>>> retransmissions. It's not clear if in the specs means "retransmission"
>>>>>>>> == tcp retransmissions. If so then it explains why the client uses the
>>>>>>>> same XID.
>>>>>>>>
>>>>>>>
>>>>>>> The questions you are asking come under the header "RPC lore" rather
>>>>>>> than "RPC law". The use of XIDs as a basis for replay caching is not
>>>>>>> speced out in any RFC. The closest thing we have in the form of
>>>>>>> documentation is Ric Werme's presentation at the 1996 Connectathon:
>>>>>>> http://nfsv4bat.org/Documents/ConnectAThon/1996/werme1.pdf
>>>>>>>
>>>>>>> Basically, those comments are there in the Linux code to denote issues
>>>>>>> found when interoperability testing with server implementations that
>>>>>>> are probably now long dead, but might still be in use somewhere.
>>>>>>
>>>>>> Would you consider changing this to use the same XID in case of
>>>>>> redoing the operation due to the AUTH_ERROR?
>>>>>>
>>>>>> The issue it causes (one of the) server's implementation is of the
>>>>>> following nature:
>>>>>> 1. client sends an operation to the server. the server process the
>>>>>> operation but before replying back to the server has an issue and
>>>>>> resets the connection.
>>>>>> 2. client re-establishes the connection and replays the RPC. the
>>>>>> server now fails with the AUTH_ERROR.
>>>>>> 3. client establishes a new connection and replays the same NFS
>>>>>> operation over the new XID. The server cached the operation but since
>>>>>> the last operation arrives with the new XID it won't find the entry in
>>>>>> the cache. It's problematic when the operation is like REMOVE.
>>>>>>
>>>>>> I realize this is why nfs4.1 session were introduce to solve these
>>>>>> non-idenpotency issues but using the same XID seems like the right
>>>>>> idea since it is the same operation.
>>>>>>
>>>>>> If you don't have objections to the change, I can ask on the IETF list
>>>>>> to see if any servers will object to such change.
>>>>>
>>>>> What you describe is a clear and obvious server bug. It is not a
>>>>> client bug, and is not something that I'd find acceptable as
>>>>> justification for changing the client code.
>>>>>
>>>>> The server should not be replying AUTH_ERROR and then processing the
>>>>> RPC anyway. That's not behaviour that is sanctioned by the RPC spec.
>>>>
>>>> Perhaps I wasn't clear let me try again. In the first step, the server
>>>> processes request and does not reply with an AUTH_ERROR but instead
>>>> resets a connection but it has already populated it's replay cache.
>>>> Client reestablishes connection resends exactly the same bytes but
>>>> gets back an AUTH_ERROR (server does not process the operation). It's
>>>> the recovery from this error that's in question.
>>>>
>>>
>>> Hi Olga,
>>>
>>> I understood what you said, but you cannot have multiple replies to
>>> the same RPC call. It doesn't matter if it was a replay, if the server
>>> replies AUTH_ERROR, then it is saying "I'm not executing this".
>>
>> But "this" could have already been executed.
>
> Are you saying that receiving a retransmitted XID should trump
> authentication problems with the message? I haven't seen anything
> about something like in the spec. I would think authentication error
> should be generated first. I'd like to understand where your
> objections about "multiple replies to the same RPC call" comes from
> with respect to the client. The client still can match the calls to
> the replies as per spec suggestions and the client doesn't use it for
> anything else.

I'm saying that if your server replies AUTH_ERROR to a replayed RPC
call, which is using the exact same credential that it used in the
original RPC call, and which did not receive AUTH_ERROR, then it will
break all the Linux clients out there that have been doing the exact
same thing for more than 10 years.