MIME-Version: 1.0
Date: Fri, 5 Feb 2016 18:19:06 -0500
Subject: Re: Question about XID use in sunrpc
From: Trond Myklebust
To: Olga Kornievskaia
Cc: linux-nfs
Content-Type: text/plain; charset=UTF-8
Sender: linux-nfs-owner@vger.kernel.org

On Fri, Feb 5, 2016 at 5:37 PM, Olga Kornievskaia wrote:
> On Fri, Feb 5, 2016 at 4:38 PM, Olga Kornievskaia wrote:
>> On Fri, Feb 5, 2016 at 4:08 PM, Trond Myklebust wrote:
>>> On Fri, Feb 5, 2016 at 2:03 PM, Olga Kornievskaia wrote:
>>>> On Fri, Feb 5, 2016 at 1:31 PM, Trond Myklebust wrote:
>>>>> On Fri, Feb 5, 2016 at 12:01 PM, Olga Kornievskaia wrote:
>>>>>> On Fri, Feb 5, 2016 at 11:44 AM, Trond Myklebust wrote:
>>>>>>> On Fri, Feb 5, 2016 at 10:37 AM, Olga Kornievskaia wrote:
>>>>>>>> I have a question regarding the implementation of sunrpc use of XID
>>>>>>>> when the client receives an AUTH_ERROR. The code (clnt.c line 1933)
>>>>>>>> explicitly comments that a new XID should be acquired and releases the
>>>>>>>> current rpc task (and gets a new one). Why is that? Since the
>>>>>>>> operation is "replayed" but with the new credentials, why shouldn't
>>>>>>>> the same XID be used?
>>>>>>>>
>>>>>>>> The RPC RFC says that XID is used by the server to detect
>>>>>>>> retransmissions. It's not clear if the spec means "retransmission"
>>>>>>>> == tcp retransmissions. If so then it explains why the client uses the
>>>>>>>> same XID.
>>>>>>>>
>>>>>>>
>>>>>>> The questions you are asking come under the header "RPC lore" rather
>>>>>>> than "RPC law". The use of XIDs as a basis for replay caching is not
>>>>>>> speced out in any RFC.
>>>>>>> The closest thing we have in the form of
>>>>>>> documentation is Ric Werme's presentation at the 1996 Connectathon:
>>>>>>> http://nfsv4bat.org/Documents/ConnectAThon/1996/werme1.pdf
>>>>>>>
>>>>>>> Basically, those comments are there in the Linux code to denote issues
>>>>>>> found when interoperability testing with server implementations that
>>>>>>> are probably now long dead, but might still be in use somewhere.
>>>>>>
>>>>>> Would you consider changing this to use the same XID in case of
>>>>>> redoing the operation due to the AUTH_ERROR?
>>>>>>
>>>>>> The issue it causes for (one of the) server implementations is of the
>>>>>> following nature:
>>>>>> 1. client sends an operation to the server. the server processes the
>>>>>> operation but before replying back to the client has an issue and
>>>>>> resets the connection.
>>>>>> 2. client re-establishes the connection and replays the RPC. the
>>>>>> server now fails with the AUTH_ERROR.
>>>>>> 3. client establishes a new connection and replays the same NFS
>>>>>> operation over the new XID. The server cached the operation but since
>>>>>> the last operation arrives with the new XID it won't find the entry in
>>>>>> the cache. It's problematic when the operation is like REMOVE.
>>>>>>
>>>>>> I realize this is why nfs4.1 sessions were introduced to solve these
>>>>>> non-idempotency issues but using the same XID seems like the right
>>>>>> idea since it is the same operation.
>>>>>>
>>>>>> If you don't have objections to the change, I can ask on the IETF list
>>>>>> to see if any servers will object to such change.
>>>>>
>>>>> What you describe is a clear and obvious server bug. It is not a
>>>>> client bug, and is not something that I'd find acceptable as
>>>>> justification for changing the client code.
>>>>>
>>>>> The server should not be replying AUTH_ERROR and then processing the
>>>>> RPC anyway. That's not behaviour that is sanctioned by the RPC spec.
>>>>
>>>> Perhaps I wasn't clear; let me try again.
>>>> In the first step, the server
>>>> processes the request and does not reply with an AUTH_ERROR but instead
>>>> resets a connection but it has already populated its replay cache.
>>>> Client reestablishes the connection, resends exactly the same bytes but
>>>> gets back an AUTH_ERROR (server does not process the operation). It's
>>>> the recovery from this error that's in question.
>>>>
>>>
>>> Hi Olga,
>>>
>>> I understood what you said, but you cannot have multiple replies to
>>> the same RPC call. It doesn't matter if it was a replay, if the server
>>> replies AUTH_ERROR, then it is saying "I'm not executing this".
>>
>> But "this" could have already been executed.
>
> Are you saying that receiving a retransmitted XID should trump
> authentication problems with the message? I haven't seen anything
> about something like that in the spec. I would think authentication error
> should be generated first. I'd like to understand where your
> objections about "multiple replies to the same RPC call" come from
> with respect to the client. The client still can match the calls to
> the replies as per spec suggestions and the client doesn't use it for
> anything else.

I'm saying that if your server replies AUTH_ERROR to a replayed RPC
call, which is using the exact same credential that it used in the
original RPC call, and which did not receive AUTH_ERROR, then it will
break all the Linux clients out there that have been doing the exact
same thing for more than 10 years.
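
For readers following the thread, the scenario Olga describes (a
non-idempotent REMOVE replayed under a fresh XID) can be sketched with a
toy replay cache. This is only an illustration under stated assumptions:
ToyServer, its (client, xid) cache key, and the reply strings are all
hypothetical and are not taken from the Linux sunrpc code or from any
real server.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical NFS-like server with an XID-keyed replay cache."""
    files: set = field(default_factory=set)
    drc: dict = field(default_factory=dict)  # (client, xid) -> cached reply

    def handle(self, client: str, xid: int, op: str, name: str) -> str:
        key = (client, xid)
        if key in self.drc:
            # Same XID seen before: return the cached reply, do not re-execute.
            return self.drc[key]
        reply = self._execute(op, name)
        self.drc[key] = reply
        return reply

    def _execute(self, op: str, name: str) -> str:
        if op != "REMOVE":
            raise ValueError(f"unsupported op: {op}")
        if name not in self.files:
            return "NFSERR_NOENT"
        self.files.remove(name)
        return "NFS_OK"


server = ToyServer(files={"a.txt"})

# 1. The original call is executed; assume its reply is lost in a reset.
lost_reply = server.handle("client1", xid=100, op="REMOVE", name="a.txt")

# 2. A replay with the same XID hits the cache and sees the original result.
same_xid_reply = server.handle("client1", xid=100, op="REMOVE", name="a.txt")

# 3. A replay under a new XID misses the cache and is executed again.
new_xid_reply = server.handle("client1", xid=101, op="REMOVE", name="a.txt")

print(lost_reply, same_xid_reply, new_xid_reply)
```

In this sketch the third call fails only because the cache is keyed by
XID; it says nothing about how a server should order credential checks
against cache lookups, which is the point in dispute above.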