From: Olga Kornievskaia
Date: Mon, 13 Jan 2020 16:05:07 -0500
Subject: Re: interrupted rpcs problem
To: Trond Myklebust
Cc: linux-nfs@vger.kernel.org

On Mon, Jan 13, 2020 at 1:24 PM Trond Myklebust wrote:
>
> On Mon, 2020-01-13 at 13:09 -0500, Olga Kornievskaia wrote:
> > On Mon, Jan 13, 2020 at 11:49 AM Trond Myklebust wrote:
> > > On Mon, 2020-01-13 at 11:08 -0500, Olga Kornievskaia wrote:
> > > > On Fri, Jan 10, 2020 at 4:03 PM Trond Myklebust
> > > > <trondmy@hammerspace.com> wrote:
> > > > > On Fri, 2020-01-10 at 14:29 -0500, Olga Kornievskaia wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > We are having an issue with interrupted RPCs again. Here's
> > > > > > what I see when xfstests were ctrl-c-ed.
> > > > > >
> > > > > > frame 332 SETATTR call slot=0 seqid=0x000013ca (I'm assuming
> > > > > > this is interrupted and released)
> > > > > > frame 333 CLOSE call slot=0 seqid=0x000013cb (the only way
> > > > > > the slot could be free before the reply is if it was
> > > > > > interrupted, right? Otherwise we should never have the slot
> > > > > > used by more than one outstanding RPC)
> > > > > > frame 334 reply to 333 with SEQ_MISORDERED (I'm assuming the
> > > > > > server received frame 333 before 332)
> > > > > > frame 336 CLOSE call slot=0 seqid=0x000013ca (??? why did we
> > > > > > decrement it? I mean, I know why it's in the current
> > > > > > code :-/ )
> > > > > > frame 337 reply to 336 SEQUENCE with ERR_DELAY
> > > > > > frame 339 reply to 332 SETATTR, which nobody is waiting for
> > > > > > frame 543 CLOSE call slot=0 seqid=0x000013ca (retry after
> > > > > > waiting for err_delay)
> > > > > > frame 544 reply to 543 with SETATTR (out of the cache)
> > > > > >
> > > > > > What this leads to is: the file is never closed on the
> > > > > > server. We can't remove it, and unmount fails with CLID_BUSY.
> > > > > >
> > > > > > I believe that's the result of commit
> > > > > > 3453d5708b33efe76f40eca1c0ed60923094b971. We used to have
> > > > > > code that bumped the sequence number up when the slot was
> > > > > > interrupted, but that was removed by the commit "NFSv4.1:
> > > > > > Avoid false retries when RPC calls are interrupted".
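[The trace above can be reproduced with a toy model of the server-side
session slot and reply cache. This is an illustrative Python sketch,
not kernel code; the server behaviour is assumed from the trace and the
general NFSv4.1 session reply-cache semantics, and the ERR_DELAY
exchange in frames 336/337 is elided.]

```python
class Slot:
    """Toy model of one server-side NFSv4.1 session slot
    (illustrative only, not a real server implementation)."""

    def __init__(self, last_seqid):
        self.seqid = last_seqid   # highest seqid the server executed
        self.cache = None         # cached reply for that seqid

    def sequence(self, seqid, op):
        if seqid == self.seqid + 1:
            # New request: execute the op and cache the reply.
            self.seqid = seqid
            self.cache = "reply(%s)" % op
            return self.cache
        if seqid == self.seqid:
            # Same seqid: treated as a retry and replayed from the
            # cache. Note the arguments (the op) are never compared.
            return self.cache
        return "NFS4ERR_SEQ_MISORDERED"


slot = Slot(0x13c9)
# frames 333/334: CLOSE with seqid 0x13cb arrives before the SETATTR
print(slot.sequence(0x13cb, "CLOSE"))    # NFS4ERR_SEQ_MISORDERED
# frames 332/339: the interrupted SETATTR is executed and cached
print(slot.sequence(0x13ca, "SETATTR"))  # reply(SETATTR)
# frames 543/544: CLOSE retried with the decremented seqid 0x13ca --
# the server replays the cached SETATTR reply; the CLOSE is never
# executed, so the file stays open on the server
print(slot.sequence(0x13ca, "CLOSE"))    # reply(SETATTR)
```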
> > > > > >
> > > > > > The commit message says: "The obvious fix is to bump the
> > > > > > sequence number pre-emptively if an RPC call is interrupted,
> > > > > > but in order to deal with the corner cases where the
> > > > > > interrupted call is not actually received and processed by
> > > > > > the server, we need to interpret the error
> > > > > > NFS4ERR_SEQ_MISORDERED as a sign that we need to either wait
> > > > > > or locate a correct sequence number that lies between the
> > > > > > value we sent, and the last value that was acked by a
> > > > > > SEQUENCE call on that slot."
> > > > > >
> > > > > > If we can no longer just bump the sequence number up, I
> > > > > > don't think the correct action is to automatically bump it
> > > > > > down (as in the example here)? The commit doesn't describe
> > > > > > the corner case where it was necessary to bump the sequence
> > > > > > up. I wonder if we can bring back the knowledge of the
> > > > > > interrupted slot and make a decision based on that as well
> > > > > > as whatever the other corner case is.
> > > > > >
> > > > > > I guess what I'm getting at is: can somebody (Trond) provide
> > > > > > the info on the corner case this patch was created for? Then
> > > > > > I can see if I can fix the "common" case, which is now
> > > > > > broken, without breaking the corner case....
> > > > >
> > > > > There is no pure client-side solution for this problem.
> > > > >
> > > > > The change was made because if you have multiple interruptions
> > > > > of the RPC call, then the client has to somehow figure out
> > > > > what the correct slot number is.
> > > > > If it starts low and then goes high, and the server is not
> > > > > caching the arguments for the RPC call that is in the session
> > > > > cache, then we will _always_ hit this bug, because we will
> > > > > always hit the replay of the last entry.
> > > > >
> > > > > At least if we start high and iterate downward, we reduce the
> > > > > problem to being a race with the processing of the interrupted
> > > > > request, as it is in this case.
> > > > >
> > > > > However, as I said, the real solution here has to involve the
> > > > > server.
> > > >
> > > > Ok, I see your point that if the server cached the arguments,
> > > > then the server could tell that the 2nd rpc using the same
> > > > slot+seqid has different args and would not use the replay
> > > > cache.
> > > >
> > > > However, I wonder if the client can do better. Can't we be more
> > > > aware of when we are interrupting the rpc? For instance, if we
> > > > are interrupted after we started to wait on the RPC, doesn't it
> > > > mean the rpc was sent on the network, and since the network is
> > > > reliable, the server must have consumed the seqid for that slot
> > > > (in which case, increment the seqid)? That's the case that's
> > > > failing now.
> > >
> > > "Reliable transport" does not mean that a client knows what got
> > > received and processed by the server and what didn't. All the
> > > client knows is that if the connection is still up, then the TCP
> > > layer will keep retrying transmission of the request. There are
> > > plenty of error scenarios where the client gets no information
> > > back as to whether or not the data was received by the server
> > > (e.g. due to lost ACKs).
> > >
> > > Furthermore, if an RPC call is interrupted on the client, either
> > > due to a timeout or a signal,
> >
> > What timeout are you referring to here, since a 4.1 rpc can't time
> > out?
> > I think it only leaves a signal.
>
> If you use the 'soft' or 'softerr' mount options, then NFSv4.1 will
> time out when the server is being unresponsive. That behaviour is
> different to the behaviour under a signal, but it has the same effect
> of interrupting the RPC call without our being able to know whether
> the server received the data.
>
> > > then it almost always ends up breaking the connection in order to
> > > avoid corruption of the data stream (by interrupting the
> > > transmission before the entire RPC call has been sent). You
> > > generally have to be lucky to see the timeout/signal occur only
> > > when all the RPC calls being cancelled have exactly fit into the
> > > socket buffer.
> >
> > Wouldn't a retransmission (due to a connection reset for whatever
> > reason) be different, and not involve reprocessing of the slot?
>
> I'm not talking about retransmissions here. I'm talking only about
> NFSv4.x RPC calls that suffer a fatal interruption (i.e. no
> retransmission).
>
> > > Finally, just because the server's TCP layer ACKed receipt of the
> > > RPC call data, that does not mean that it will process that call.
> > > The connection could break before the call is read out of the
> > > receiving socket, or the server may later decide to drop it on
> > > the floor and break the connection.
> > >
> > > IOW: the RPC protocol here is not that "reliable transport
> > > implies processing is guaranteed". It is rather that "connection
> > > is still up implies processing may eventually occur".
> >
> > "Eventually occur" means that its processing of the rpc is
> > guaranteed, just not "in time". Again, unless the client is broken,
> > we can't have more than an interrupted rpc (that has nothing
> > waiting) and the next rpc (both of which will be re-transmitted if
> > the connection is dropped) going to the server.
> >
> > Can we distinguish between interrupted due to re-transmission and
> > interrupted due to ctrl-c of the thread?
> > If we can't, then I'll stop arguing that the client can do better.
>
> There is no "interrupted due to re-transmission" case. We only
> retransmit NFSv4 requests if the TCP connection breaks.
>
> As far as I'm concerned, this discussion is only about interruptions
> that cause the RPC call to be abandoned (i.e. fatal timeouts and
> signals).
>
> > But right now we are left in a bad state. The client leaves open
> > state on the server and will not allow files to be deleted. If the
> > "next rpc" is a write that will never be completed, I think it
> > would leave the machine in a hung state. I just don't see how you
> > can justify that the current code is any better than the solution
> > that was there before.
>
> That's a general problem with allowing interruptions that is largely
> orthogonal to the question of which strategy we choose when
> resynchronising the slot numbers after an interruption has occurred.

I'm re-reading the spec, and in section 2.10.6.2 we have "A requester
MUST wait for a reply to a request before using the slot for another
request". Are we even legally using the slot when we have an
interrupted slot?

> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
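[For reference, the resynchronisation strategy from the commit message
quoted earlier ("locate a correct sequence number that lies between the
value we sent, and the last value that was acked") can be sketched
roughly as below. This is an illustrative Python model with
hypothetical names (`resync_slot_seqid`, `send_sequence`), not the
kernel implementation.]

```python
def resync_slot_seqid(sent_seqid, last_acked_seqid, send_sequence):
    """Probe downward from the seqid we sent toward the last seqid
    acked on the slot, until the server stops returning
    SEQ_MISORDERED. send_sequence is a hypothetical callable that
    issues a SEQUENCE op with the given seqid and returns its status."""
    for seqid in range(sent_seqid, last_acked_seqid - 1, -1):
        status = send_sequence(seqid)
        if status != "NFS4ERR_SEQ_MISORDERED":
            return seqid
    raise RuntimeError("slot could not be resynchronised")


# Example with a fake server that actually executed up to 0x13ca: it
# accepts the current seqid (replay) or current+1 (new request).
server_seqid = 0x13ca

def fake_send(seqid):
    ok = seqid in (server_seqid, server_seqid + 1)
    return "NFS4_OK" if ok else "NFS4ERR_SEQ_MISORDERED"

# Probing down from 0x13cd finds 0x13cb as the first usable seqid.
print(hex(resync_slot_seqid(0x13cd, 0x13c9, fake_send)))  # 0x13cb
```

Starting high and walking down, as Trond describes, keeps the client
from replaying a stale cache entry when the server never saw the
interrupted call; the race in the trace above occurs when the
interrupted call *was* seen and executed.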