Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp1475049ybi; Thu, 30 May 2019 18:45:53 -0700 (PDT) X-Google-Smtp-Source: APXvYqwWHrvSBmKUffdpzeEFjVtluQ8i6bmYnuONmJjhYDo1v/v+TRgnLrNI82/trE0ePMun2opK X-Received: by 2002:a17:902:9a4c:: with SMTP id x12mr6079868plv.298.1559267153723; Thu, 30 May 2019 18:45:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1559267153; cv=none; d=google.com; s=arc-20160816; b=lDxUmutkFufP+TheFDymExC09blMuQcHZCJ0McBYmTjtDwjlsA0W1UXMJ2WGhBferG ddANMPnPZykvZPeDyaBuBO1Vzfzw74NgwCH8r3V9Yv8UVHhtmC5ISYz/KfkqsGRdfIo0 Q2Nit7pXA2P+5Sa5ZpBc2OLVohynt+tdvJGy1SHPNsaEsqkjyN7gPByaoUXQnsSJEp3c g6WWAw7Cmi/IMYlt05Fni7KeQaWfejzIUG24J8f2PrcI5niKXaaKKWAGn8+qjB9CFkBM TnDJ+jvRiG7GF6NgSQoWE2328CMNaJGCCEmmWeqbZCkxeC/NcHVKCe7rEZs4SyHSdIPn 2vpQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=ssAUErrpXP0m4pmq/UIttkdS+QSW7sCoHqF04LKLzsY=; b=vZc8TMyzEernxx/RC5xdCVY5Q03LFHW0QZo9q75Bh49xpDXaoJ1PwJBJK1fIk6sVd5 zE0xR8JNe7Jr0jgyLI6AHamLaEGRky+IdEW3NWKdsj/JGKIP1rMOrLDkD2zoggmG7nmk byxa9MSyvMChVykPlr8pkijNDbfBYqy4+OJcaJpF3IOFIL6x45eaC1DT0TBu3I2Zmfvq LWwsJQamI1YqT8WUuZuZezozBZxvk0jc1jGgd4vUfGJDf8W7MEXEJCCcAuc6iUs3r0tS snNeN0P5WdZSdtBYSrNnlEt7uHJZa/S6BNKoMmWSjsiJsvXFWlMjiAPYIwnEtKjde1dK /HlQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-nfs-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l1si4200222plb.302.2019.05.30.18.45.29; Thu, 30 May 2019 18:45:53 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-nfs-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-nfs-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726666AbfEaBp1 (ORCPT + 99 others); Thu, 30 May 2019 21:45:27 -0400 Received: from p3plsmtpa06-08.prod.phx3.secureserver.net ([173.201.192.109]:49760 "EHLO p3plsmtpa06-08.prod.phx3.secureserver.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726658AbfEaBp1 (ORCPT ); Thu, 30 May 2019 21:45:27 -0400 Received: from [192.168.0.56] ([24.218.182.144]) by :SMTPAUTH: with ESMTPSA id WWbmhT95vmARFWWbmhBzxG; Thu, 30 May 2019 18:45:27 -0700 Subject: Re: [PATCH 0/9] Multiple network connections for a single NFS mount. To: Olga Kornievskaia Cc: NeilBrown , Chuck Lever , Schumaker Anna , Trond Myklebust , linux-nfs References: <155917564898.3988.6096672032831115016.stgit@noble.brown> <1df23ebc-ffe5-1a57-c40a-d5e9a45c8498@talpey.com> <9b64b9d9-b7cf-c818-28e2-58b3a821d39d@talpey.com> From: Tom Talpey Message-ID: Date: Thu, 30 May 2019 21:45:25 -0400 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-CMAE-Envelope: MS4wfBnZF4shmGHRRBnnAli2vFBY/pJ4I4ey89dPW3PAdvC/Bfe+kweVcJsjHJSuLvnJnDRTPmAp57RyCC47jSKJvFycj9Gjtn4c+7WuvqKEZQsaKi7lIMrV P2YB8/K4VTV3gMvCaNDxjvP8DSqRZooicXjCMZHSu9HmthLs42ugPSY+Nx2lQkff+tdvcEeUICAiUODgdCyBB5lc2G/CzZuNQ24M8XawTSf0CzN+s9YLBuCK 7NPODG5otMQMVLtt8kq6JA0A6ogyFsmp7TUJTXzqUdV+AoQrBKA9/zDwE71Kc5XN+TCfJM4NSnxd5fxXue9Zyj0zn7/ZTyhzoFFWecuu46I= Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On 5/30/2019 2:41 PM, Olga Kornievskaia wrote: > On Thu, May 30, 2019 at 1:41 PM Tom Talpey wrote: >> >> On 5/30/2019 1:20 PM, Olga Kornievskaia wrote: >>> On Thu, May 30, 2019 at 1:05 PM Tom Talpey wrote: >>>> >>>> On 5/29/2019 8:41 PM, NeilBrown wrote: >>>>> I've also re-arrange the patches a bit, merged two, and remove the >>>>> restriction to TCP and NFSV4.x,x>=1. Discussions seemed to suggest >>>>> these restrictions were not needed, I can see no need. >>>> >>>> I believe the need is for the correctness of retries. Because NFSv2, >>>> NFSv3 and NFSv4.0 have no exactly-once semantics of their own, server >>>> duplicate request caches are important (although often imperfect). >>>> These caches use client XID's, source ports and addresses, sometimes >>>> in addition to other methods, to detect retry. Existing clients are >>>> careful to reconnect with the same source port, to ensure this. And >>>> existing servers won't change. >>> >>> Retries are already bound to the same connection so there shouldn't be >>> an issue of a retransmission coming from a different source port. >> >> So, there's no path redundancy? If any connection is lost and can't >> be reestablished, the requests on that connection will time out? > > For v3 and v4.0 in the current code base with a single connection, > when it goes down, you are out of luck. When we have multiple > connections and would like the benefit of using them but not > sacrifices replay cache correctness, it's a small price to restrict > the re-transmissions and suffer the consequence of not being able to > do an operation during network issues. I agree that the corruption resulting from a blown cache lookup would be bad. But I'm also saying that users will be frustrated when random operations time out, even when new ones work. Also, I think it may lead to application issues. >> I think a common configuration will be two NICs and two network paths, > > Are you talking about session trunking here? No, not necessarily. Certianly not when doing what you propose over NFSv3. > Why do you think two NICs would be a common configuration. I have > performance numbers that demonstrate performance improvement for a > single NIC case. I would say a single NIC with a high speed networks > (25/40G) would be a common configuration. They're both common! And sure, it's good for a single NIC because of RSS (receive side scaling). The multiple connections spread interrupts over several cores. The same as would happen with multiple NICs. > >> a so-called shotgun. Admins will be quite frustrated to discover it >> gives no additional robustness, and perhaps even less. >> >> Why not simply restrict this to the fully-correct, fully-functional >> NFSv4.1+ scenario, and not try to paper over the shortcomings? > > I think mainly because customers are still using v3 but want to > improve performance. I'd love for everybody to switch to 4.1 but > that's not happening. Yeah, you and me both. But trying to "fix" NFSv3 with this is not going to move the world forward, and I predict will cost many woeful days ahead when it fails to work transparently. Tom. >>>> Multiple connections will result in multiple source ports, and possibly >>>> multiple source addresses, meaning retried client requests may be >>>> accepted as new, rather than having any chance of being recognized as >>>> retries. >>>> >>>> NFSv4.1+ don't have this issue, but removing the restrictions would >>>> seem to break the downlevel mounts. >>>> >>>> Tom. >>>> >>> >>> > >