Date: Wed, 25 Nov 2020 17:14:51 +0000 (GMT)
From: Daire Byrne
To: bfields
Cc: Trond Myklebust, linux-cachefs, linux-nfs
Subject: Re: Adventures in NFS re-exporting
Message-ID: <932244432.93596532.1606324491501.JavaMail.zimbra@dneg.com>
In-Reply-To: <20201124211522.GC7173@fieldses.org>

----- On 24 Nov, 2020, at 21:15, bfields <bfields@fieldses.org> wrote:

> On Tue, Nov 24, 2020 at 08:35:06PM +0000, Daire Byrne wrote:
>> Sometimes I have seen clusters of 16 GETATTRs for the same file on the
>> wire with nothing else in between. So if the re-export server is the
>> only "client" writing these files to the originating server, why do we
>> need to do so many repeat GETATTR calls when using nconnect>1? And why
>> are the COMMIT calls required when the writes are coming via nfsd but
>> not from userspace on the re-export server? Is that due to some sort
>> of memory pressure or locking?
>>
>> I picked the NFSv3 originating server case because my head starts to
>> hurt tracking the equivalent packets, stateids and compound calls with
>> NFSv4. But I think it's mostly the same for NFSv4. The writes through
>> the re-export server lead to lots of COMMITs and (double) GETATTRs, but
>> using nconnect>1 at least doesn't seem to make it any worse like it
>> does for NFSv3.
>>
>> But maybe you actually want all the extra COMMITs to help better
>> guarantee your writes when putting a re-export server in the way?
>> Perhaps all of this is by design...
>
> Maybe that's close-to-open combined with the server's tendency to
> open/close on every IO operation? (Though the file cache should have
> helped with that, I thought; as would using version >=4.0 on the final
> client.)
>
> Might be interesting to know whether the nocto mount option makes a
> difference. (So, add "nocto" to the mount options for the NFS mount
> that you're re-exporting on the re-export server.)

The nocto option didn't really seem to help, but the NFSv4.2 re-export of
an NFSv3 server did. I also realised I had done some tests with nconnect
on the re-export server's client and consequently mixed things up a bit in
my head. So I did some more tests and tried to make the results clear and
simple.

In all cases I'm just writing a big file with "dd" and capturing the
traffic between the originating server and the re-export server.
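For reference, each test looks roughly like this on the re-export server
(hostnames, paths, interface and file size are just placeholders here, and
the mount options change per test as listed below):

  # mount the originating server with the options under test
  mount -t nfs -o vers=3,actimeo=1800,nocto originating:/srv/data /srv/data

  # write a big file through the mount being tested (either directly from
  # userspace on the re-export server, or from a client of the re-export)
  dd if=/dev/zero of=/srv/data/bigfile bs=1M count=10240

  # capture the traffic between the re-export server and originating server
  tcpdump -i eth0 -w write-test.pcap host originating and port 2049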
First off, writing directly to the originating server mount on the
re-export server from userspace shows the ideal behaviour for all
combinations:

originating server <- (vers=X,actimeo=1800,nconnect=X) <- reexport server
writing = WRITE,WRITE .... repeating (good!)

Then re-exporting an NFSv4.2 server:

originating server <- (vers=4.2) <- reexport server <- (vers=3) <- client
writing = GETATTR,COMMIT,WRITE .... repeating

originating server <- (vers=4.2) <- reexport server <- (vers=4.2) <- client
writing = GETATTR,WRITE .... repeating

And re-exporting an NFSv3 server:

originating server <- (vers=3) <- reexport server <- (vers=4.2) <- client
writing = WRITE,WRITE .... repeating (good!)

originating server <- (vers=3) <- reexport server <- (vers=3) <- client
writing = WRITE,COMMIT .... repeating

So of all the combinations, an NFSv4.2 re-export of an NFSv3 server is the
only one that matches the "ideal" case where we WRITE continuously without
all the extra chatter.

And for completeness, taking that "good" case and making it bad with
nconnect:

originating server <- (vers=3,nconnect=16) <- reexport server <- (vers=4.2) <- client
writing = WRITE,WRITE .... repeating (good!)

originating server <- (vers=3) <- reexport server <- (vers=4.2,nconnect=16) <- client
writing = WRITE,COMMIT,GETATTR .... randomly repeating

So using nconnect on the re-export server's client causes lots more
metadata ops. There are good throughput reasons for doing that, but the
gain could be offset by the extra metadata round trips. Similarly, we have
mostly been using an NFSv4.2 re-export of an NFSv4.2 server over the WAN
because of the reduced metadata ops for reading, but it looks like we incur
extra metadata ops for writing.

Side note: it's hard to decode nconnect-enabled packet captures because
wireshark doesn't seem to like those extra port streams.

> By the way I made a start at a list of issues at
>
> http://wiki.linux-nfs.org/wiki/index.php/NFS_re-export
>
> but I was a little vague on which of your issues remained and didn't
> take much time over it.

Cool. I'm glad there are some notes for others to reference - this thread
is now too long for any human to read. The only things I'd consider adding
are:

* a re-export of an NFSv4.0 filesystem can give input/output errors when
  the cache is dropped
* a weird interaction with NFS client readahead such that all reads are
  limited to the default 128k unless you manually increase it to match
  rsize

The only other things I can offer are tips & tricks for doing this kind of
thing over the WAN (vfs_cache_pressure, actimeo, nocto) and using fscache.

Daire