Date: Thu, 12 Nov 2020 23:05:57 +0000 (GMT)
From: Daire Byrne
To: bfields
Cc: Trond Myklebust, linux-cachefs, linux-nfs
Subject: Re: Adventures in NFS re-exporting

----- On 12 Nov, 2020, at 20:55, bfields bfields@fieldses.org wrote:

> On Thu, Nov 12, 2020 at 06:33:45PM +0000, Daire Byrne wrote:
>> There was some discussion about NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR
>> allowing for the hack/optimisation but I guess that is only for the
>> case when re-exporting NFSv4 to the eventual clients. It would not
>> help if you were re-exporting an NFSv3 server with NFSv3 to the
>> clients? I lack the deeper understanding to say anything more than
>> that.
>
> Oh, right, thanks for the reminder. The CHANGE_TYPE_IS_MONOTONIC_INCR
> optimization still looks doable to me.
>
> How does that help, anyway? I guess it avoids false positives of some
> kind when rpc's are processed out of order?
>
> Looking back at
>
> https://lore.kernel.org/linux-nfs/1155061727.42788071.1600777874179.JavaMail.zimbra@dneg.com/
>
> this bothers me: "I'm not exactly sure why, but the iversion of the
> inode gets changed locally (due to atime modification?) most likely via
> invocation of method inode_inc_iversion_raw. Each time it gets
> incremented the following call to validate attributes detects changes
> causing it to be reloaded from the originating server."
>
> The only call to that function outside afs or ceph code is in
> fs/nfs/write.c, in the write delegation case. The Linux server doesn't
> support write delegations, Netapp does but this shouldn't be causing
> cache invalidations.

So, I can't lay claim to identifying the exact optimisation/hack that
improves the retention of the re-export server's client cache when
re-exporting an NFSv3 server (which is then read by many clients). We
were working with an engineer at the time who showed an interest in our
use case, and after we supplied a reproducer he suggested this change to
nfs/inode.c:

- if (!inode_eq_iversion_raw(inode, fattr->change_attr)) {
+ if (inode_peek_iversion_raw(inode) < fattr->change_attr) {

His reasoning at the time was:

"Fixes inode invalidation caused by read access. The least important bit
is ORed with 1 and causes the inode version to differ from the one seen
on the NFS share. This in turn causes unnecessary re-downloads, impacting
performance significantly. This fix makes it only re-fetch file content
if the inode version seen on the server is newer than the one on the
client."
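To make the effect of that one-line change a bit more concrete, here is a
small userspace model of the two comparisons. This is only an illustration
of the idea, not the actual fs/nfs/inode.c code path; the specific numbers
and the "low bit ORed in" behaviour are just my reading of his description.

    /*
     * Model of the two invalidation policies discussed above. Not kernel
     * code; it only mimics swapping the equality test for a "strictly
     * newer" test on the change attribute.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Original behaviour: any difference between the cached and
     * server-reported change attribute forces an invalidation. */
    static bool invalidate_on_mismatch(uint64_t cached, uint64_t server)
    {
            return cached != server;
    }

    /* Proposed behaviour: only invalidate when the server-reported change
     * attribute is strictly newer than the cached one. */
    static bool invalidate_on_newer(uint64_t cached, uint64_t server)
    {
            return cached < server;
    }

    int main(void)
    {
            uint64_t server_change = 1000;              /* change attr on the server */
            uint64_t cached_change = server_change | 1; /* local copy with low bit set */

            printf("equality policy invalidates:  %d\n",
                   invalidate_on_mismatch(cached_change, server_change)); /* 1: spurious */
            printf("monotonic policy invalidates: %d\n",
                   invalidate_on_newer(cached_change, server_change));    /* 0: cache kept */

            /* A real modification on the server bumps the change attribute,
             * and both policies then agree a re-fetch is needed. */
            server_change += 2;
            printf("after a real change, monotonic policy invalidates: %d\n",
                   invalidate_on_newer(cached_change, server_change));    /* 1 */
            return 0;
    }

With the equality test, the locally perturbed value triggers an
invalidation even though nothing changed on the server; with the
"only if newer" test it doesn't, which matches the behaviour we saw.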
But I've always been puzzled by why this only seems to happen when using
knfsd to re-export the (NFSv3) client mount. Running multiple processes
directly on a standard client mount never causes similar re-validations.
And this happens with a completely read-only share, which is why I
started to think it has something to do with atimes, since those could
perhaps still cause a "write" modification even on a read-only mount?

In our case we saw this at its most extreme when we were re-exporting a
read-only NFSv3 Netapp "software" share and loading large applications
with many python search paths to trawl through. Multiple clients of the
re-export server just kept causing the re-export server's client to
re-validate and re-download from the Netapp, even though no files or dirs
had changed and actimeo was set large (with nocto for good measure).

The patch made it such that the re-export server's client cache behaved
the same whether we ran 100 processes directly on the NFSv3 client mount
(on the re-export server) or ran them on 100 clients of the re-export
server - the data remained in the client cache for the duration. So the
re-export server fetches the data from the originating server once and
then serves those results many times over to all the clients from its
cache - exactly what we want.

Daire