Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp3930017pxu; Mon, 30 Nov 2020 13:28:14 -0800 (PST) X-Google-Smtp-Source: ABdhPJxsfSg4a+Y7oNi8ph06wt9RjdRe1BLKMqCGS15HhgRlyEyrTkGgLOHhW2kWDR+gzzvon/oh X-Received: by 2002:a05:6402:746:: with SMTP id p6mr2224704edy.313.1606771693824; Mon, 30 Nov 2020 13:28:13 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1606771693; cv=none; d=google.com; s=arc-20160816; b=Rg8pek4u/kVZTzXneNL3wdPeSkiJXKXsjke8hlRoU91kgwjDIrm2C8pfwNcMRwjNVj EYrlvjDzsR5qicE1Tg3UPoFuY5DOIKiHxQRrGRe1QkBVR9rmyjzLPy/YtBN9DWs1Zu73 JhkNUsT7ahq2jKzJDuGmcBNlTo5OlCQx0jS4gctvdXyuQOKHLXXOpMCVxhFb4HSjYdV8 +980DyN9bOxz08mR0l4E3Q6Nx/cjwOHsbGDS3VDvGyUA2fZDENIVJ6cYa1gE45cnvsjg SqOKUB+U+0ohfATb+wBNNzb2CLN1sSCsuaNdJOcbo7qVXwskMUP8OWNyViTyEkrvFQc8 +dSA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=TK86FYMh4NcwVgZ/HMjtTY7f6daA4fAh/FGCBw9SWi0=; b=K0Epw0IC2+2S5GMbdBH3uO83lHSKZzI6UXgm9srdUs5npKQq4pnhKSB15yuZOO7yri AJg+6x2OISLrIze6aiov/iGWbEr9P5u/o6LG20N+a1hsEI7u8kOOJN3s3B8CudHnBNYW cvC0SV10Zk5qdb9oqCWwawaC674pqyyt0m/dsQnVp63WAUD+bqt78uPSlRyU4iQQeqGN rxwjE9UL6RqwTt8e4uSMReSv2QVc5qgvlDQyA8MuADrrQuh8VWdQI9QtLa5jqCbO1raV lL9UKL7CnSEQP/WEyZkTEITqcPVKBjxuQBOHBBZAxRX8XSHhAnjkkhNghHmX4m94OMP2 Ipvw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=njX6OVIg; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id v7si15832726edj.402.2020.11.30.13.27.49; Mon, 30 Nov 2020 13:28:13 -0800 (PST) Received-SPF: pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=njX6OVIg; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388485AbgK3VZn (ORCPT + 99 others); Mon, 30 Nov 2020 16:25:43 -0500 Received: from mail.kernel.org ([198.145.29.99]:42062 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388604AbgK3VZm (ORCPT ); Mon, 30 Nov 2020 16:25:42 -0500 Received: from leira.hammer.space (c-68-36-133-222.hsd1.mi.comcast.net [68.36.133.222]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id D0D7C2084C; Mon, 30 Nov 2020 21:25:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1606771501; bh=HufUug6259kNClrNT4VXbOysm6eh56f38WGdc9pBN5E=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=njX6OVIgyvZBAm7ohJckIbtqiDNli6pjR/fYelk1LfkqOZGqu/DCilXNpRiVu2H6B VHk2ykv52PKgpVSVxBwHFH1NeoJu1GfNoE+9S8GFN3h3V9SFELk/I2zx8r2Y/HgqqS h+Krb0nrabA4aC7TJp9ofkKgemJV2vh2B2fxqudg= From: trondmy@kernel.org To: "J. Bruce Fields" , Chuck Lever Cc: linux-nfs@vger.kernel.org Subject: [PATCH 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations Date: Mon, 30 Nov 2020 16:24:50 -0500 Message-Id: <20201130212455.254469-2-trondmy@kernel.org> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20201130212455.254469-1-trondmy@kernel.org> References: <20201130212455.254469-1-trondmy@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org From: Jeff Layton With NFSv3 nfsd will always attempt to send along WCC data to the client. This generally involves saving off the in-core inode information prior to doing the operation on the given filehandle, and then issuing a vfs_getattr to it after the op. Some filesystems (particularly clustered or networked ones) have an expensive ->getattr inode operation. Atomicitiy is also often difficult or impossible to guarantee on such filesystems. For those, we're best off not trying to provide WCC information to the client at all, and to simply allow it to poll for that information as needed with a GETATTR RPC. This patch adds a new flags field to struct export_operations, and defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate that nfsd should not attempt to provide WCC info in NFSv3 replies. It also adds a blurb about the new flags field and flag to the exporting documentation. The server will also now skip collecting this information for NFSv2 as well, since that info is never used there anyway. Note that this patch does not add this flag to any filesystem export_operations structures. This was originally developed to allow reexporting nfs via nfsd. That code is not (and may never be) suitable for merging into mainline. Other filesystems may want to consider enabling this flag too. It's hard to tell however which ones have export operations to enable export via knfsd and which ones mostly rely on them for open-by-filehandle support, so I'm leaving that up to the individual maintainers to decide. I am cc'ing the relevant lists for those filesystems that I think may want to consider adding this though. Cc: HPDD-discuss@lists.01.org Cc: ceph-devel@vger.kernel.org Cc: cluster-devel@redhat.com Cc: fuse-devel@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Signed-off-by: Jeff Layton Signed-off-by: Lance Shelton Signed-off-by: Trond Myklebust --- Documentation/filesystems/nfs/exporting.rst | 27 +++++++++++++++++++++ fs/nfs/export.c | 1 + fs/nfsd/nfs3xdr.c | 7 ++++-- fs/nfsd/nfsfh.c | 14 +++++++++++ fs/nfsd/nfsfh.h | 2 +- include/linux/exportfs.h | 2 ++ 6 files changed, 50 insertions(+), 3 deletions(-) diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst index 33d588a01ace..a3e3805833d1 100644 --- a/Documentation/filesystems/nfs/exporting.rst +++ b/Documentation/filesystems/nfs/exporting.rst @@ -154,6 +154,11 @@ struct which has the following members: to find potential names, and matches inode numbers to find the correct match. + flags + Some filesystems may need to be handled differently than others. The + export_operations struct also includes a flags field that allows the + filesystem to communicate such information to nfsd. See the Export + Operations Flags section below for more explanation. A filehandle fragment consists of an array of 1 or more 4byte words, together with a one byte "type". @@ -163,3 +168,25 @@ generated by encode_fh, in which case it will have been padded with nuls. Rather, the encode_fh routine should choose a "type" which indicates the decode_fh how much of the filehandle is valid, and how it should be interpreted. + +Export Operations Flags +----------------------- +In addition to the operation vector pointers, struct export_operations also +contains a "flags" field that allows the filesystem to communicate to nfsd +that it may want to do things differently when dealing with it. The +following flags are defined: + + EXPORT_OP_NOWCC + RFC 1813 recommends that servers always send weak cache consistency + (WCC) data to the client after each operation. The server should + atomically collect attributes about the inode, do an operation on it, + and then collect the attributes afterward. This allows the client to + skip issuing GETATTRs in some situations but means that the server + is calling vfs_getattr for almost all RPCs. On some filesystems + (particularly those that are clustered or networked) this is expensive + and atomicity is difficult to guarantee. This flag indicates to nfsd + that it should skip providing WCC attributes to the client in NFSv3 + replies when doing operations on this filesystem. Consider enabling + this on filesystems that have an expensive ->getattr inode operation, + or when atomicity between pre and post operation attribute collection + is impossible to guarantee. diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 3430d6891e89..8f4c528865c5 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -171,4 +171,5 @@ const struct export_operations nfs_export_ops = { .encode_fh = nfs_encode_fh, .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, + .flags = EXPORT_OP_NOWCC, }; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 2277f83da250..480342675292 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -206,7 +206,7 @@ static __be32 * encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) { struct dentry *dentry = fhp->fh_dentry; - if (dentry && d_really_is_positive(dentry)) { + if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) { __be32 err; struct kstat stat; @@ -261,7 +261,7 @@ void fill_pre_wcc(struct svc_fh *fhp) struct kstat stat; __be32 err; - if (fhp->fh_pre_saved) + if (fhp->fh_no_wcc || fhp->fh_pre_saved) return; inode = d_inode(fhp->fh_dentry); @@ -287,6 +287,9 @@ void fill_post_wcc(struct svc_fh *fhp) { __be32 err; + if (fhp->fh_no_wcc) + return; + if (fhp->fh_post_saved) printk("nfsd: inode locked twice during operation.\n"); diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index c81dbbad8792..0c2ee65e46f3 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -291,6 +291,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) fhp->fh_dentry = dentry; fhp->fh_export = exp; + + switch (rqstp->rq_vers) { + case 3: + if (!(dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC)) + break; + /* Fallthrough */ + case 2: + fhp->fh_no_wcc = true; + } + return 0; out: exp_put(exp); @@ -559,6 +569,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, */ set_version_and_fsid_type(fhp, exp, ref_fh); + /* If we have a ref_fh, then copy the fh_no_wcc setting from it. */ + fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false; + if (ref_fh == fhp) fh_put(ref_fh); @@ -662,6 +675,7 @@ fh_put(struct svc_fh *fhp) exp_put(exp); fhp->fh_export = NULL; } + fhp->fh_no_wcc = false; return; } diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 56cfbc361561..fb2b60a76b32 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -35,6 +35,7 @@ typedef struct svc_fh { bool fh_locked; /* inode locked by us */ bool fh_want_write; /* remount protection taken */ + bool fh_no_wcc; /* no wcc data needed */ int fh_flags; /* FH flags */ #ifdef CONFIG_NFSD_V3 bool fh_post_saved; /* post-op attrs saved */ @@ -54,7 +55,6 @@ typedef struct svc_fh { struct kstat fh_post_attr; /* full attrs after operation */ u64 fh_post_change; /* nfsv4 change; see above */ #endif /* CONFIG_NFSD_V3 */ - } svc_fh; #define NFSD4_FH_FOREIGN (1<<0) #define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f)) diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 3ceb72b67a7a..e7de0103a32e 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -213,6 +213,8 @@ struct export_operations { bool write, u32 *device_generation); int (*commit_blocks)(struct inode *inode, struct iomap *iomaps, int nr_iomaps, struct iattr *iattr); +#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3 replies */ + unsigned long flags; }; extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid, -- 2.28.0