Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE8D6C636CC for ; Tue, 7 Feb 2023 17:02:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231455AbjBGRCx (ORCPT ); Tue, 7 Feb 2023 12:02:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230509AbjBGRCw (ORCPT ); Tue, 7 Feb 2023 12:02:52 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 383DC2703 for ; Tue, 7 Feb 2023 09:02:51 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id DF108B81A01 for ; Tue, 7 Feb 2023 17:02:49 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4145AC433EF; Tue, 7 Feb 2023 17:02:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1675789368; bh=xKUawFi70PKyB/MDlFAR5ve4IJWAXAVX3Xz4jzFVv/U=; h=From:To:Cc:Subject:Date:From; b=c19n5kTHDOVDlAf+yq2HZssMnNC0JF0vbHHg160CbcjzgUo5HrapMsvgb/6BJXnIx cGBZTj9MdzmHxXKmqcdGJMXtfc2kX/A69v4AwLaVihc6TSWrUptpSRZsxHx9weqhnT yPYDfwWcnmO18Q8dJK5/Ch/iEvC9gLE5McIeUkv1vepCD1SDvF7TcUvJSlYAHFh5Zt IRmpItODWJ/zv4l7fPcO2lUhvPXw/6kxDoQawDXiJy947V8Hg/4vp5E3HYrDF+S747 rmI1oOdN0qNyKSKxOXbd4r0YOKZhGHj9Llk8Do54X5OP/hp5Z56k93wLCvAstMYPvv 6rKly6FJbzlRA== From: Jeff Layton To: chuck.lever@oracle.com Cc: linux-nfs@vger.kernel.org, Pierguido Lambri Subject: [PATCH v2] nfsd: don't fsync nfsd_files on last close Date: Tue, 7 Feb 2023 12:02:46 -0500 Message-Id: <20230207170246.198342-1-jlayton@kernel.org> X-Mailer: git-send-email 2.39.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org Most of the time, NFSv4 clients issue a COMMIT before the final CLOSE of an open stateid, so with NFSv4, the fsync in the nfsd_file_free path is usually a no-op and doesn't block. We have a customer running knfsd over very slow storage (XFS over Ceph RBD). They were using the "async" export option because performance was more important than data integrity for this application. That export option turns NFSv4 COMMIT calls into no-ops. Due to the fsync in this codepath however, their final CLOSE calls would still stall (since a CLOSE effectively became a COMMIT). I think this fsync is not strictly necessary. We only use that result to reset the write verifier. Instead of fsync'ing all of the data when we free an nfsd_file, we can just check for writeback errors when one is acquired and when it is freed. If the client never comes back, then it'll never see the error anyway and there is no point in resetting it. If an error occurs after the nfsd_file is removed from the cache but before the inode is evicted, then it will reset the write verifier on the next nfsd_file_acquire, (since there will be an unseen error). The only exception here is if something else opens and fsyncs the file during that window. Given that local applications work with this limitation today, I don't see that as an issue. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2166658 Fixes: ac3a2585f018 (nfsd: rework refcounting in filecache) Reported-and-Tested-by: Pierguido Lambri Signed-off-by: Jeff Layton --- fs/nfsd/filecache.c | 44 ++++++++++++-------------------------------- fs/nfsd/trace.h | 31 ------------------------------- 2 files changed, 12 insertions(+), 63 deletions(-) v2: - rebase onto nfsd-next branch - add Fixes and Link tags diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index ecc32105c3ab..6e8712bd7c99 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -331,37 +331,27 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) return nf; } +/** + * nfsd_file_check_write_error - check for writeback errors on a file + * @nf: nfsd_file to check for writeback errors + * + * Check whether a nfsd_file has an unseen error. Reset the write + * verifier if so. + */ static void -nfsd_file_fsync(struct nfsd_file *nf) -{ - struct file *file = nf->nf_file; - int ret; - - if (!file || !(file->f_mode & FMODE_WRITE)) - return; - ret = vfs_fsync(file, 1); - trace_nfsd_file_fsync(nf, ret); - if (ret) - nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); -} - -static int nfsd_file_check_write_error(struct nfsd_file *nf) { struct file *file = nf->nf_file; - if (!file || !(file->f_mode & FMODE_WRITE)) - return 0; - return filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err)); + if ((file->f_mode & FMODE_WRITE) && + filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err))) + nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); } static void nfsd_file_hash_remove(struct nfsd_file *nf) { trace_nfsd_file_unhash(nf); - - if (nfsd_file_check_write_error(nf)) - nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); rhashtable_remove_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash, nfsd_file_rhash_params); } @@ -387,23 +377,12 @@ nfsd_file_free(struct nfsd_file *nf) this_cpu_add(nfsd_file_total_age, age); nfsd_file_unhash(nf); - - /* - * We call fsync here in order to catch writeback errors. It's not - * strictly required by the protocol, but an nfsd_file could get - * evicted from the cache before a COMMIT comes in. If another - * task were to open that file in the interim and scrape the error, - * then the client may never see it. By calling fsync here, we ensure - * that writeback happens before the entry is freed, and that any - * errors reported result in the write verifier changing. - */ - nfsd_file_fsync(nf); - if (nf->nf_mark) nfsd_file_mark_put(nf->nf_mark); if (nf->nf_file) { get_file(nf->nf_file); filp_close(nf->nf_file, NULL); + nfsd_file_check_write_error(nf); fput(nf->nf_file); } @@ -1158,6 +1137,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, out: if (status == nfs_ok) { this_cpu_inc(nfsd_file_acquisitions); + nfsd_file_check_write_error(nf); *pnf = nf; } else { if (refcount_dec_and_test(&nf->nf_ref)) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 8f9c82d9e075..4183819ea082 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -1202,37 +1202,6 @@ TRACE_EVENT(nfsd_file_close, ) ); -TRACE_EVENT(nfsd_file_fsync, - TP_PROTO( - const struct nfsd_file *nf, - int ret - ), - TP_ARGS(nf, ret), - TP_STRUCT__entry( - __field(void *, nf_inode) - __field(int, nf_ref) - __field(int, ret) - __field(unsigned long, nf_flags) - __field(unsigned char, nf_may) - __field(struct file *, nf_file) - ), - TP_fast_assign( - __entry->nf_inode = nf->nf_inode; - __entry->nf_ref = refcount_read(&nf->nf_ref); - __entry->ret = ret; - __entry->nf_flags = nf->nf_flags; - __entry->nf_may = nf->nf_may; - __entry->nf_file = nf->nf_file; - ), - TP_printk("inode=%p ref=%d flags=%s may=%s nf_file=%p ret=%d", - __entry->nf_inode, - __entry->nf_ref, - show_nf_flags(__entry->nf_flags), - show_nfsd_may_flags(__entry->nf_may), - __entry->nf_file, __entry->ret - ) -); - #include "cache.h" TRACE_DEFINE_ENUM(RC_DROPIT); -- 2.39.1