From: Mike Snitzer
To: linux-nfs@vger.kernel.org
Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer@hammerspace.com
Subject: [PATCH v3 09/18] pnfs/flexfiles: Enable localio for flexfiles I/O
Date: Thu, 13 Jun 2024 23:44:17 -0400
Message-ID: <20240614034426.31043-10-snitzer@kernel.org>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240614034426.31043-1-snitzer@kernel.org>
References: <20240614034426.31043-1-snitzer@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Trond Myklebust

If the DS is local to this client, then we should be able to use local
I/O to write the data.

Signed-off-by: Peng Tao
Signed-off-by: Lance Shelton
Signed-off-by: Trond Myklebust
Signed-off-by: Mike Snitzer
---
 fs/nfs/flexfilelayout/flexfilelayout.c    | 113 ++++++++++++++++++++--
 fs/nfs/flexfilelayout/flexfilelayout.h    |   2 +
 fs/nfs/flexfilelayout/flexfilelayoutdev.c |   6 ++
 3 files changed, 112 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 3ea07446f05a..ec6aaa110a7b 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -162,6 +163,52 @@ decode_name(struct xdr_stream *xdr, u32 *id)
 	return 0;
 }
 
+static struct file *
+ff_local_open_fh(struct pnfs_layout_segment *lseg,
+		 u32 ds_idx,
+		 struct nfs_client *clp,
+		 const struct cred *cred,
+		 struct nfs_fh *fh,
+		 fmode_t mode)
+{
+	struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
+	struct file *filp, *new, __rcu **pfile;
+
+	if (!nfs_server_is_local(clp))
+		return NULL;
+	if (mode & FMODE_WRITE) {
+		/*
+		 * Always request read and write access since this corresponds
+		 * to a rw layout.
+		 */
+		mode |= FMODE_READ;
+		pfile = &mirror->rw_file;
+	} else
+		pfile = &mirror->ro_file;
+
+	new = NULL;
+	rcu_read_lock();
+	filp = rcu_dereference(*pfile);
+	if (!filp) {
+		rcu_read_unlock();
+		new = nfs_local_open_fh(clp, cred, fh, mode);
+		if (IS_ERR(new))
+			return NULL;
+		rcu_read_lock();
+		/* try to swap in the pointer */
+		filp = cmpxchg(pfile, NULL, new);
+		if (!filp) {
+			filp = new;
+			new = NULL;
+		}
+	}
+	filp = get_file_rcu(&filp);
+	rcu_read_unlock();
+	if (new)
+		fput(new);
+	return filp;
+}
+
 static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
 		const struct nfs4_ff_layout_mirror *m2)
 {
@@ -237,8 +284,15 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
 
 static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)
 {
+	struct file *filp;
 	const struct cred *cred;
 
+	filp = rcu_access_pointer(mirror->ro_file);
+	if (filp)
+		fput(filp);
+	filp = rcu_access_pointer(mirror->rw_file);
+	if (filp)
+		fput(filp);
 	ff_layout_remove_mirror(mirror);
 	kfree(mirror->fh_versions);
 	cred = rcu_access_pointer(mirror->ro_cred);
@@ -414,6 +468,7 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
 	struct nfs4_ff_layout_mirror *mirror;
 	struct cred *kcred;
 	const struct cred __rcu *cred;
+	const struct cred __rcu *old;
 	kuid_t uid;
 	kgid_t gid;
 	u32 ds_count, fh_count, id;
@@ -513,13 +568,26 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
 
 		mirror = ff_layout_add_mirror(lh, fls->mirror_array[i]);
 		if (mirror != fls->mirror_array[i]) {
+			struct file *filp;
+
 			/* swap cred ptrs so free_mirror will clean up old */
 			if (lgr->range.iomode == IOMODE_READ) {
-				cred = xchg(&mirror->ro_cred, cred);
-				rcu_assign_pointer(fls->mirror_array[i]->ro_cred, cred);
+				old = xchg(&mirror->ro_cred, cred);
+				rcu_assign_pointer(fls->mirror_array[i]->ro_cred, old);
+				/* drop file if creds changed */
+				if (old != cred) {
+					filp = rcu_dereference_protected(xchg(&mirror->ro_file, NULL), 1);
+					if (filp)
+						fput(filp);
+				}
 			} else {
-				cred = xchg(&mirror->rw_cred, cred);
-				rcu_assign_pointer(fls->mirror_array[i]->rw_cred, cred);
+				old = xchg(&mirror->rw_cred, cred);
+				rcu_assign_pointer(fls->mirror_array[i]->rw_cred, old);
+				if (old != cred) {
+					filp = rcu_dereference_protected(xchg(&mirror->rw_file, NULL), 1);
+					if (filp)
+						fput(filp);
+				}
 			}
 			ff_layout_free_mirror(fls->mirror_array[i]);
 			fls->mirror_array[i] = mirror;
@@ -1757,6 +1825,7 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
 	struct pnfs_layout_segment *lseg = hdr->lseg;
 	struct nfs4_pnfs_ds *ds;
 	struct rpc_clnt *ds_clnt;
+	struct file *filp;
 	struct nfs4_ff_layout_mirror *mirror;
 	const struct cred *ds_cred;
 	loff_t offset = hdr->args.offset;
@@ -1803,12 +1872,20 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
 	hdr->args.offset = offset;
 	hdr->mds_offset = offset;
 
+	/* Start IO accounting for local read */
+	filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+				FMODE_READ);
+	if (filp) {
+		hdr->task.tk_start = ktime_get();
+		ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
+	}
+
 	/* Perform an asynchronous read to ds */
 	nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
 			  ds->ds_clp->rpc_ops,
 			  vers == 3 ? &ff_layout_read_call_ops_v3 :
 				      &ff_layout_read_call_ops_v4,
-			  0, RPC_TASK_SOFTCONN, NULL);
+			  0, RPC_TASK_SOFTCONN, filp);
 	put_cred(ds_cred);
 	return PNFS_ATTEMPTED;
 
@@ -1829,6 +1906,7 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
 	struct pnfs_layout_segment *lseg = hdr->lseg;
 	struct nfs4_pnfs_ds *ds;
 	struct rpc_clnt *ds_clnt;
+	struct file *filp;
 	struct nfs4_ff_layout_mirror *mirror;
 	const struct cred *ds_cred;
 	loff_t offset = hdr->args.offset;
@@ -1873,12 +1951,20 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
 	 */
 	hdr->args.offset = offset;
 
+	/* Start IO accounting for local write */
+	filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+				FMODE_READ|FMODE_WRITE);
+	if (filp) {
+		hdr->task.tk_start = ktime_get();
+		ff_layout_write_record_layoutstats_start(&hdr->task, hdr);
+	}
+
 	/* Perform an asynchronous write */
 	nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
 			  ds->ds_clp->rpc_ops,
 			  vers == 3 ? &ff_layout_write_call_ops_v3 :
 				      &ff_layout_write_call_ops_v4,
-			  sync, RPC_TASK_SOFTCONN, NULL);
+			  sync, RPC_TASK_SOFTCONN, filp);
 	put_cred(ds_cred);
 	return PNFS_ATTEMPTED;
 
@@ -1912,6 +1998,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
 	struct pnfs_layout_segment *lseg = data->lseg;
 	struct nfs4_pnfs_ds *ds;
 	struct rpc_clnt *ds_clnt;
+	struct file *filp;
 	struct nfs4_ff_layout_mirror *mirror;
 	const struct cred *ds_cred;
 	u32 idx;
@@ -1950,10 +2037,18 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
 	if (fh)
 		data->args.fh = fh;
 
+	/* Start IO accounting for local commit */
+	filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+				FMODE_READ|FMODE_WRITE);
+	if (filp) {
+		data->task.tk_start = ktime_get();
+		ff_layout_commit_record_layoutstats_start(&data->task, data);
+	}
+
 	ret = nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
-				   vers == 3 ? &ff_layout_commit_call_ops_v3 :
-					       &ff_layout_commit_call_ops_v4,
-				   how, RPC_TASK_SOFTCONN, NULL);
+				  vers == 3 ? &ff_layout_commit_call_ops_v3 :
+					      &ff_layout_commit_call_ops_v4,
+				  how, RPC_TASK_SOFTCONN, filp);
 	put_cred(ds_cred);
 	return ret;
 out_err:
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index f84b3fb0dddd..8e042df5a2c9 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -82,7 +82,9 @@ struct nfs4_ff_layout_mirror {
 	struct nfs_fh *fh_versions;
 	nfs4_stateid stateid;
 	const struct cred __rcu *ro_cred;
+	struct file __rcu *ro_file;
 	const struct cred __rcu *rw_cred;
+	struct file __rcu *rw_file;
 	refcount_t ref;
 	spinlock_t lock;
 	unsigned long flags;
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index e028f5a0ef5f..e58bedfb1dcc 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -395,6 +395,12 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
 
 	/* connect success, check rsize/wsize limit */
 	if (!status) {
+		/*
+		 * ds_clp is put in destroy_ds().
+		 * keep ds_clp even if DS is local, so that if local IO cannot
+		 * proceed somehow, we can fall back to NFS whenever we want.
+		 */
+		nfs_local_probe(ds->ds_clp);
 		max_payload =
 			nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
 				       NULL);
-- 
2.44.0
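ff_local_open_fh() keeps one open struct file per mirror and access mode, in ro_file or rw_file. A caller that finds the slot empty opens a file outside the RCU read lock. It then tries to publish that file with cmpxchg(). If another caller won the race, it uses the winner's file and fput()s its own. When the creds change, ff_layout_alloc_lseg() detaches the cached file with xchg() and drops the slot's reference.

What follows is a minimal userspace sketch of that publish-once pattern, written with C11 <stdatomic.h>. It is not the kernel code. The refcounted struct obj and the obj_open(), obj_get() and obj_put() helpers are hypothetical stand-ins for struct file, nfs_local_open_fh(), get_file_rcu() and fput().

The sketch leaves out RCU, so cached_drop() must not run at the same time as cached_open(). In the patch, that race is what the RCU read lock and get_file_rcu() are meant to handle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct file. */
struct obj {
	atomic_int refs;
};

/* Number of times obj_open() has created an object. */
static atomic_int opens;

/* Stand-in for nfs_local_open_fh(): new object holding one reference. */
static struct obj *obj_open(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->refs, 1);
	atomic_fetch_add(&opens, 1);
	return o;
}

/* Stand-in for get_file_rcu(): take an extra reference. */
static struct obj *obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refs, 1);
	return o;
}

/* Stand-in for fput(): drop a reference and free on the last put. */
static void obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refs, 1);

	assert(old > 0);
	if (old == 1)
		free(o);
}

/*
 * Return a referenced object from *slot, opening and publishing one if
 * the slot is empty. The slot itself owns one reference.
 */
static struct obj *cached_open(_Atomic(struct obj *) *slot)
{
	struct obj *cur = atomic_load(slot);

	if (!cur) {
		struct obj *fresh = obj_open();
		struct obj *expected = NULL;

		if (!fresh)
			return NULL;
		/* Publish only if the slot is still empty (cf. cmpxchg()). */
		if (atomic_compare_exchange_strong(slot, &expected, fresh)) {
			cur = fresh;	/* won: the slot now owns fresh's reference */
		} else {
			obj_put(fresh);	/* lost: drop ours, use the winner's */
			cur = expected;
		}
	}
	return obj_get(cur);		/* the caller's own reference */
}

/* Detach the cached object and drop the slot's reference (cf. xchg() + fput()). */
static void cached_drop(_Atomic(struct obj *) *slot)
{
	struct obj *old = atomic_exchange(slot, NULL);

	if (old)
		obj_put(old);
}
```

Suppose two back-to-back cached_open() calls are made on an empty slot. Both return the same object, which was opened once and holds three references: the slot's plus one per caller. A later cached_drop() releases only the slot's reference, so callers keep what they hold. The next cached_open() then opens a fresh object, much as a cred change in the patch forces a new local open.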