2024-06-07 14:26:56

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 00/29] nfs/nfsd: add support for localio bypass

Hi,

This patch series rebases "localio" changes that Hammerspace (and
Primary Data before it) has been carrying since 2014. The reason they
weren't proposed for upstream inclusion until now was the handshake
for whether or not a client and server are local was brittle. Please
see the commit header of "nfs/localio: discontinue network address
based localio setup" (patch 20) for more context.

Aside from rebasing the original changes (patches 1 - 18) from a
5.15.-130-stable kernel, my contribution to this series was to make
the localio handshake more robust. To do so a new LOCALIO protocol
extension has been added to both NFS v3 and v4. It follows the
well-worn pattern established by the ACL protocol extension.

These changes have proven stable against various test scenarios:
1) client and server both on localhost (for both v3 and v4.2)
2) various permutations of client and server support enablement for
both local and remote client and server.
3) client on host, server within a container (for both v3 and v4.2)

I've preserved all established author and Signed-off-by attribution
despite Andy, Peng and Jeff no longer working for Primary Data (or
Hammerspace). I've confirmed with Trond that its best to keep it all
despite those email addresses no longer being active. My Signed-off-by
and that of reviewers and maintainer(s) to follow will build on the
established development provenance.

I also made sure to preserve the original work done by others (rather
than fold changes that I add to this work, to avoid tainting the long
established development and sequence of changes).

My container testing was done in terms of podman managed containers.
I'd appreciate additional review relative to network namespaces.
fs/nfsd/localio.c:nfsd_local_fakerqst_create() in particular is simply
using the client's network namespace with rpc_net_ns(rpc_clnt). I have
an extra patch that updates nfsd_open_local_fh()'s first argument to
be the server's 'struct net' -- but I stopped short of formally
including that change in this series because it hasn't proven needed
(but more exotic hypothetical scenarios could easily expose the need
for it). I can append it to the series as an "RFC PATCH 30/29" as
needed.

All review and comments are welcome!

Thanks,
Mike

Mike Snitzer (11):
nfs/write: fix nfs_initiate_commit to return error from nfs_local_commit
nfs/localio: discontinue network address based localio setup
nfs_common: add NFS v3 LOCALIO protocol extension enablement
nfs: implement v3 client support for NFS_LOCALIO_PROGRAM
nfsd: implement v3 server support for NFS_LOCALIO_PROGRAM
nfs_common: add NFS v4 LOCALIO protocol extension enablement
nfs: implement v4 client support for NFS_LOCALIO_PROGRAM
nfsd: implement v4 server support for NFS_LOCALIO_PROGRAM
nfs/nfsd: switch GETUUID to using {encode,decode}_opaque_fixed
nfs/nfsd: consolidate {encode,decode}_opaque_fixed in nfs_xdr.h
nfs/localio: move managing nfsd_open_local_fh symbol to nfs_common

Peng Tao (3):
sunrpc: add and export rpc_ntop6_addr_noscopeid
nfs: move nfs_stat_to_errno to nfs.h
nfs/flexfiles: check local DS when making DS connections

Trond Myklebust (8):
NFS: Manage boot verifier correctly in the case of localio
NFS: Enable localio for non-pNFS I/O
pnfs/flexfiles: Enable localio for flexfiles I/O
NFS: Add tracepoints for nfs_local_enable and nfs_local_disable
NFS: Don't call filesystem write() routines directly
NFS: Don't call filesystem read() routines directly
NFS: Use completion rather than flush_work() in nfs_local_commit()
NFS: localio writes need to use a normal workqueue

Weston Andros Adamson (7):
nfs: pass nfs_client to nfs_initiate_pgio
nfs: pass nfs_client to nfs_initiate_commit
nfs: pass descriptor thru nfs_initiate_pgio path
sunrpc: handle NULL req->defer in cache_defer_req
sunrpc: export svc_defer
sunrpc: add rpcauth_map_to_svc_cred
nfs/nfsd: add "local io" support

fs/Kconfig | 3 +
fs/nfs/Kconfig | 25 +
fs/nfs/Makefile | 2 +
fs/nfs/blocklayout/blocklayout.c | 6 +-
fs/nfs/client.c | 15 +-
fs/nfs/filelayout/filelayout.c | 19 +-
fs/nfs/flexfilelayout/flexfilelayout.c | 129 +++-
fs/nfs/flexfilelayout/flexfilelayout.h | 2 +
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 6 +
fs/nfs/inode.c | 28 +-
fs/nfs/internal.h | 101 ++-
fs/nfs/localio.c | 814 ++++++++++++++++++++++
fs/nfs/nfs2xdr.c | 69 --
fs/nfs/nfs3_fs.h | 1 +
fs/nfs/nfs3client.c | 25 +
fs/nfs/nfs3proc.c | 3 +
fs/nfs/nfs3xdr.c | 58 ++
fs/nfs/nfs4_fs.h | 2 +
fs/nfs/nfs4client.c | 23 +
fs/nfs/nfs4proc.c | 3 +
fs/nfs/nfs4xdr.c | 65 +-
fs/nfs/nfstrace.h | 61 ++
fs/nfs/pagelist.c | 35 +-
fs/nfs/pnfs.c | 24 +-
fs/nfs/pnfs.h | 6 +-
fs/nfs/pnfs_nfs.c | 5 +-
fs/nfs/write.c | 28 +-
fs/nfs_common/Makefile | 3 +
fs/nfs_common/nfslocalio.c | 68 ++
fs/nfsd/Kconfig | 25 +
fs/nfsd/Makefile | 2 +
fs/nfsd/filecache.c | 2 +-
fs/nfsd/localio.c | 324 +++++++++
fs/nfsd/netns.h | 4 +
fs/nfsd/nfsd.h | 11 +
fs/nfsd/nfssvc.c | 91 ++-
fs/nfsd/trace.h | 3 +-
fs/nfsd/vfs.h | 8 +
fs/nfsd/xdr.h | 6 +
include/linux/nfs.h | 65 ++
include/linux/nfs_fs.h | 2 +
include/linux/nfs_fs_sb.h | 8 +
include/linux/nfs_xdr.h | 31 +-
include/linux/nfslocalio.h | 37 +
include/linux/sunrpc/auth.h | 4 +
include/linux/sunrpc/svc_xprt.h | 1 +
include/uapi/linux/nfs.h | 4 +
net/sunrpc/auth.c | 16 +
net/sunrpc/cache.c | 2 +
net/sunrpc/svc_xprt.c | 4 +-
50 files changed, 2120 insertions(+), 159 deletions(-)
create mode 100644 fs/nfs/localio.c
create mode 100644 fs/nfs_common/nfslocalio.c
create mode 100644 fs/nfsd/localio.c
create mode 100644 include/linux/nfslocalio.h

--
2.44.0



2024-06-07 14:26:58

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 01/29] nfs: pass nfs_client to nfs_initiate_pgio

From: Weston Andros Adamson <[email protected]>

The nfs_client is needed for localio support.

Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Lance Shelton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/filelayout/filelayout.c | 4 ++--
fs/nfs/flexfilelayout/flexfilelayout.c | 6 ++++--
fs/nfs/internal.h | 5 +++--
fs/nfs/pagelist.c | 10 ++++++----
4 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 29d84dc66ca3..43e16e9e0176 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -486,7 +486,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;

/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
0, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
@@ -528,7 +528,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);

/* Perform an asynchronous write */
- nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
sync, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 24188af56d5b..327f1a5c9fbe 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1803,7 +1803,8 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;

/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
0, RPC_TASK_SOFTCONN);
@@ -1871,7 +1872,8 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;

/* Perform an asynchronous write */
- nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
sync, RPC_TASK_SOFTCONN);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 9f0f4534744b..a9c0c29f7804 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -306,8 +306,9 @@ extern const struct nfs_pageio_ops nfs_pgio_rw_ops;
struct nfs_pgio_header *nfs_pgio_header_alloc(const struct nfs_rw_ops *);
void nfs_pgio_header_free(struct nfs_pgio_header *);
int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
-int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
- const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
+int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
+ struct nfs_pgio_header *hdr, const struct cred *cred,
+ const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags);
void nfs_free_request(struct nfs_page *req);
struct nfs_pgio_mirror *
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 6efb5068c116..d9b795c538cd 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -844,8 +844,9 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
rpc_exit(task, err);
}

-int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
- const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
+int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
+ struct nfs_pgio_header *hdr, const struct cred *cred,
+ const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags)
{
struct rpc_task *task;
@@ -855,7 +856,7 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
.rpc_cred = cred,
};
struct rpc_task_setup task_setup_data = {
- .rpc_client = clnt,
+ .rpc_client = rpc_clnt,
.task = &hdr->task,
.rpc_message = &msg,
.callback_ops = call_ops,
@@ -1070,7 +1071,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
if (ret == 0) {
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
+ ret = nfs_initiate_pgio(NFS_SERVER(hdr->inode)->nfs_client,
+ NFS_CLIENT(hdr->inode),
hdr,
hdr->cred,
NFS_PROTO(hdr->inode),
--
2.44.0


2024-06-07 14:27:00

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 02/29] nfs: pass nfs_client to nfs_initiate_commit

From: Weston Andros Adamson <[email protected]>

The nfs_client is needed for localio support.

Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Lance Shelton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/filelayout/filelayout.c | 3 ++-
fs/nfs/flexfilelayout/flexfilelayout.c | 2 +-
fs/nfs/internal.h | 3 ++-
fs/nfs/pnfs_nfs.c | 3 ++-
fs/nfs/write.c | 9 ++++++---
5 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 43e16e9e0176..0c4a1fbb6a19 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -1009,7 +1009,8 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
if (fh)
data->args.fh = fh;
- return nfs_initiate_commit(ds_clnt, data, NFS_PROTO(data->inode),
+ return nfs_initiate_commit(ds->ds_clp, ds_clnt, data,
+ NFS_PROTO(data->inode),
&filelayout_commit_call_ops, how,
RPC_TASK_SOFTCONN);
out_err:
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 327f1a5c9fbe..dee4bc560b8e 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1948,7 +1948,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
if (fh)
data->args.fh = fh;

- ret = nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
+ ret = nfs_initiate_commit(ds->ds_clp, ds_clnt, data, ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_commit_call_ops_v3 :
&ff_layout_commit_call_ops_v4,
how, RPC_TASK_SOFTCONN);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index a9c0c29f7804..13c28cae45c5 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -525,7 +525,8 @@ extern void nfs_pageio_init_write(struct nfs_pageio_descriptor *pgio,
extern void nfs_pageio_reset_write_mds(struct nfs_pageio_descriptor *pgio);
extern void nfs_commit_free(struct nfs_commit_data *p);
extern void nfs_commit_prepare(struct rpc_task *task, void *calldata);
-extern int nfs_initiate_commit(struct rpc_clnt *clnt,
+extern int nfs_initiate_commit(struct nfs_client *clp,
+ struct rpc_clnt *clnt,
struct nfs_commit_data *data,
const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 88e061bd711b..b29b50c2c933 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -534,7 +534,8 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
list_del(&data->list);
if (data->ds_commit_index < 0) {
nfs_init_commit(data, NULL, NULL, cinfo);
- nfs_initiate_commit(NFS_CLIENT(inode), data,
+ nfs_initiate_commit(NFS_SERVER(inode)->nfs_client,
+ NFS_CLIENT(inode), data,
NFS_PROTO(data->inode),
data->mds_ops, how,
RPC_TASK_CRED_NOREF);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2329cbb0e446..c9cfa1308264 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1667,7 +1667,9 @@ void nfs_commitdata_release(struct nfs_commit_data *data)
}
EXPORT_SYMBOL_GPL(nfs_commitdata_release);

-int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
+int nfs_initiate_commit(struct nfs_client *clp,
+ struct rpc_clnt *rpc_clnt,
+ struct nfs_commit_data *data,
const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
int how, int flags)
@@ -1681,7 +1683,7 @@ int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
};
struct rpc_task_setup task_setup_data = {
.task = &data->task,
- .rpc_client = clnt,
+ .rpc_client = rpc_clnt,
.rpc_message = &msg,
.callback_ops = call_ops,
.callback_data = data,
@@ -1814,7 +1816,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
nfs_init_commit(data, head, NULL, cinfo);
if (NFS_SERVER(inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
+ return nfs_initiate_commit(NFS_SERVER(inode)->nfs_client,
+ NFS_CLIENT(inode), data, NFS_PROTO(inode),
data->mds_ops, how,
RPC_TASK_CRED_NOREF | task_flags);
}
--
2.44.0


2024-06-07 14:27:03

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 03/29] nfs: pass descriptor thru nfs_initiate_pgio path

From: Weston Andros Adamson <[email protected]>

This is needed for localio support.

Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Lance Shelton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/blocklayout/blocklayout.c | 6 ++++--
fs/nfs/filelayout/filelayout.c | 10 ++++++----
fs/nfs/flexfilelayout/flexfilelayout.c | 10 ++++++----
fs/nfs/internal.h | 6 +++---
fs/nfs/pagelist.c | 6 ++++--
fs/nfs/pnfs.c | 24 +++++++++++++-----------
fs/nfs/pnfs.h | 6 ++++--
7 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 6be13e0ec170..6a61ddd1835f 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -227,7 +227,8 @@ bl_end_par_io_read(void *data)
}

static enum pnfs_try_status
-bl_read_pagelist(struct nfs_pgio_header *header)
+bl_read_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *header)
{
struct pnfs_block_layout *bl = BLK_LSEG2EXT(header->lseg);
struct pnfs_block_dev_map map = { .start = NFS4_MAX_UINT64 };
@@ -372,7 +373,8 @@ static void bl_end_par_io_write(void *data)
}

static enum pnfs_try_status
-bl_write_pagelist(struct nfs_pgio_header *header, int sync)
+bl_write_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *header, int sync)
{
struct pnfs_block_layout *bl = BLK_LSEG2EXT(header->lseg);
struct pnfs_block_dev_map map = { .start = NFS4_MAX_UINT64 };
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 0c4a1fbb6a19..d66f2efbd92f 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -447,7 +447,8 @@ static const struct rpc_call_ops filelayout_commit_call_ops = {
};

static enum pnfs_try_status
-filelayout_read_pagelist(struct nfs_pgio_header *hdr)
+filelayout_read_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -486,7 +487,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;

/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
0, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
@@ -494,7 +495,8 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)

/* Perform async writes. */
static enum pnfs_try_status
-filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
+filelayout_write_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr, int sync)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -528,7 +530,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);

/* Perform an asynchronous write */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
sync, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index dee4bc560b8e..d7e9e5ef4085 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1751,7 +1751,8 @@ static const struct rpc_call_ops ff_layout_commit_call_ops_v4 = {
};

static enum pnfs_try_status
-ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
+ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -1803,7 +1804,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;

/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
@@ -1822,7 +1823,8 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)

/* Perform async writes. */
static enum pnfs_try_status
-ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
+ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr, int sync)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -1872,7 +1874,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;

/* Perform an asynchronous write */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 13c28cae45c5..873c2339b78a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -306,9 +306,9 @@ extern const struct nfs_pageio_ops nfs_pgio_rw_ops;
struct nfs_pgio_header *nfs_pgio_header_alloc(const struct nfs_rw_ops *);
void nfs_pgio_header_free(struct nfs_pgio_header *);
int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
-int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
- struct nfs_pgio_header *hdr, const struct cred *cred,
- const struct nfs_rpc_ops *rpc_ops,
+int nfs_initiate_pgio(struct nfs_pageio_descriptor *, struct nfs_client *clp,
+ struct rpc_clnt *rpc_clnt, struct nfs_pgio_header *hdr,
+ const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags);
void nfs_free_request(struct nfs_page *req);
struct nfs_pgio_mirror *
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index d9b795c538cd..3786d767e2ff 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -844,7 +844,8 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
rpc_exit(task, err);
}

-int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
+int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
+ struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
struct nfs_pgio_header *hdr, const struct cred *cred,
const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags)
@@ -1071,7 +1072,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
if (ret == 0) {
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- ret = nfs_initiate_pgio(NFS_SERVER(hdr->inode)->nfs_client,
+ ret = nfs_initiate_pgio(desc,
+ NFS_SERVER(hdr->inode)->nfs_client,
NFS_CLIENT(hdr->inode),
hdr,
hdr->cred,
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index b5834728f31b..c9015179b72c 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2885,10 +2885,11 @@ pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
}

static enum pnfs_try_status
-pnfs_try_to_write_data(struct nfs_pgio_header *hdr,
- const struct rpc_call_ops *call_ops,
- struct pnfs_layout_segment *lseg,
- int how)
+pnfs_try_to_write_data(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops,
+ struct pnfs_layout_segment *lseg,
+ int how)
{
struct inode *inode = hdr->inode;
enum pnfs_try_status trypnfs;
@@ -2898,7 +2899,7 @@ pnfs_try_to_write_data(struct nfs_pgio_header *hdr,

dprintk("%s: Writing ino:%lu %u@%llu (how %d)\n", __func__,
inode->i_ino, hdr->args.count, hdr->args.offset, how);
- trypnfs = nfss->pnfs_curr_ld->write_pagelist(hdr, how);
+ trypnfs = nfss->pnfs_curr_ld->write_pagelist(desc, hdr, how);
if (trypnfs != PNFS_NOT_ATTEMPTED)
nfs_inc_stats(inode, NFSIOS_PNFS_WRITE);
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
@@ -2913,7 +2914,7 @@ pnfs_do_write(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;

- trypnfs = pnfs_try_to_write_data(hdr, call_ops, lseg, how);
+ trypnfs = pnfs_try_to_write_data(desc, hdr, call_ops, lseg, how);
switch (trypnfs) {
case PNFS_NOT_ATTEMPTED:
pnfs_write_through_mds(desc, hdr);
@@ -3012,9 +3013,10 @@ pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
* Call the appropriate parallel I/O subsystem read function.
*/
static enum pnfs_try_status
-pnfs_try_to_read_data(struct nfs_pgio_header *hdr,
- const struct rpc_call_ops *call_ops,
- struct pnfs_layout_segment *lseg)
+pnfs_try_to_read_data(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops,
+ struct pnfs_layout_segment *lseg)
{
struct inode *inode = hdr->inode;
struct nfs_server *nfss = NFS_SERVER(inode);
@@ -3025,7 +3027,7 @@ pnfs_try_to_read_data(struct nfs_pgio_header *hdr,
dprintk("%s: Reading ino:%lu %u@%llu\n",
__func__, inode->i_ino, hdr->args.count, hdr->args.offset);

- trypnfs = nfss->pnfs_curr_ld->read_pagelist(hdr);
+ trypnfs = nfss->pnfs_curr_ld->read_pagelist(desc, hdr);
if (trypnfs != PNFS_NOT_ATTEMPTED)
nfs_inc_stats(inode, NFSIOS_PNFS_READ);
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
@@ -3058,7 +3060,7 @@ pnfs_do_read(struct nfs_pageio_descriptor *desc, struct nfs_pgio_header *hdr)
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;

- trypnfs = pnfs_try_to_read_data(hdr, call_ops, lseg);
+ trypnfs = pnfs_try_to_read_data(desc, hdr, call_ops, lseg);
switch (trypnfs) {
case PNFS_NOT_ATTEMPTED:
pnfs_read_through_mds(desc, hdr);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index fa5beeaaf5da..92acb837cfa6 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -157,8 +157,10 @@ struct pnfs_layoutdriver_type {
* Return PNFS_ATTEMPTED to indicate the layout code has attempted
* I/O, else return PNFS_NOT_ATTEMPTED to fall back to normal NFS
*/
- enum pnfs_try_status (*read_pagelist)(struct nfs_pgio_header *);
- enum pnfs_try_status (*write_pagelist)(struct nfs_pgio_header *, int);
+ enum pnfs_try_status (*read_pagelist)(struct nfs_pageio_descriptor *,
+ struct nfs_pgio_header *);
+ enum pnfs_try_status (*write_pagelist)(struct nfs_pageio_descriptor *,
+ struct nfs_pgio_header *, int);

void (*free_deviceid_node) (struct nfs4_deviceid_node *);
struct nfs4_deviceid_node * (*alloc_deviceid_node)
--
2.44.0


2024-06-07 14:27:04

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 04/29] sunrpc: handle NULL req->defer in cache_defer_req

From: Weston Andros Adamson <[email protected]>

Dont crash with a NULL pointer dereference when req->defer isn't
set. This is needed for the localio path.

Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Lance Shelton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
net/sunrpc/cache.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 95ff74706104..b757b891382c 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -714,6 +714,8 @@ static bool cache_defer_req(struct cache_req *req, struct cache_head *item)
return false;
}

+ if (!req->defer)
+ return false;
dreq = req->defer(req);
if (dreq == NULL)
return false;
--
2.44.0


2024-06-07 14:27:06

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 05/29] sunrpc: export svc_defer

From: Weston Andros Adamson <[email protected]>

Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Lance Shelton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
include/linux/sunrpc/svc_xprt.h | 1 +
net/sunrpc/svc_xprt.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 0981e35a9fed..5ce68f6586f8 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -159,6 +159,7 @@ int svc_xprt_names(struct svc_serv *serv, char *buf, const int buflen);
void svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *xprt);
void svc_age_temp_xprts_now(struct svc_serv *, struct sockaddr *);
void svc_xprt_deferred_close(struct svc_xprt *xprt);
+struct cache_deferred_req *svc_defer(struct cache_req *req);

static inline void svc_xprt_get(struct svc_xprt *xprt)
{
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index dd86d7f1e97e..03d3969ca5c2 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -29,7 +29,6 @@ module_param(svc_rpc_per_connection_limit, uint, 0644);

static struct svc_deferred_req *svc_deferred_dequeue(struct svc_xprt *xprt);
static int svc_deferred_recv(struct svc_rqst *rqstp);
-static struct cache_deferred_req *svc_defer(struct cache_req *req);
static void svc_age_temp_xprts(struct timer_list *t);
static void svc_delete_xprt(struct svc_xprt *xprt);

@@ -1185,7 +1184,7 @@ static void svc_revisit(struct cache_deferred_req *dreq, int too_many)
* This code can only handle requests that consist of an xprt-header
* and rpc-header.
*/
-static struct cache_deferred_req *svc_defer(struct cache_req *req)
+struct cache_deferred_req *svc_defer(struct cache_req *req)
{
struct svc_rqst *rqstp = container_of(req, struct svc_rqst, rq_chandle);
struct svc_deferred_req *dr;
@@ -1226,6 +1225,7 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req)
dr->handle.revisit = svc_revisit;
return &dr->handle;
}
+EXPORT_SYMBOL_GPL(svc_defer);

/*
* recv data from a deferred request into an active one
--
2.44.0


2024-06-07 14:27:08

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 06/29] sunrpc: add rpcauth_map_to_svc_cred

From: Weston Andros Adamson <[email protected]>

Add new funtion rpcauth_map_to_svc_cred which maps a generic rpc_cred to an
svc_cred suitable for use in nfsd.

This is needed by the localio code to map nfs client creds to nfs server
credentials.

Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Lance Shelton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
include/linux/sunrpc/auth.h | 4 ++++
net/sunrpc/auth.c | 16 ++++++++++++++++
2 files changed, 20 insertions(+)

diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index 61e58327b1aa..5ebf031361a1 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -11,6 +11,7 @@
#define _LINUX_SUNRPC_AUTH_H

#include <linux/sunrpc/sched.h>
+#include <linux/sunrpc/svcauth.h>
#include <linux/sunrpc/msg_prot.h>
#include <linux/sunrpc/xdr.h>

@@ -184,6 +185,9 @@ int rpcauth_uptodatecred(struct rpc_task *);
int rpcauth_init_credcache(struct rpc_auth *);
void rpcauth_destroy_credcache(struct rpc_auth *);
void rpcauth_clear_credcache(struct rpc_cred_cache *);
+bool rpcauth_map_to_svc_cred(struct rpc_auth *,
+ const struct cred *,
+ struct svc_cred *);
char * rpcauth_stringify_acceptor(struct rpc_cred *);

static inline
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 04534ea537c8..55a03a3bcac2 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -308,6 +308,22 @@ rpcauth_init_credcache(struct rpc_auth *auth)
}
EXPORT_SYMBOL_GPL(rpcauth_init_credcache);

+bool
+rpcauth_map_to_svc_cred(struct rpc_auth *auth, const struct cred *cred,
+ struct svc_cred *svc)
+{
+ svc->cr_uid = cred->uid;
+ svc->cr_gid = cred->gid;
+ svc->cr_flavor = auth->au_flavor;
+ svc->cr_principal = NULL;
+ svc->cr_gss_mech = NULL;
+ if (cred->group_info)
+ svc->cr_group_info = get_group_info(cred->group_info);
+
+ return true;
+}
+EXPORT_SYMBOL_GPL(rpcauth_map_to_svc_cred);
+
char *
rpcauth_stringify_acceptor(struct rpc_cred *cred)
{
--
2.44.0


2024-06-07 14:27:10

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 07/29] sunrpc: add and export rpc_ntop6_addr_noscopeid

From: Peng Tao <[email protected]>

Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Lance Shelton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
include/linux/sunrpc/addr.h | 9 +++++++++
net/sunrpc/addr.c | 19 +++++++++++++------
2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/linux/sunrpc/addr.h b/include/linux/sunrpc/addr.h
index 07d454873b6d..e1007bddc3c4 100644
--- a/include/linux/sunrpc/addr.h
+++ b/include/linux/sunrpc/addr.h
@@ -68,6 +68,9 @@ static inline bool __rpc_copy_addr4(struct sockaddr *dst,
}

#if IS_ENABLED(CONFIG_IPV6)
+extern size_t rpc_ntop6_addr_noscopeid(const struct in6_addr *addr,
+ char *buf, const int buflen);
+
static inline bool rpc_cmp_addr6(const struct sockaddr *sap1,
const struct sockaddr *sap2)
{
@@ -94,6 +97,12 @@ static inline bool __rpc_copy_addr6(struct sockaddr *dst,
return true;
}
#else /* !(IS_ENABLED(CONFIG_IPV6) */
+static size_t rpc_ntop6_addr_noscopeid(const struct in6_addr *addr,
+ char *buf, const int buflen)
+{
+ return 0;
+}
+
static inline bool rpc_cmp_addr6(const struct sockaddr *sap1,
const struct sockaddr *sap2)
{
diff --git a/net/sunrpc/addr.c b/net/sunrpc/addr.c
index 97ff11973c49..78a123a7c39b 100644
--- a/net/sunrpc/addr.c
+++ b/net/sunrpc/addr.c
@@ -25,12 +25,9 @@

#if IS_ENABLED(CONFIG_IPV6)

-static size_t rpc_ntop6_noscopeid(const struct sockaddr *sap,
- char *buf, const int buflen)
+size_t rpc_ntop6_addr_noscopeid(const struct in6_addr *addr,
+ char *buf, const int buflen)
{
- const struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sap;
- const struct in6_addr *addr = &sin6->sin6_addr;
-
/*
* RFC 4291, Section 2.2.2
*
@@ -55,13 +52,23 @@ static size_t rpc_ntop6_noscopeid(const struct sockaddr *sap,
*/
if (ipv6_addr_v4mapped(addr))
return snprintf(buf, buflen, "::ffff:%pI4",
- &addr->s6_addr32[3]);
+ &addr->s6_addr32[3]);

/*
* RFC 4291, Section 2.2.1
*/
return snprintf(buf, buflen, "%pI6c", addr);
}
+EXPORT_SYMBOL_GPL(rpc_ntop6_addr_noscopeid);
+
+static size_t rpc_ntop6_noscopeid(const struct sockaddr *sap,
+ char *buf, const int buflen)
+{
+ const struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sap;
+ const struct in6_addr *addr = &sin6->sin6_addr;
+
+ return rpc_ntop6_addr_noscopeid(addr, buf, buflen);
+}

static size_t rpc_ntop6(const struct sockaddr *sap,
char *buf, const size_t buflen)
--
2.44.0


2024-06-07 14:27:12

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 08/29] nfs: move nfs_stat_to_errno to nfs.h

From: Peng Tao <[email protected]>

So that knfsd can use it to map nfs stat to sys errno as well.

Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Lance Shelton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/nfs2xdr.c | 69 ---------------------------------------------
include/linux/nfs.h | 63 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 63 insertions(+), 69 deletions(-)

diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index c19093814296..f7ef44829f6e 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -27,9 +27,6 @@

#define NFSDBG_FACILITY NFSDBG_XDR

-/* Mapping from NFS error code to "errno" error code. */
-#define errno_NFSERR_IO EIO
-
/*
* Declare the space requirements for NFS arguments and replies as
* number of 32bit-words
@@ -64,8 +61,6 @@
#define NFS_readdirres_sz (1+NFS_pagepad_sz)
#define NFS_statfsres_sz (1+NFS_info_sz)

-static int nfs_stat_to_errno(enum nfs_stat);
-
/*
* Encode/decode NFSv2 basic data types
*
@@ -1054,70 +1049,6 @@ static int nfs2_xdr_dec_statfsres(struct rpc_rqst *req, struct xdr_stream *xdr,
return nfs_stat_to_errno(status);
}

-
-/*
- * We need to translate between nfs status return values and
- * the local errno values which may not be the same.
- */
-static const struct {
- int stat;
- int errno;
-} nfs_errtbl[] = {
- { NFS_OK, 0 },
- { NFSERR_PERM, -EPERM },
- { NFSERR_NOENT, -ENOENT },
- { NFSERR_IO, -errno_NFSERR_IO},
- { NFSERR_NXIO, -ENXIO },
-/* { NFSERR_EAGAIN, -EAGAIN }, */
- { NFSERR_ACCES, -EACCES },
- { NFSERR_EXIST, -EEXIST },
- { NFSERR_XDEV, -EXDEV },
- { NFSERR_NODEV, -ENODEV },
- { NFSERR_NOTDIR, -ENOTDIR },
- { NFSERR_ISDIR, -EISDIR },
- { NFSERR_INVAL, -EINVAL },
- { NFSERR_FBIG, -EFBIG },
- { NFSERR_NOSPC, -ENOSPC },
- { NFSERR_ROFS, -EROFS },
- { NFSERR_MLINK, -EMLINK },
- { NFSERR_NAMETOOLONG, -ENAMETOOLONG },
- { NFSERR_NOTEMPTY, -ENOTEMPTY },
- { NFSERR_DQUOT, -EDQUOT },
- { NFSERR_STALE, -ESTALE },
- { NFSERR_REMOTE, -EREMOTE },
-#ifdef EWFLUSH
- { NFSERR_WFLUSH, -EWFLUSH },
-#endif
- { NFSERR_BADHANDLE, -EBADHANDLE },
- { NFSERR_NOT_SYNC, -ENOTSYNC },
- { NFSERR_BAD_COOKIE, -EBADCOOKIE },
- { NFSERR_NOTSUPP, -ENOTSUPP },
- { NFSERR_TOOSMALL, -ETOOSMALL },
- { NFSERR_SERVERFAULT, -EREMOTEIO },
- { NFSERR_BADTYPE, -EBADTYPE },
- { NFSERR_JUKEBOX, -EJUKEBOX },
- { -1, -EIO }
-};
-
-/**
- * nfs_stat_to_errno - convert an NFS status code to a local errno
- * @status: NFS status code to convert
- *
- * Returns a local errno value, or -EIO if the NFS status code is
- * not recognized. This function is used jointly by NFSv2 and NFSv3.
- */
-static int nfs_stat_to_errno(enum nfs_stat status)
-{
- int i;
-
- for (i = 0; nfs_errtbl[i].stat != -1; i++) {
- if (nfs_errtbl[i].stat == (int)status)
- return nfs_errtbl[i].errno;
- }
- dprintk("NFS: Unrecognized nfs status value: %u\n", status);
- return nfs_errtbl[i].errno;
-}
-
#define PROC(proc, argtype, restype, timer) \
[NFSPROC_##proc] = { \
.p_proc = NFSPROC_##proc, \
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index ceb70a926b95..b94f51d17bc5 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -10,6 +10,7 @@

#include <linux/sunrpc/msg_prot.h>
#include <linux/string.h>
+#include <linux/errno.h>
#include <linux/crc32.h>
#include <uapi/linux/nfs.h>

@@ -46,6 +47,68 @@ enum nfs3_stable_how {
NFS_INVALID_STABLE_HOW = -1
};

+/*
+ * We need to translate between nfs status return values and
+ * the local errno values which may not be the same.
+ */
+static const struct {
+ int stat;
+ int errno;
+} nfs_common_errtbl[] = {
+ { NFS_OK, 0 },
+ { NFSERR_PERM, -EPERM },
+ { NFSERR_NOENT, -ENOENT },
+ { NFSERR_IO, -EIO },
+ { NFSERR_NXIO, -ENXIO },
+/* { NFSERR_EAGAIN, -EAGAIN }, */
+ { NFSERR_ACCES, -EACCES },
+ { NFSERR_EXIST, -EEXIST },
+ { NFSERR_XDEV, -EXDEV },
+ { NFSERR_NODEV, -ENODEV },
+ { NFSERR_NOTDIR, -ENOTDIR },
+ { NFSERR_ISDIR, -EISDIR },
+ { NFSERR_INVAL, -EINVAL },
+ { NFSERR_FBIG, -EFBIG },
+ { NFSERR_NOSPC, -ENOSPC },
+ { NFSERR_ROFS, -EROFS },
+ { NFSERR_MLINK, -EMLINK },
+ { NFSERR_NAMETOOLONG, -ENAMETOOLONG },
+ { NFSERR_NOTEMPTY, -ENOTEMPTY },
+ { NFSERR_DQUOT, -EDQUOT },
+ { NFSERR_STALE, -ESTALE },
+ { NFSERR_REMOTE, -EREMOTE },
+#ifdef EWFLUSH
+ { NFSERR_WFLUSH, -EWFLUSH },
+#endif
+ { NFSERR_BADHANDLE, -EBADHANDLE },
+ { NFSERR_NOT_SYNC, -ENOTSYNC },
+ { NFSERR_BAD_COOKIE, -EBADCOOKIE },
+ { NFSERR_NOTSUPP, -ENOTSUPP },
+ { NFSERR_TOOSMALL, -ETOOSMALL },
+ { NFSERR_SERVERFAULT, -EREMOTEIO },
+ { NFSERR_BADTYPE, -EBADTYPE },
+ { NFSERR_JUKEBOX, -EJUKEBOX },
+ { -1, -EIO }
+};
+
+/**
+ * nfs_stat_to_errno - convert an NFS status code to a local errno
+ * @status: NFS status code to convert
+ *
+ * Returns a local errno value, or -EIO if the NFS status code is
+ * not recognized. This function is used jointly by NFSv2 and NFSv3.
+ */
+static inline int nfs_stat_to_errno(enum nfs_stat status)
+{
+ int i;
+
+ for (i = 0; nfs_common_errtbl[i].stat != -1; i++) {
+ if (nfs_common_errtbl[i].stat == (int)status)
+ return nfs_common_errtbl[i].errno;
+ }
+ return nfs_common_errtbl[i].errno;
+}
+
#ifdef CONFIG_CRC32
/**
* nfs_fhandle_hash - calculate the crc32 hash for the filehandle
--
2.44.0


2024-06-07 14:27:26

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 11/29] NFS: Enable localio for non-pNFS I/O

From: Trond Myklebust <[email protected]>

Try a local open of the file we're writing to, and if it succeeds, then
do local I/O.

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/pagelist.c | 19 ++++++++++---------
fs/nfs/write.c | 7 ++++++-
2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 9210a1821ec9..5890824f6200 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -1066,6 +1066,7 @@ EXPORT_SYMBOL_GPL(nfs_generic_pgio);
static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
{
struct nfs_pgio_header *hdr;
+ struct file *filp;
int ret;
unsigned short task_flags = 0;

@@ -1077,18 +1078,18 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
nfs_pgheader_init(desc, hdr, nfs_pgio_header_free);
ret = nfs_generic_pgio(desc, hdr);
if (ret == 0) {
+ struct nfs_client *clp = NFS_SERVER(hdr->inode)->nfs_client;
+
+ filp = nfs_local_file_open(clp, hdr->cred, hdr->args.fh,
+ hdr->args.context);
+
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- ret = nfs_initiate_pgio(desc,
- NFS_SERVER(hdr->inode)->nfs_client,
- NFS_CLIENT(hdr->inode),
- hdr,
- hdr->cred,
- NFS_PROTO(hdr->inode),
- desc->pg_rpc_callops,
- desc->pg_ioflags,
+ ret = nfs_initiate_pgio(desc, clp, NFS_CLIENT(hdr->inode),
+ hdr, hdr->cred, NFS_PROTO(hdr->inode),
+ desc->pg_rpc_callops, desc->pg_ioflags,
RPC_TASK_CRED_NOREF | task_flags,
- NULL);
+ filp);
}
return ret;
}
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ba0b36b15bc1..79375af3f2a6 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1808,6 +1808,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
struct nfs_commit_info *cinfo)
{
struct nfs_commit_data *data;
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+ struct file *filp;
unsigned short task_flags = 0;

/* another commit raced with us */
@@ -1824,10 +1826,13 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
nfs_init_commit(data, head, NULL, cinfo);
if (NFS_SERVER(inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
+
+ filp = nfs_local_file_open(clp, data->cred, data->args.fh,
+ data->context);
return nfs_initiate_commit(NFS_SERVER(inode)->nfs_client,
NFS_CLIENT(inode), data, NFS_PROTO(inode),
data->mds_ops, how,
- RPC_TASK_CRED_NOREF | task_flags, NULL);
+ RPC_TASK_CRED_NOREF | task_flags, filp);
}

/*
--
2.44.0


2024-06-07 14:27:32

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 12/29] nfs/flexfiles: check local DS when making DS connections

From: Peng Tao <[email protected]>

Do this by creating DS connection and check local IP address.
If it matches DS address (ignoring port), mark mirror_ds->local_ds
as true so that later we know if local DS IO is possible.

Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Lance Shelton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 25 +++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index e028f5a0ef5f..af329d9b7d1e 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -348,6 +348,22 @@ ff_layout_init_mirror_ds(struct pnfs_layout_hdr *lo,
return false;
}

+static bool ff_layout_ds_is_local(struct nfs4_pnfs_ds *ds)
+{
+ struct nfs_local_addr *addr;
+ struct sockaddr *sap;
+ struct nfs4_pnfs_ds_addr *da;
+
+ list_for_each_entry(da, &ds->ds_addrs, da_node) {
+ sap = (struct sockaddr *)&da->da_addr;
+ list_for_each_entry(addr, &ds->ds_clp->cl_local_addrs, cl_addrs)
+ if (rpc_cmp_addr((struct sockaddr *)&addr->address, sap))
+ return true;
+ }
+
+ return false;
+}
+
/**
* nfs4_ff_layout_prepare_ds - prepare a DS connection for an RPC call
* @lseg: the layout segment we're operating on
@@ -395,6 +411,15 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,

/* connect success, check rsize/wsize limit */
if (!status) {
+ /*
+ * ds_clp is put in destroy_ds().
+ * keep ds_clp even if DS is local, so that if local IO cannot
+ * proceed somehow, we can fall back to NFS whenever we want.
+ */
+ if (ff_layout_ds_is_local(ds)) {
+ dprintk("%s: found local DS\n", __func__);
+ nfs_local_enable(ds->ds_clp);
+ }
max_payload =
nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
NULL);
--
2.44.0


2024-06-07 14:27:36

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 13/29] pnfs/flexfiles: Enable localio for flexfiles I/O

From: Trond Myklebust <[email protected]>

If the DS is local to this client, then we might be able to use local
I/O to write the data.

[snitm: rebase accounted for commit 0ede61d8589cc ("file: convert to SLAB_TYPESAFE_BY_RCU")]

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 109 +++++++++++++++++++++++--
fs/nfs/flexfilelayout/flexfilelayout.h | 2 +
2 files changed, 104 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index ce6cb5d82427..0a9eccb44085 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -11,6 +11,7 @@
#include <linux/nfs_mount.h>
#include <linux/nfs_page.h>
#include <linux/module.h>
+#include <linux/file.h>
#include <linux/sched/mm.h>

#include <linux/sunrpc/metrics.h>
@@ -162,6 +163,52 @@ decode_name(struct xdr_stream *xdr, u32 *id)
return 0;
}

+static struct file *
+ff_local_open_fh(struct pnfs_layout_segment *lseg,
+ u32 ds_idx,
+ struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ fmode_t mode)
+{
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
+ struct file *filp, *new, __rcu **pfile;
+
+ if (!nfs_server_is_local(clp))
+ return NULL;
+ if (mode & FMODE_WRITE) {
+ /*
+ * Always request read and write access since this corresponds
+ * to a rw layout.
+ */
+ mode |= FMODE_READ;
+ pfile = &mirror->rw_file;
+ } else
+ pfile = &mirror->ro_file;
+
+ new = NULL;
+ rcu_read_lock();
+ filp = rcu_dereference(*pfile);
+ if (!filp) {
+ rcu_read_unlock();
+ new = nfs_local_open_fh(clp, cred, fh, mode);
+ if (IS_ERR(new))
+ return NULL;
+ rcu_read_lock();
+ /* try to swap in the pointer */
+ filp = cmpxchg(pfile, NULL, new);
+ if (!filp) {
+ filp = new;
+ new = NULL;
+ }
+ }
+ filp = get_file_rcu(&filp);
+ rcu_read_unlock();
+ if (new)
+ fput(new);
+ return filp;
+}
+
static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
const struct nfs4_ff_layout_mirror *m2)
{
@@ -237,8 +284,15 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)

static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)
{
+ struct file *filp;
const struct cred *cred;

+ filp = rcu_access_pointer(mirror->ro_file);
+ if (filp)
+ fput(filp);
+ filp = rcu_access_pointer(mirror->rw_file);
+ if (filp)
+ fput(filp);
ff_layout_remove_mirror(mirror);
kfree(mirror->fh_versions);
cred = rcu_access_pointer(mirror->ro_cred);
@@ -414,6 +468,7 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
struct nfs4_ff_layout_mirror *mirror;
struct cred *kcred;
const struct cred __rcu *cred;
+ const struct cred __rcu *old;
kuid_t uid;
kgid_t gid;
u32 ds_count, fh_count, id;
@@ -513,13 +568,26 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,

mirror = ff_layout_add_mirror(lh, fls->mirror_array[i]);
if (mirror != fls->mirror_array[i]) {
+ struct file *filp;
+
/* swap cred ptrs so free_mirror will clean up old */
if (lgr->range.iomode == IOMODE_READ) {
- cred = xchg(&mirror->ro_cred, cred);
- rcu_assign_pointer(fls->mirror_array[i]->ro_cred, cred);
+ old = xchg(&mirror->ro_cred, cred);
+ rcu_assign_pointer(fls->mirror_array[i]->ro_cred, old);
+ /* drop file if creds changed */
+ if (old != cred) {
+ filp = rcu_dereference_protected(xchg(&mirror->ro_file, NULL), 1);
+ if (filp)
+ fput(filp);
+ }
} else {
- cred = xchg(&mirror->rw_cred, cred);
- rcu_assign_pointer(fls->mirror_array[i]->rw_cred, cred);
+ old = xchg(&mirror->rw_cred, cred);
+ rcu_assign_pointer(fls->mirror_array[i]->rw_cred, old);
+ if (old != cred) {
+ filp = rcu_dereference_protected(xchg(&mirror->rw_file, NULL), 1);
+ if (filp)
+ fput(filp);
+ }
}
ff_layout_free_mirror(fls->mirror_array[i]);
fls->mirror_array[i] = mirror;
@@ -1757,6 +1825,7 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
struct rpc_clnt *ds_clnt;
+ struct file *filp;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
loff_t offset = hdr->args.offset;
@@ -1803,12 +1872,20 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
hdr->args.offset = offset;
hdr->mds_offset = offset;

+ /* Start IO accounting for local read */
+ filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ FMODE_READ);
+ if (filp) {
+ hdr->task.tk_start = ktime_get();
+ ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
+ }
+
/* Perform an asynchronous read to ds */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
- 0, RPC_TASK_SOFTCONN, NULL);
+ 0, RPC_TASK_SOFTCONN, filp);
put_cred(ds_cred);
return PNFS_ATTEMPTED;

@@ -1829,6 +1906,7 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
struct rpc_clnt *ds_clnt;
+ struct file *filp;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
loff_t offset = hdr->args.offset;
@@ -1873,12 +1951,20 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
*/
hdr->args.offset = offset;

+ /* Start IO accounting for local write */
+ filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ FMODE_READ|FMODE_WRITE);
+ if (filp) {
+ hdr->task.tk_start = ktime_get();
+ ff_layout_write_record_layoutstats_start(&hdr->task, hdr);
+ }
+
/* Perform an asynchronous write */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
- sync, RPC_TASK_SOFTCONN, NULL);
+ sync, RPC_TASK_SOFTCONN, filp);
put_cred(ds_cred);
return PNFS_ATTEMPTED;

@@ -1912,6 +1998,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
struct pnfs_layout_segment *lseg = data->lseg;
struct nfs4_pnfs_ds *ds;
struct rpc_clnt *ds_clnt;
+ struct file *filp;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
u32 idx;
@@ -1950,10 +2037,18 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
if (fh)
data->args.fh = fh;

+ /* Start IO accounting for local commit */
+ filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ FMODE_READ|FMODE_WRITE);
+ if (filp) {
+ data->task.tk_start = ktime_get();
+ ff_layout_commit_record_layoutstats_start(&data->task, data);
+ }
+
ret = nfs_initiate_commit(ds->ds_clp, ds_clnt, data, ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_commit_call_ops_v3 :
&ff_layout_commit_call_ops_v4,
- how, RPC_TASK_SOFTCONN, NULL);
+ how, RPC_TASK_SOFTCONN, filp);
put_cred(ds_cred);
return ret;
out_err:
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index f84b3fb0dddd..8e042df5a2c9 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -82,7 +82,9 @@ struct nfs4_ff_layout_mirror {
struct nfs_fh *fh_versions;
nfs4_stateid stateid;
const struct cred __rcu *ro_cred;
+ struct file __rcu *ro_file;
const struct cred __rcu *rw_cred;
+ struct file __rcu *rw_file;
refcount_t ref;
spinlock_t lock;
unsigned long flags;
--
2.44.0


2024-06-07 14:27:50

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 14/29] NFS: Add tracepoints for nfs_local_enable and nfs_local_disable

From: Trond Myklebust <[email protected]>

Allow tracing of when local I/O begins and ends.

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/localio.c | 5 ++---
fs/nfs/nfstrace.h | 32 ++++++++++++++++++++++++++++++++
2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 5c69eb0fe7b6..5939ca2216be 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -195,7 +195,6 @@ nfs_local_put_lookup_ctx(void)
spin_unlock(&ctx->lock);
if (fn)
symbol_put(nfsd_open_local_fh);
- dprintk("destroy lookup context\n");
}
}

@@ -206,8 +205,8 @@ void
nfs_local_enable(struct nfs_client *clp)
{
if (nfs_local_get_lookup_ctx()) {
- dprintk("enabled local i/o\n");
set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+ trace_nfs_local_enable(clp);
}
}
EXPORT_SYMBOL_GPL(nfs_local_enable);
@@ -219,7 +218,7 @@ void
nfs_local_disable(struct nfs_client *clp)
{
if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
- dprintk("disabled local i/o\n");
+ trace_nfs_local_disable(clp);
nfs_local_put_lookup_ctx();
}
}
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 45d4086cdeb1..95a2c19a9172 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -1710,6 +1710,38 @@ TRACE_EVENT(nfs_local_open_fh,
)
);

+DECLARE_EVENT_CLASS(nfs_local_client_event,
+ TP_PROTO(
+ const struct nfs_client *clp
+ ),
+
+ TP_ARGS(clp),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, protocol)
+ __string(server, clp->cl_hostname)
+ ),
+
+ TP_fast_assign(
+ __entry->protocol = clp->rpc_ops->version;
+ __assign_str(server);
+ ),
+
+ TP_printk(
+ "server=%s NFSv%u", __get_str(server), __entry->protocol
+ )
+);
+
+#define DEFINE_NFS_LOCAL_CLIENT_EVENT(name) \
+ DEFINE_EVENT(nfs_local_client_event, name, \
+ TP_PROTO( \
+ const struct nfs_client *clp \
+ ), \
+ TP_ARGS(clp))
+
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_enable);
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_disable);
+
DECLARE_EVENT_CLASS(nfs_xdr_event,
TP_PROTO(
const struct xdr_stream *xdr,
--
2.44.0


2024-06-07 14:28:06

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 16/29] NFS: Don't call filesystem read() routines directly

From: Trond Myklebust <[email protected]>

In order to avoid issues with stack overflow, just call the read
routines from a workqueue job.

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/localio.c | 39 +++++++++++++++++++++++++++++----------
1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 2c6811b20dcf..d997f0a96627 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -465,21 +465,38 @@ nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
nfs_local_pgio_complete(iocb);
}

-static int
-nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
- const struct rpc_call_ops *call_ops)
+static void nfs_local_call_read(struct work_struct *work)
{
- struct nfs_local_kiocb *iocb;
+ struct nfs_local_io_args *args =
+ container_of(work, struct nfs_local_io_args, work);
+ struct nfs_local_kiocb *iocb = args->iocb;
+ struct file *filp = iocb->kiocb.ki_filp;
struct iov_iter iter;
ssize_t status;

+ nfs_local_iter_init(&iter, iocb, READ);
+
+ status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+ if (status != -EIOCBQUEUED) {
+ nfs_local_read_done(iocb, status);
+ nfs_local_pgio_release(iocb);
+ }
+ complete(args->done);
+}
+
+static int nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_io_args args;
+ DECLARE_COMPLETION_ONSTACK(done);
+ struct nfs_local_kiocb *iocb;
+
dprintk("%s: vfs_read count=%u pos=%llu\n",
__func__, hdr->args.count, hdr->args.offset);

iocb = nfs_local_iocb_alloc(hdr, filp, GFP_KERNEL);
if (iocb == NULL)
return -ENOMEM;
- nfs_local_iter_init(&iter, iocb, READ);

nfs_local_pgio_init(hdr, call_ops);
hdr->res.eof = false;
@@ -489,11 +506,13 @@ nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
}

- status = filp->f_op->read_iter(&iocb->kiocb, &iter);
- if (status != -EIOCBQUEUED) {
- nfs_local_read_done(iocb, status);
- nfs_local_pgio_release(iocb);
- }
+ args.iocb = iocb;
+ args.done = &done;
+ INIT_WORK_ONSTACK(&args.work, nfs_local_call_read);
+
+ queue_work(nfsiod_workqueue, &args.work);
+ wait_for_completion(&done);
+ destroy_work_on_stack(&args.work);
return 0;
}

--
2.44.0


2024-06-07 14:28:11

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 19/29] nfs/write: fix nfs_initiate_commit to return error from nfs_local_commit

status was established in nfs_initiate_commit when nfs_local_commit
was introduced, but it was never actually used to return any error
from nfs_local_commit.

Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/write.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 79375af3f2a6..7deda7e90d22 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1704,7 +1704,7 @@ int nfs_initiate_commit(struct nfs_client *clp,
dprintk("NFS: initiated commit call\n");

if (localio) {
- nfs_local_commit(clp, localio, data, call_ops, how);
+ status = nfs_local_commit(clp, localio, data, call_ops, how);
goto out;
}

--
2.44.0


2024-06-07 14:28:13

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 21/29] nfs_common: add NFS v3 LOCALIO protocol extension enablement

First use is in nfsd, to add access to a global nfsd_uuids list that
will be used to identify local nfsd instances.

nfsd_uuids is protected by nfsd_mutex or RCU read lock. List is
composed of nfsd_uuid_t instances that are managed as nfsd creates
them (per network namespace).

nfsd_uuid_is_local() will be used to search all local nfsd for the
client specified nfsd uuid.

Also, nfs and nfsd will only build their respective localio.c if
NFS_V3_LOCALIO and/or NFSD_V3_LOCALIO are enabled.

Signed-off-by: Mike Snitzer <[email protected]>
---
fs/Kconfig | 3 +++
fs/nfs/Kconfig | 13 ++++++++++++
fs/nfs/Makefile | 3 ++-
fs/nfs/internal.h | 38 ++++++++++++++++++++++++++++++++++
fs/nfs_common/Makefile | 3 +++
fs/nfs_common/nfslocalio.c | 42 ++++++++++++++++++++++++++++++++++++++
fs/nfsd/Kconfig | 13 ++++++++++++
fs/nfsd/Makefile | 3 ++-
fs/nfsd/netns.h | 4 ++++
fs/nfsd/nfssvc.c | 11 ++++++++++
include/linux/nfslocalio.h | 29 ++++++++++++++++++++++++++
11 files changed, 160 insertions(+), 2 deletions(-)
create mode 100644 fs/nfs_common/nfslocalio.c
create mode 100644 include/linux/nfslocalio.h

diff --git a/fs/Kconfig b/fs/Kconfig
index a46b0cbc4d8f..9864a738ccae 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -377,6 +377,9 @@ config NFS_ACL_SUPPORT
tristate
select FS_POSIX_ACL

+config NFS_LOCALIO_SUPPORT
+ tristate
+
config NFS_COMMON
bool
depends on NFSD || NFS_FS || LOCKD
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 57249f040dfc..db8c9d6edcea 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -5,6 +5,7 @@ config NFS_FS
select LOCKD
select SUNRPC
select NFS_ACL_SUPPORT if NFS_V3_ACL
+ select NFS_LOCALIO_SUPPORT if NFS_V3_LOCALIO
help
Choose Y here if you want to access files residing on other
computers using Sun's Network File System protocol. To compile
@@ -72,6 +73,18 @@ config NFS_V3_ACL

If unsure, say N.

+config NFS_V3_LOCALIO
+ bool "NFS client support for the NFSv3 LOCALIO protocol extension"
+ depends on NFS_V3
+ help
+ Some NFS servers support an auxiliary NFSv3 LOCALIO protocol
+ that is not an official part of the NFS version 3 protocol.
+
+ This option enables support for version 3 of the LOCALIO
+ protocol in the kernel's NFS client.
+
+ If unsure, say N.
+
config NFS_V4
tristate "NFS client support for NFS version 4"
depends on NFS_FS
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index af64cf5ea420..7fed1ce375da 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -9,10 +9,11 @@ CFLAGS_nfstrace.o += -I$(src)
nfs-y := client.o dir.o file.o getroot.o inode.o super.o \
io.o direct.o pagelist.o read.o symlink.o unlink.o \
write.o namespace.o mount_clnt.o nfstrace.o \
- export.o sysfs.o fs_context.o localio.o
+ export.o sysfs.o fs_context.o
nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
nfs-$(CONFIG_SYSCTL) += sysctl.o
nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
+nfs-$(CONFIG_NFS_V3_LOCALIO) += localio.o

obj-$(CONFIG_NFS_V2) += nfsv2.o
nfsv2-y := nfs2super.o proc.o nfs2xdr.o
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 6d75466ad356..9f81a94e798c 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -452,6 +452,7 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);

+#if defined(CONFIG_NFS_V3_LOCALIO)
/* localio.c */
extern void nfs_local_init(void);
extern void nfs_local_enable(struct nfs_client *);
@@ -471,6 +472,43 @@ extern int nfs_local_commit(struct nfs_client *, struct file *,
const struct rpc_call_ops *, int);
extern bool nfs_server_is_local(const struct nfs_client *clp);

+#else
+static inline void nfs_local_init(void) {}
+static inline void nfs_local_enable(struct nfs_client *clp) {}
+static inline void nfs_local_disable(struct nfs_client *clp) {}
+static inline void nfs_local_probe(struct nfs_client *clp) {}
+static inline struct file *nfs_local_open_fh(struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ const fmode_t mode)
+{
+ return ERR_PTR(-EINVAL);
+}
+static inline struct file *nfs_local_file_open(struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ struct nfs_open_context *ctx)
+{
+ return NULL;
+}
+static inline int nfs_local_doio(struct nfs_client *clp, struct file *filep,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ return -EINVAL;
+}
+static inline int nfs_local_commit(struct nfs_client *clp, struct file *filep,
+ struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops, int how)
+{
+ return -EINVAL;
+}
+static inline bool nfs_server_is_local(const struct nfs_client *clp)
+{
+ return false;
+}
+#endif /* CONFIG_NFS_V3_LOCALIO */
+
/* super.c */
extern const struct super_operations nfs_sops;
bool nfs_auth_info_match(const struct nfs_auth_info *, rpc_authflavor_t);
diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
index 119c75ab9fd0..c566cca92a1b 100644
--- a/fs/nfs_common/Makefile
+++ b/fs/nfs_common/Makefile
@@ -6,5 +6,8 @@
obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
nfs_acl-objs := nfsacl.o

+obj-$(CONFIG_NFS_LOCALIO_SUPPORT) += nfs_localio.o
+nfs_localio-objs := nfslocalio.o
+
obj-$(CONFIG_GRACE_PERIOD) += grace.o
obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
new file mode 100644
index 000000000000..f214cc6754a1
--- /dev/null
+++ b/fs/nfs_common/nfslocalio.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Mike Snitzer <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/rculist.h>
+#include <linux/nfslocalio.h>
+
+MODULE_LICENSE("GPL");
+
+/*
+ * Global list of nfsd_uuid_t instances, add/remove
+ * is protected by fs/nfsd/nfssvc.c:nfsd_mutex.
+ * Reads are protected RCU read lock (see below).
+ */
+LIST_HEAD(nfsd_uuids);
+EXPORT_SYMBOL(nfsd_uuids);
+
+/* Must be called with RCU read lock held. */
+static const uuid_t * nfsd_uuid_lookup(const uuid_t *uuid)
+{
+ nfsd_uuid_t *nfsd_uuid;
+
+ list_for_each_entry_rcu(nfsd_uuid, &nfsd_uuids, list)
+ if (uuid_equal(&nfsd_uuid->uuid, uuid))
+ return &nfsd_uuid->uuid;
+
+ return &uuid_null;
+}
+
+bool nfsd_uuid_is_local(const uuid_t *uuid)
+{
+ const uuid_t *nfsd_uuid;
+
+ rcu_read_lock();
+ nfsd_uuid = nfsd_uuid_lookup(uuid);
+ rcu_read_unlock();
+
+ return !uuid_is_null(nfsd_uuid);
+}
+EXPORT_SYMBOL_GPL(nfsd_uuid_is_local);
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index 272ab8d5c4d7..c8eb7e2d4006 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -9,6 +9,7 @@ config NFSD
select EXPORTFS
select NFS_ACL_SUPPORT if NFSD_V2_ACL
select NFS_ACL_SUPPORT if NFSD_V3_ACL
+ select NFS_LOCALIO_SUPPORT if NFSD_V3_LOCALIO
depends on MULTIUSER
help
Choose Y here if you want to allow other computers to access
@@ -69,6 +70,18 @@ config NFSD_V3_ACL

If unsure, say N.

+config NFSD_V3_LOCALIO
+ bool "NFS server support for the NFSv3 LOCALIO protocol extension"
+ depends on NFSD
+ help
+ Some NFS servers support an auxiliary NFSv3 LOCALIO protocol
+ that is not an official part of the NFS version 3 protocol.
+
+ This option enables support for version 3 of the LOCALIO
+ protocol in the kernel's NFS server.
+
+ If unsure, say N.
+
config NFSD_V4
bool "NFS server support for NFS version 4"
depends on NFSD && PROC_FS
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index 702f277394f1..0e01749f6153 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -13,7 +13,7 @@ nfsd-y += trace.o
nfsd-y += nfssvc.o nfsctl.o nfsfh.o vfs.o \
export.o auth.o lockd.o nfscache.o \
stats.o filecache.o nfs3proc.o nfs3xdr.o \
- netlink.o localio.o
+ netlink.o
nfsd-$(CONFIG_NFSD_V2) += nfsproc.o nfsxdr.o
nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o
@@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
+nfsd-$(CONFIG_NFSD_V3_LOCALIO) += localio.o
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 14ec15656320..5c5f7030ad87 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -15,6 +15,7 @@
#include <linux/percpu_counter.h>
#include <linux/siphash.h>
#include <linux/sunrpc/stats.h>
+#include <linux/nfslocalio.h>

/* Hash tables for nfs4_clientid state */
#define CLIENT_HASH_BITS 4
@@ -213,6 +214,9 @@ struct nfsd_net {
/* last time an admin-revoke happened for NFSv4.0 */
time64_t nfs40_last_revoke;

+#if defined(CONFIG_NFSD_V3_LOCALIO)
+ nfsd_uuid_t nfsd_uuid;
+#endif
};

/* Simple check to find out if a given net was properly initialized */
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index cd9a6a1a9fc8..122cfa184805 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -19,6 +19,7 @@
#include <linux/sunrpc/svc_xprt.h>
#include <linux/lockd/bind.h>
#include <linux/nfsacl.h>
+#include <linux/nfslocalio.h>
#include <linux/seq_file.h>
#include <linux/inetdevice.h>
#include <net/addrconf.h>
@@ -427,6 +428,10 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)

#ifdef CONFIG_NFSD_V4_2_INTER_SSC
nfsd4_ssc_init_umount_work(nn);
+#endif
+#if defined(CONFIG_NFSD_V3_LOCALIO)
+ INIT_LIST_HEAD(&nn->nfsd_uuid.list);
+ list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
#endif
nn->nfsd_net_up = true;
return 0;
@@ -457,6 +462,9 @@ static void nfsd_shutdown_net(struct net *net)
nn->lockd_up = false;
}
nn->nfsd_net_up = false;
+#if defined(CONFIG_NFSD_V3_LOCALIO)
+ list_del_rcu(&nn->nfsd_uuid.list);
+#endif
nfsd_shutdown_generic();
}

@@ -787,6 +795,9 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred, const char *scop

strscpy(nn->nfsd_name, scope ? scope : utsname()->nodename,
sizeof(nn->nfsd_name));
+#if defined(CONFIG_NFSD_V3_LOCALIO)
+ uuid_gen(&nn->nfsd_uuid.uuid);
+#endif

error = nfsd_create_serv(net);
if (error)
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
new file mode 100644
index 000000000000..d0bbacd0adcf
--- /dev/null
+++ b/include/linux/nfslocalio.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2024 Mike Snitzer <[email protected]>
+ */
+#ifndef __LINUX_NFSLOCALIO_H
+#define __LINUX_NFSLOCALIO_H
+
+#include <linux/list.h>
+#include <linux/uuid.h>
+
+/*
+ * Global list of nfsd_uuid_t instances, add/remove
+ * is protected by fs/nfsd/nfssvc.c:nfsd_mutex.
+ */
+extern struct list_head nfsd_uuids;
+
+/*
+ * Each nfsd instance has an nfsd_uuid_t that is accessible through the
+ * global nfsd_uuids list. Useful to allow a client to negotiate if localio
+ * possible with its server.
+ */
+typedef struct {
+ uuid_t uuid;
+ struct list_head list;
+} nfsd_uuid_t;
+
+bool nfsd_uuid_is_local(const uuid_t *uuid);
+
+#endif /* __LINUX_NFSLOCALIO_H */
--
2.44.0


2024-06-07 14:28:17

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 23/29] nfsd: implement v3 server support for NFS_LOCALIO_PROGRAM

LOCALIOPROC_GETUUID encodes the server's uuid_t in terms of smaller
UUID_SIZE (16), rather than larger UUID_STRING_LEN + 1 (37).

Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfsd/localio.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfsd.h | 6 ++++
fs/nfsd/nfssvc.c | 75 ++++++++++++++++++++++++++++++++++++++-
fs/nfsd/xdr.h | 6 ++++
4 files changed, 176 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index ff68454a4017..eda4fa49b316 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -14,6 +14,8 @@
#include "vfs.h"
#include "netns.h"
#include "filecache.h"
+#include "cache.h"
+#include "xdr3.h"

#define NFSDDBG_FACILITY NFSDDBG_FH

@@ -177,3 +179,91 @@ EXPORT_SYMBOL_GPL(nfsd_open_local_fh);

/* Compile time type checking, not used by anything */
static nfs_to_nfsd_open_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
+
+/*
+ * GETUUID XDR encode functions
+ */
+
+static __be32 nfsd_proc_null(struct svc_rqst *rqstp)
+{
+ return rpc_success;
+}
+
+static __be32 nfsd_proc_getuuid(struct svc_rqst *rqstp)
+{
+ struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+ struct nfsd_getuuidres *resp = rqstp->rq_resp;
+
+ uuid_copy(&resp->uuid, &nn->nfsd_uuid.uuid);
+ resp->status = nfs_ok;
+
+ return rpc_success;
+}
+
+#if defined(CONFIG_NFSD_V3_LOCALIO)
+
+static void encode_uuid(struct xdr_stream *xdr,
+ const char *name, u32 length)
+{
+ __be32 *p;
+
+ p = xdr_reserve_space(xdr, 4 + length);
+ xdr_encode_opaque(p, name, length);
+}
+
+static bool nfs3svc_encode_getuuidres(struct svc_rqst *rqstp,
+ struct xdr_stream *xdr)
+{
+ struct nfsd_getuuidres *resp = rqstp->rq_resp;
+
+ if (!svcxdr_encode_nfsstat3(xdr, resp->status))
+ return false;
+ if (resp->status == nfs_ok) {
+ u8 uuid[UUID_SIZE];
+
+ export_uuid(uuid, &resp->uuid);
+ encode_uuid(xdr, uuid, UUID_SIZE);
+ dprintk("%s: nfs_ok uuid=%pU uuid_len=%lu\n",
+ __func__, uuid, sizeof(uuid));
+ }
+
+ return true;
+}
+
+#define ST 1 /* status */
+#define NFS3_filename_sz (1+(NFS3_MAXNAMLEN>>2))
+
+static const struct svc_procedure nfsd_localio_procedures3[2] = {
+ [LOCALIOPROC_NULL] = {
+ .pc_func = nfsd_proc_null,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = nfssvc_encode_voidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct nfsd_voidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = ST,
+ .pc_name = "NULL",
+ },
+ [LOCALIOPROC_GETUUID] = {
+ .pc_func = nfsd_proc_getuuid,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = nfs3svc_encode_getuuidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct nfsd_getuuidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = ST+NFS3_filename_sz,
+ .pc_name = "GETUUID",
+ },
+};
+
+static DEFINE_PER_CPU_ALIGNED(unsigned long,
+ nfsd_localio_count3[ARRAY_SIZE(nfsd_localio_procedures3)]);
+const struct svc_version nfsd_localio_version3 = {
+ .vs_vers = 3,
+ .vs_nproc = 2,
+ .vs_proc = nfsd_localio_procedures3,
+ .vs_dispatch = nfsd_dispatch,
+ .vs_count = nfsd_localio_count3,
+ .vs_xdrsize = NFS3_SVC_XDRSIZE,
+};
+#endif /* CONFIG_NFSD_V3_LOCALIO */
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 8f4f239d9f8a..d6771669531d 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -142,6 +142,12 @@ extern const struct svc_version nfsd_acl_version3;
#endif
#endif

+#if defined(CONFIG_NFSD_V3_LOCALIO)
+extern const struct svc_version nfsd_localio_version3;
+#else
+#define nfsd_localio_version3 NULL
+#endif
+
struct nfsd_net;

enum vers_op {NFSD_SET, NFSD_CLEAR, NFSD_TEST, NFSD_AVAIL };
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 122cfa184805..c18ee0f56da4 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -38,6 +38,16 @@
atomic_t nfsd_th_cnt = ATOMIC_INIT(0);
extern struct svc_program nfsd_program;
static int nfsd(void *vrqstp);
+#if defined(CONFIG_NFSD_V3_LOCALIO)
+static int nfsd_localio_rpcbind_set(struct net *,
+ const struct svc_program *,
+ u32, int,
+ unsigned short,
+ unsigned short);
+static __be32 nfsd_localio_init_request(struct svc_rqst *,
+ const struct svc_program *,
+ struct svc_process_info *);
+#endif /* CONFIG_NFSD_V3_LOCALIO */
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static int nfsd_acl_rpcbind_set(struct net *,
const struct svc_program *,
@@ -81,6 +91,26 @@ DEFINE_SPINLOCK(nfsd_drc_lock);
unsigned long nfsd_drc_max_mem;
unsigned long nfsd_drc_mem_used;

+#if defined(CONFIG_NFSD_V3_LOCALIO)
+static const struct svc_version *nfsd_localio_version[] = {
+ [3] = &nfsd_localio_version3,
+};
+
+#define NFSD_LOCALIO_MINVERS 3
+#define NFSD_LOCALIO_NRVERS ARRAY_SIZE(nfsd_localio_version)
+
+static struct svc_program nfsd_localio_program = {
+ .pg_prog = NFS_LOCALIO_PROGRAM,
+ .pg_nvers = NFSD_LOCALIO_NRVERS,
+ .pg_vers = nfsd_localio_version,
+ .pg_name = "nfslocalio",
+ .pg_class = "nfsd",
+ .pg_authenticate = &svc_set_client,
+ .pg_init_request = nfsd_localio_init_request,
+ .pg_rpcbind_set = nfsd_localio_rpcbind_set,
+};
+#endif /* CONFIG_NFSD_V3_LOCALIO */
+
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static const struct svc_version *nfsd_acl_version[] = {
# if defined(CONFIG_NFSD_V2_ACL)
@@ -95,6 +125,9 @@ static const struct svc_version *nfsd_acl_version[] = {
#define NFSD_ACL_NRVERS ARRAY_SIZE(nfsd_acl_version)

static struct svc_program nfsd_acl_program = {
+#if defined(CONFIG_NFSD_V3_LOCALIO)
+ .pg_next = &nfsd_localio_program,
+#endif /* CONFIG_NFSD_V3_LOCALIO */
.pg_prog = NFS_ACL_PROGRAM,
.pg_nvers = NFSD_ACL_NRVERS,
.pg_vers = nfsd_acl_version,
@@ -123,6 +156,10 @@ static const struct svc_version *nfsd_version[] = {
struct svc_program nfsd_program = {
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
.pg_next = &nfsd_acl_program,
+#else
+#if defined(CONFIG_NFSD_V3_LOCALIO)
+ .pg_next = &nfsd_localio_program,
+#endif /* CONFIG_NFSD_V3_LOCALIO */
#endif
.pg_prog = NFS_PROGRAM, /* program number */
.pg_nvers = NFSD_NRVERS, /* nr of entries in nfsd_version */
@@ -818,6 +855,42 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred, const char *scop
return error;
}

+#if defined(CONFIG_NFSD_V3_LOCALIO)
+static bool
+nfsd_support_localio_version(int vers)
+{
+ if (vers >= NFSD_LOCALIO_MINVERS && vers < NFSD_LOCALIO_NRVERS)
+ return nfsd_localio_version[vers] != NULL;
+ return false;
+}
+
+static int
+nfsd_localio_rpcbind_set(struct net *net, const struct svc_program *progp,
+ u32 version, int family, unsigned short proto,
+ unsigned short port)
+{
+ if (!nfsd_support_localio_version(version) ||
+ !nfsd_vers(net_generic(net, nfsd_net_id), version, NFSD_TEST))
+ return 0;
+ return svc_generic_rpcbind_set(net, progp, version, family,
+ proto, port);
+}
+
+static __be32
+nfsd_localio_init_request(struct svc_rqst *rqstp,
+ const struct svc_program *progp,
+ struct svc_process_info *ret)
+{
+ struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+
+ if (likely(nfsd_support_localio_version(rqstp->rq_vers) &&
+ nfsd_vers(nn, rqstp->rq_vers, NFSD_TEST)))
+ return svc_generic_init_request(rqstp, progp, ret);
+
+ return rpc_prog_unavail;
+}
+#endif /* CONFIG_NFSD_V3_LOCALIO */
+
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static bool
nfsd_support_acl_version(int vers)
@@ -960,7 +1033,7 @@ nfsd(void *vrqstp)
}

/**
- * nfsd_dispatch - Process an NFS or NFSACL Request
+ * nfsd_dispatch - Process an NFS or NFSACL or NFSLOCALIO Request
* @rqstp: incoming request
*
* This RPC dispatcher integrates the NFS server's duplicate reply cache.
diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h
index 852f71580bd0..5714469af597 100644
--- a/fs/nfsd/xdr.h
+++ b/fs/nfsd/xdr.h
@@ -5,6 +5,7 @@
#define LINUX_NFSD_H

#include <linux/vfs.h>
+#include <linux/uuid.h>
#include "nfsd.h"
#include "nfsfh.h"

@@ -123,6 +124,11 @@ struct nfsd_statfsres {
struct kstatfs stats;
};

+struct nfsd_getuuidres {
+ __be32 status;
+ uuid_t uuid;
+};
+
/*
* Storage requirements for XDR arguments and results.
*/
--
2.44.0


2024-06-07 14:28:20

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 24/29] nfs_common: add NFS v4 LOCALIO protocol extension enablement

Add CONFIG_NFS_V4_LOCALIO and CONFIG_NFSD_V4_LOCALIO to Kconfig.
Extend nfs_common's nfsd_uuids list infrastructure to NFS v4.

Also, nfs and nfsd will only build their respective localio.c if
either NFS_V{3,4}_LOCALIO and/or either NFSD_V{3,4}_LOCALIO are
enabled.

Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/Kconfig | 14 +++++++++++++-
fs/nfs/Makefile | 1 +
fs/nfsd/Kconfig | 14 +++++++++++++-
fs/nfsd/Makefile | 1 +
fs/nfsd/netns.h | 2 +-
fs/nfsd/nfssvc.c | 6 +++---
6 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index db8c9d6edcea..453ec4903086 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -5,7 +5,7 @@ config NFS_FS
select LOCKD
select SUNRPC
select NFS_ACL_SUPPORT if NFS_V3_ACL
- select NFS_LOCALIO_SUPPORT if NFS_V3_LOCALIO
+ select NFS_LOCALIO_SUPPORT if NFS_V3_LOCALIO || NFS_V4_LOCALIO
help
Choose Y here if you want to access files residing on other
computers using Sun's Network File System protocol. To compile
@@ -99,6 +99,18 @@ config NFS_V4

If unsure, say Y.

+config NFS_V4_LOCALIO
+ bool "NFS client support for the NFSv4 LOCALIO protocol extension"
+ depends on NFS_V4
+ help
+ Some NFS servers support an auxiliary NFSv4 LOCALIO protocol
+ that is not an official part of the NFS version 4 protocol.
+
+ This option enables support for version 4 of the LOCALIO
+ protocol in the kernel's NFS client.
+
+ If unsure, say N.
+
config NFS_SWAP
bool "Provide swap over NFS support"
default n
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 7fed1ce375da..ad9923fb0f03 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -14,6 +14,7 @@ nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
nfs-$(CONFIG_SYSCTL) += sysctl.o
nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
nfs-$(CONFIG_NFS_V3_LOCALIO) += localio.o
+nfs-$(CONFIG_NFS_V4_LOCALIO) += localio.o

obj-$(CONFIG_NFS_V2) += nfsv2.o
nfsv2-y := nfs2super.o proc.o nfs2xdr.o
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index c8eb7e2d4006..34d540324dfa 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -9,7 +9,7 @@ config NFSD
select EXPORTFS
select NFS_ACL_SUPPORT if NFSD_V2_ACL
select NFS_ACL_SUPPORT if NFSD_V3_ACL
- select NFS_LOCALIO_SUPPORT if NFSD_V3_LOCALIO
+ select NFS_LOCALIO_SUPPORT if NFSD_V3_LOCALIO || NFSD_V4_LOCALIO
depends on MULTIUSER
help
Choose Y here if you want to allow other computers to access
@@ -102,6 +102,18 @@ config NFSD_V4

If unsure, say N.

+config NFSD_V4_LOCALIO
+ bool "NFS server support for the NFSv4 LOCALIO protocol extension"
+ depends on NFSD_V4
+ help
+ Some NFS servers support an auxiliary NFSv4 LOCALIO protocol
+ that is not an official part of the NFS version 4 protocol.
+
+ This option enables support for version 4 of the LOCALIO
+ protocol in the kernel's NFS server.
+
+ If unsure, say N.
+
config NFSD_PNFS
bool

diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index 0e01749f6153..51d52fb0cd04 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -24,3 +24,4 @@ nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
nfsd-$(CONFIG_NFSD_V3_LOCALIO) += localio.o
+nfsd-$(CONFIG_NFSD_V4_LOCALIO) += localio.o
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 5c5f7030ad87..afeabe5c7613 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -214,7 +214,7 @@ struct nfsd_net {
/* last time an admin-revoke happened for NFSv4.0 */
time64_t nfs40_last_revoke;

-#if defined(CONFIG_NFSD_V3_LOCALIO)
+#if defined(CONFIG_NFSD_V3_LOCALIO) || defined(CONFIG_NFSD_V4_LOCALIO)
nfsd_uuid_t nfsd_uuid;
#endif
};
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index c18ee0f56da4..fab699699869 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -466,7 +466,7 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
#ifdef CONFIG_NFSD_V4_2_INTER_SSC
nfsd4_ssc_init_umount_work(nn);
#endif
-#if defined(CONFIG_NFSD_V3_LOCALIO)
+#if defined(CONFIG_NFSD_V3_LOCALIO) || defined(CONFIG_NFSD_V4_LOCALIO)
INIT_LIST_HEAD(&nn->nfsd_uuid.list);
list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
#endif
@@ -499,7 +499,7 @@ static void nfsd_shutdown_net(struct net *net)
nn->lockd_up = false;
}
nn->nfsd_net_up = false;
-#if defined(CONFIG_NFSD_V3_LOCALIO)
+#if defined(CONFIG_NFSD_V3_LOCALIO) || defined(CONFIG_NFSD_V4_LOCALIO)
list_del_rcu(&nn->nfsd_uuid.list);
#endif
nfsd_shutdown_generic();
@@ -832,7 +832,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred, const char *scop

strscpy(nn->nfsd_name, scope ? scope : utsname()->nodename,
sizeof(nn->nfsd_name));
-#if defined(CONFIG_NFSD_V3_LOCALIO)
+#if defined(CONFIG_NFSD_V3_LOCALIO) || defined(CONFIG_NFSD_V4_LOCALIO)
uuid_gen(&nn->nfsd_uuid.uuid);
#endif

--
2.44.0


2024-06-07 14:28:21

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 25/29] nfs: implement v4 client support for NFS_LOCALIO_PROGRAM

While doing so, factor out nfs_init_localioclient() so it is used by
both nfs3client.c and nfs4client.c

Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/internal.h | 28 ++++++++++++++++++++--
fs/nfs/localio.c | 2 +-
fs/nfs/nfs3client.c | 19 +--------------
fs/nfs/nfs4_fs.h | 2 ++
fs/nfs/nfs4client.c | 23 ++++++++++++++++++
fs/nfs/nfs4proc.c | 3 +++
fs/nfs/nfs4xdr.c | 53 +++++++++++++++++++++++++++++++++++++++++
include/linux/nfs_xdr.h | 2 ++
8 files changed, 111 insertions(+), 21 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 9f81a94e798c..1b2adca930fa 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -452,7 +452,31 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);

-#if defined(CONFIG_NFS_V3_LOCALIO)
+#if defined(CONFIG_NFS_V3_LOCALIO) || defined(CONFIG_NFS_V4_LOCALIO)
+/*
+ * Initialise an NFS localio client connection.
+ * Inlined here to allow nfs[34]client.c to share this code.
+ */
+static __always_inline void
+nfs_init_localioclient(struct nfs_client *clp,
+ const struct rpc_program *program, u32 vers)
+{
+ bool supported = false;
+
+ if (unlikely(!IS_ERR(clp->cl_rpcclient_localio)))
+ goto out;
+ clp->cl_rpcclient_localio = rpc_bind_new_program(clp->cl_rpcclient,
+ program, vers);
+ if (IS_ERR(clp->cl_rpcclient_localio))
+ goto out;
+ /* No errors! Assume that localio is supported */
+ supported = true;
+out:
+ dfprintk_rcu(CLIENT, "%s: server (%s) %s NFS v%u LOCALIO\n", __func__,
+ rpc_peeraddr2str(clp->cl_rpcclient_localio, RPC_DISPLAY_ADDR),
+ (supported ? "supports" : "does not support"), vers);
+}
+
/* localio.c */
extern void nfs_local_init(void);
extern void nfs_local_enable(struct nfs_client *);
@@ -507,7 +531,7 @@ static inline bool nfs_server_is_local(const struct nfs_client *clp)
{
return false;
}
-#endif /* CONFIG_NFS_V3_LOCALIO */
+#endif /* CONFIG_NFS_V3_LOCALIO || CONFIG_NFS_V4_LOCALIO */

/* super.c */
extern const struct super_operations nfs_sops;
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 145708444998..ab92c92f04e5 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -269,6 +269,7 @@ void nfs_local_probe(struct nfs_client *clp)

switch (clp->cl_rpcclient->cl_vers) {
case 3:
+ case 4:
/*
* Retrieve server's uuid via LOCALIO protocol and verify the
* server with that uuid it is known to be local. This ensures
@@ -280,7 +281,6 @@ void nfs_local_probe(struct nfs_client *clp)
if (!nfsd_uuid_is_local(&uuid))
return;
break;
- case 4:
default:
return; /* localio not supported */
}
diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index c41122ee808c..123e7c1fd339 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -152,23 +152,6 @@ const struct rpc_program nfslocalio_program3 = {
*/
void nfs3_init_localioclient(struct nfs_client *clp)
{
- if (unlikely(!IS_ERR(clp->cl_rpcclient_localio)))
- goto out;
-
- clp->cl_rpcclient_localio = rpc_bind_new_program(clp->cl_rpcclient,
- &nfslocalio_program3, 3);
- if (IS_ERR(clp->cl_rpcclient_localio)) {
- dprintk_rcu("%s: server (%s) does not support NFS v3 LOCALIO\n", __func__,
- rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR));
- return;
- }
-out:
- /* No errors! Assume that localio is supported */
- dprintk_rcu("%s: server (%s) supports NFS v3 LOCALIO\n", __func__,
- rpc_peeraddr2str(clp->cl_rpcclient_localio, RPC_DISPLAY_ADDR));
-}
-#else
-void nfs3_init_localioclient(struct nfs_client *clp)
-{
+ nfs_init_localioclient(clp, &nfslocalio_program3, 3);
}
#endif /* CONFIG_NFS_V3_LOCALIO */
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 7024230f0d1d..a0a41917dec2 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -538,6 +538,8 @@ extern int nfs4_proc_commit(struct file *dst, __u64 offset, __u32 count, struct
extern const nfs4_stateid zero_stateid;
extern const nfs4_stateid invalid_stateid;

+extern void nfs4_init_localioclient(struct nfs_client *);
+
/* nfs4super.c */
struct nfs_mount_info;
extern struct nfs_subversion nfs_v4;
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 84573df5cf5a..d2f634aa1e1b 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -1384,3 +1384,26 @@ int nfs4_update_server(struct nfs_server *server, const char *hostname,

return nfs_probe_server(server, NFS_FH(d_inode(server->super->s_root)));
}
+
+#if defined(CONFIG_NFS_V4_LOCALIO)
+static struct rpc_stat nfslocalio_rpcstat = { &nfslocalio_program4 };
+static const struct rpc_version *nfslocalio_version[] = {
+ [4] = &nfslocalio_version4,
+};
+
+const struct rpc_program nfslocalio_program4 = {
+ .name = "nfslocalio",
+ .number = NFS_LOCALIO_PROGRAM,
+ .nrvers = ARRAY_SIZE(nfslocalio_version),
+ .version = nfslocalio_version,
+ .stats = &nfslocalio_rpcstat,
+};
+
+/*
+ * Initialise an NFSv4 localio client connection
+ */
+void nfs4_init_localioclient(struct nfs_client *clp)
+{
+ nfs_init_localioclient(clp, &nfslocalio_program4, 4);
+}
+#endif /* CONFIG_NFS_V4_LOCALIO */
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c93c12063b3a..060bc8dbee61 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -10745,6 +10745,9 @@ const struct nfs_rpc_ops nfs_v4_clientops = {
.discover_trunking = nfs4_discover_trunking,
.enable_swap = nfs4_enable_swap,
.disable_swap = nfs4_disable_swap,
+#if defined(CONFIG_NFS_V4_LOCALIO)
+ .init_localioclient = nfs4_init_localioclient,
+#endif
};

static const struct xattr_handler nfs4_xattr_nfs4_acl_handler = {
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 1416099dfcd1..e6f3556a320e 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7728,3 +7728,56 @@ const struct rpc_version nfs_version4 = {
.procs = nfs4_procedures,
.counts = nfs_version4_counts,
};
+
+#if defined(CONFIG_NFS_V4_LOCALIO)
+
+#define NFS4_filename_sz (1+(NFS4_MAXNAMLEN>>2))
+#define LOCALIO4_getuuidres_sz (1+NFS4_filename_sz)
+
+static void nfs4_xdr_enc_getuuidargs(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ const void *data)
+{
+ /* void function */
+}
+
+static inline int nfs4_decode_getuuidresok(struct xdr_stream *xdr,
+ struct nfs_getuuidres *result)
+{
+ return decode_opaque_inline(xdr, &result->len, (char **)&result->uuid);
+}
+
+static int nfs4_xdr_dec_getuuidres(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ void *result)
+{
+ // FIXME: need proper handling that isn't abusing nfs_opnum4
+ int error = decode_op_hdr(xdr, LOCALIOPROC_GETUUID);
+ if (unlikely(error))
+ goto out;
+ error = nfs4_decode_getuuidresok(xdr, result);
+out:
+ return error;
+}
+
+static const struct rpc_procinfo nfs4_localio_procedures[] = {
+ [LOCALIOPROC_GETUUID] = {
+ .p_proc = LOCALIOPROC_GETUUID,
+ .p_encode = nfs4_xdr_enc_getuuidargs,
+ .p_decode = nfs4_xdr_dec_getuuidres,
+ .p_arglen = 1,
+ .p_replen = LOCALIO4_getuuidres_sz,
+ .p_statidx = LOCALIOPROC_GETUUID,
+ .p_name = "GETUUID",
+ },
+};
+
+static unsigned int nfs4_localio_counts[ARRAY_SIZE(nfs4_localio_procedures)];
+const struct rpc_version nfslocalio_version4 = {
+ .number = 4,
+ .nrprocs = ARRAY_SIZE(nfs4_localio_procedures),
+ .procs = nfs4_localio_procedures,
+ .counts = nfs4_localio_counts,
+};
+
+#endif /* CONFIG_NFS_V4_LOCALIO */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 9a030e9bd9cf..b6a16eca4664 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1842,5 +1842,7 @@ extern const struct rpc_program nfsacl_program;

extern const struct rpc_version nfslocalio_version3;
extern const struct rpc_program nfslocalio_program3;
+extern const struct rpc_version nfslocalio_version4;
+extern const struct rpc_program nfslocalio_program4;

#endif
--
2.44.0


2024-06-07 14:28:26

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 27/29] nfs/nfsd: switch GETUUID to using {encode,decode}_opaque_fixed

The uuid is always UUID_SIZE (16) so there is no need to use less
efficient variable sized opaque encode and decode XDR methods.

Also, XDR buffer size requirements were audited and reduced
accordingly.

Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/localio.c | 6 ++---
fs/nfs/nfs3xdr.c | 13 +++++++--
fs/nfs/nfs4xdr.c | 5 ++--
fs/nfsd/localio.c | 60 +++++++++++++++++------------------------
include/linux/nfs_xdr.h | 3 +--
5 files changed, 41 insertions(+), 46 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index ab92c92f04e5..ff28a7315470 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -246,9 +246,9 @@ static bool nfs_local_server_getuuid(struct nfs_client *clp, uuid_t *nfsd_uuid)
dprintk("%s: NFS issuing getuuid\n", __func__);
msg.rpc_proc = &clp->cl_rpcclient_localio->cl_procinfo[LOCALIOPROC_GETUUID];
status = rpc_call_sync(clp->cl_rpcclient_localio, &msg, 0);
- dprintk("%s: NFS reply getuuid: status=%d uuid=%pU uuid_len=%u\n",
- __func__, status, res.uuid, res.len);
- if (status || res.len != UUID_SIZE)
+ dprintk("%s: NFS reply getuuid: status=%d uuid=%pU\n",
+ __func__, status, res.uuid);
+ if (status)
return false;

import_uuid(nfsd_uuid, res.uuid);
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index 49689a9a2111..d2a17ecd12b8 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -2582,7 +2582,7 @@ const struct rpc_version nfsacl_version3 = {

#if defined(CONFIG_NFS_V3_LOCALIO)

-#define LOCALIO3_getuuidres_sz (1+NFS3_filename_sz)
+#define LOCALIO3_getuuidres_sz (1+XDR_QUADLEN(UUID_SIZE))

static void nfs3_xdr_enc_getuuidargs(struct rpc_rqst *req,
struct xdr_stream *xdr,
@@ -2591,10 +2591,19 @@ static void nfs3_xdr_enc_getuuidargs(struct rpc_rqst *req,
/* void function */
}

+// FIXME: factor out from fs/nfs/nfs4xdr.c
+static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)
+{
+ ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
+ if (unlikely(ret < 0))
+ return -EIO;
+ return 0;
+}
+
static inline int nfs3_decode_getuuidresok(struct xdr_stream *xdr,
struct nfs_getuuidres *result)
{
- return decode_inline_filename3(xdr, &result->uuid, &result->len);
+ return decode_opaque_fixed(xdr, result->uuid, UUID_SIZE);
}

static int nfs3_xdr_dec_getuuidres(struct rpc_rqst *req,
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index e6f3556a320e..d3b4fa3245f0 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7731,8 +7731,7 @@ const struct rpc_version nfs_version4 = {

#if defined(CONFIG_NFS_V4_LOCALIO)

-#define NFS4_filename_sz (1+(NFS4_MAXNAMLEN>>2))
-#define LOCALIO4_getuuidres_sz (1+NFS4_filename_sz)
+#define LOCALIO4_getuuidres_sz (op_decode_hdr_maxsz+XDR_QUADLEN(UUID_SIZE))

static void nfs4_xdr_enc_getuuidargs(struct rpc_rqst *req,
struct xdr_stream *xdr,
@@ -7744,7 +7743,7 @@ static void nfs4_xdr_enc_getuuidargs(struct rpc_rqst *req,
static inline int nfs4_decode_getuuidresok(struct xdr_stream *xdr,
struct nfs_getuuidres *result)
{
- return decode_opaque_inline(xdr, &result->len, (char **)&result->uuid);
+ return decode_opaque_fixed(xdr, result->uuid, UUID_SIZE);
}

static int nfs4_xdr_dec_getuuidres(struct rpc_rqst *req,
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index e4d2adf9531f..ace99f371c13 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -201,17 +201,23 @@ static __be32 nfsd_proc_getuuid(struct svc_rqst *rqstp)
return rpc_success;
}

+#define NFS_getuuid_sz XDR_QUADLEN(UUID_SIZE)
+
+static inline void encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
+{
+ WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
+}
+
+static void encode_uuid(struct xdr_stream *xdr, uuid_t *src_uuid)
+{
+ u8 uuid[UUID_SIZE];
+
+ export_uuid(uuid, src_uuid);
+ encode_opaque_fixed(xdr, uuid, UUID_SIZE);
+ dprintk("%s: uuid=%pU\n", __func__, uuid);
+}
+
#if defined(CONFIG_NFSD_V3_LOCALIO)
-
-static void encode_uuid(struct xdr_stream *xdr,
- const char *name, u32 length)
-{
- __be32 *p;
-
- p = xdr_reserve_space(xdr, 4 + length);
- xdr_encode_opaque(p, name, length);
-}
-
static bool nfs3svc_encode_getuuidres(struct svc_rqst *rqstp,
struct xdr_stream *xdr)
{
@@ -219,14 +225,8 @@ static bool nfs3svc_encode_getuuidres(struct svc_rqst *rqstp,

if (!svcxdr_encode_nfsstat3(xdr, resp->status))
return false;
- if (resp->status == nfs_ok) {
- u8 uuid[UUID_SIZE];
-
- export_uuid(uuid, &resp->uuid);
- encode_uuid(xdr, uuid, UUID_SIZE);
- dprintk("%s: nfs_ok uuid=%pU uuid_len=%lu\n",
- __func__, uuid, sizeof(uuid));
- }
+ if (resp->status == nfs_ok)
+ encode_uuid(xdr, &resp->uuid);

return true;
}
@@ -242,7 +242,7 @@ static const struct svc_procedure nfsd_localio_procedures3[2] = {
.pc_argsize = sizeof(struct nfsd_voidargs),
.pc_ressize = sizeof(struct nfsd_voidres),
.pc_cachetype = RC_NOCACHE,
- .pc_xdrressize = ST,
+ .pc_xdrressize = 1,
.pc_name = "NULL",
},
[LOCALIOPROC_GETUUID] = {
@@ -252,7 +252,7 @@ static const struct svc_procedure nfsd_localio_procedures3[2] = {
.pc_argsize = sizeof(struct nfsd_voidargs),
.pc_ressize = sizeof(struct nfsd_getuuidres),
.pc_cachetype = RC_NOCACHE,
- .pc_xdrressize = ST+NFS3_filename_sz,
+ .pc_xdrressize = 1+NFS_getuuid_sz,
.pc_name = "GETUUID",
},
};
@@ -282,24 +282,12 @@ static bool nfs4svc_encode_getuuidres(struct svc_rqst *rqstp,
*p++ = cpu_to_be32(LOCALIOPROC_GETUUID);
*p++ = resp->status;

- if (resp->status == nfs_ok) {
- u8 uuid[UUID_SIZE];
-
- export_uuid(uuid, &resp->uuid);
- p = xdr_reserve_space(xdr, 4 + UUID_SIZE);
- if (!p)
- return 0;
- xdr_encode_opaque(p, uuid, UUID_SIZE);
- dprintk("%s: nfs_ok uuid=%pU uuid_len=%lu\n",
- __func__, uuid, sizeof(uuid));
- }
+ if (resp->status == nfs_ok)
+ encode_uuid(xdr, &resp->uuid);

return 1;
}

-#define ST 1 /* status */
-#define NFS4_filename_sz (1+(NFS4_MAXNAMLEN>>2))
-
static const struct svc_procedure nfsd_localio_procedures4[2] = {
[LOCALIOPROC_NULL] = {
.pc_func = nfsd_proc_null,
@@ -308,7 +296,7 @@ static const struct svc_procedure nfsd_localio_procedures4[2] = {
.pc_argsize = sizeof(struct nfsd_voidargs),
.pc_ressize = sizeof(struct nfsd_voidres),
.pc_cachetype = RC_NOCACHE,
- .pc_xdrressize = ST,
+ .pc_xdrressize = 1,
.pc_name = "NULL",
},
[LOCALIOPROC_GETUUID] = {
@@ -318,7 +306,7 @@ static const struct svc_procedure nfsd_localio_procedures4[2] = {
.pc_argsize = sizeof(struct nfsd_voidargs),
.pc_ressize = sizeof(struct nfsd_getuuidres),
.pc_cachetype = RC_NOCACHE,
- .pc_xdrressize = ST+NFS4_filename_sz,
+ .pc_xdrressize = 2+NFS_getuuid_sz,
.pc_name = "GETUUID",
},
};
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index b6a16eca4664..2a438f4c2d6d 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1003,8 +1003,7 @@ struct nfs3_getaclres {
};

struct nfs_getuuidres {
- const char * uuid;
- unsigned int len;
+ __u8 * uuid;
};

#if IS_ENABLED(CONFIG_NFS_V4)
--
2.44.0


2024-06-07 14:28:27

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 28/29] nfs/nfsd: consolidate {encode,decode}_opaque_fixed in nfs_xdr.h

Eliminates duplicate functions in various files.

Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 6 ------
fs/nfs/nfs3xdr.c | 9 ---------
fs/nfs/nfs4xdr.c | 13 -------------
fs/nfsd/localio.c | 7 ++-----
include/linux/nfs_xdr.h | 20 +++++++++++++++++++-
5 files changed, 21 insertions(+), 34 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 0a9eccb44085..c2681ebd553c 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -2185,12 +2185,6 @@ static int ff_layout_encode_ioerr(struct xdr_stream *xdr,
return ff_layout_encode_ds_ioerr(xdr, &ff_args->errors);
}

-static void
-encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
- WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
-}
-
static void
ff_layout_encode_ff_iostat_head(struct xdr_stream *xdr,
const nfs4_stateid *stateid,
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index d2a17ecd12b8..95a2fb0733ae 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -2591,15 +2591,6 @@ static void nfs3_xdr_enc_getuuidargs(struct rpc_rqst *req,
/* void function */
}

-// FIXME: factor out from fs/nfs/nfs4xdr.c
-static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)
-{
- ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
- if (unlikely(ret < 0))
- return -EIO;
- return 0;
-}
-
static inline int nfs3_decode_getuuidresok(struct xdr_stream *xdr,
struct nfs_getuuidres *result)
{
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index d3b4fa3245f0..6b35b1d7d7ce 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -968,11 +968,6 @@ static __be32 *reserve_space(struct xdr_stream *xdr, size_t nbytes)
return p;
}

-static void encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
- WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
-}
-
static void encode_string(struct xdr_stream *xdr, unsigned int len, const char *str)
{
WARN_ON_ONCE(xdr_stream_encode_opaque(xdr, str, len) < 0);
@@ -4352,14 +4347,6 @@ static int decode_access(struct xdr_stream *xdr, u32 *supported, u32 *access)
return 0;
}

-static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)
-{
- ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
- if (unlikely(ret < 0))
- return -EIO;
- return 0;
-}
-
static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
{
return decode_opaque_fixed(xdr, stateid, NFS4_STATEID_SIZE);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index ace99f371c13..c4324a0fff57 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -8,6 +8,8 @@
#include <linux/sunrpc/svcauth_gss.h>
#include <linux/sunrpc/clnt.h>
#include <linux/nfs.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
#include <linux/string.h>

#include "nfsd.h"
@@ -203,11 +205,6 @@ static __be32 nfsd_proc_getuuid(struct svc_rqst *rqstp)

#define NFS_getuuid_sz XDR_QUADLEN(UUID_SIZE)

-static inline void encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
- WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
-}
-
static void encode_uuid(struct xdr_stream *xdr, uuid_t *src_uuid)
{
u8 uuid[UUID_SIZE];
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 2a438f4c2d6d..daa4115f6be6 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1826,6 +1826,24 @@ struct nfs_rpc_ops {
void (*init_localioclient)(struct nfs_client *);
};

+/*
+ * Helper functions used by NFS client and/or server
+ */
+static inline void encode_opaque_fixed(struct xdr_stream *xdr,
+ const void *buf, size_t len)
+{
+ WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
+}
+
+static inline int decode_opaque_fixed(struct xdr_stream *xdr,
+ void *buf, size_t len)
+{
+ ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
+ if (unlikely(ret < 0))
+ return -EIO;
+ return 0;
+}
+
/*
* Function vectors etc. for the NFS client
*/
@@ -1844,4 +1862,4 @@ extern const struct rpc_program nfslocalio_program3;
extern const struct rpc_version nfslocalio_version4;
extern const struct rpc_program nfslocalio_program4;

-#endif
+#endif /* _LINUX_NFS_XDR_H */
--
2.44.0


2024-06-07 14:28:47

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 29/29] nfs/localio: move managing nfsd_open_local_fh symbol to nfs_common

Get nfsd_open_local_fh and store it in rpc_client during client
creation, put the symbol during nfs_local_disable -- which is also
called during client destruction.

Eliminates the need for nfs_local_open_ctx and extra locking and
refcounting work in fs/nfs/localio.c

Also makes it so the reference to the nfsd_open_local_fh symbol is
managed by the nfs_common module instead of the nfs client modules.

Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/client.c | 1 +
fs/nfs/inode.c | 1 -
fs/nfs/internal.h | 13 ++++--
fs/nfs/localio.c | 89 +++-----------------------------------
fs/nfs_common/nfslocalio.c | 26 +++++++++++
include/linux/nfs.h | 4 --
include/linux/nfs_fs_sb.h | 2 +
include/linux/nfslocalio.h | 8 ++++
8 files changed, 53 insertions(+), 91 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 589aeba8ccbb..3d356fb05aee 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -171,6 +171,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)

INIT_LIST_HEAD(&clp->cl_superblocks);
clp->cl_rpcclient = clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ clp->nfsd_open_local_fh = NULL;

clp->cl_flags = cl_init->init_flags;
clp->cl_proto = cl_init->proto;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index b80469bce8df..811c99e65a02 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2513,7 +2513,6 @@ static int __init init_nfs_fs(void)
if (err)
goto out1;

- nfs_local_init();
err = register_nfs_fs();
if (err)
goto out0;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 1b2adca930fa..e82bdc579589 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -470,11 +470,18 @@ nfs_init_localioclient(struct nfs_client *clp,
if (IS_ERR(clp->cl_rpcclient_localio))
goto out;
/* No errors! Assume that localio is supported */
+ clp->nfsd_open_local_fh = get_nfsd_open_local_fh();
+ if (!clp->nfsd_open_local_fh) {
+ rpc_shutdown_client(clp->cl_rpcclient_localio);
+ clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ goto out;
+ }
supported = true;
out:
- dfprintk_rcu(CLIENT, "%s: server (%s) %s NFS v%u LOCALIO\n", __func__,
- rpc_peeraddr2str(clp->cl_rpcclient_localio, RPC_DISPLAY_ADDR),
- (supported ? "supports" : "does not support"), vers);
+ dfprintk_rcu(CLIENT, "%s: server (%s) %s NFS v%u LOCALIO, nfsd_open_local_fh is %s.\n",
+ __func__, rpc_peeraddr2str(clp->cl_rpcclient_localio, RPC_DISPLAY_ADDR),
+ (supported ? "supports" : "does not support"), vers,
+ (clp->nfsd_open_local_fh ? "set" : "not set"));
}

/* localio.c */
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index ff28a7315470..fb1ebc9715ff 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -27,26 +27,6 @@

#define NFSDBG_FACILITY NFSDBG_VFS

-extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
- const struct cred *cred,
- const struct nfs_fh *nfs_fh, const fmode_t fmode,
- struct file **pfilp);
-/*
- * The localio code needs to call into nfsd to do the filehandle -> struct path
- * mapping, but cannot be statically linked, because that will make the nfs
- * module depend on the nfsd module.
- *
- * Instead, do dynamic linking to the nfsd module. This way the nfs module
- * will only hold a reference on nfsd when it's actually in use. This also
- * allows some sanity checking, like giving up on localio if nfsd isn't loaded.
- */
-
-struct nfs_local_open_ctx {
- spinlock_t lock;
- nfs_to_nfsd_open_t open_f;
- atomic_t refcount;
-};
-
struct nfs_local_kiocb {
struct kiocb kiocb;
struct bio_vec *bvec;
@@ -139,8 +119,6 @@ nfs4errno(int errno)
return NFS4ERR_SERVERFAULT;
}

-static struct nfs_local_open_ctx __local_open_ctx __read_mostly;
-
static bool localio_enabled __read_mostly = true;
module_param(localio_enabled, bool, 0644);

@@ -151,66 +129,12 @@ bool nfs_server_is_local(const struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_server_is_local);

-void
-nfs_local_init(void)
-{
- struct nfs_local_open_ctx *ctx = &__local_open_ctx;
-
- ctx->open_f = NULL;
- spin_lock_init(&ctx->lock);
- atomic_set(&ctx->refcount, 0);
-}
-
-static bool
-nfs_local_get_lookup_ctx(void)
-{
- struct nfs_local_open_ctx *ctx = &__local_open_ctx;
- nfs_to_nfsd_open_t fn = NULL;
-
- spin_lock(&ctx->lock);
- if (ctx->open_f == NULL) {
- spin_unlock(&ctx->lock);
-
- fn = symbol_request(nfsd_open_local_fh);
- if (!fn)
- return false;
-
- spin_lock(&ctx->lock);
- /* catch race */
- if (ctx->open_f == NULL) {
- ctx->open_f = fn;
- fn = NULL;
- }
- }
- atomic_inc(&ctx->refcount);
- spin_unlock(&ctx->lock);
- if (fn)
- symbol_put(nfsd_open_local_fh);
- return true;
-}
-
-static void
-nfs_local_put_lookup_ctx(void)
-{
- struct nfs_local_open_ctx *ctx = &__local_open_ctx;
- nfs_to_nfsd_open_t fn;
-
- if (atomic_dec_and_lock(&ctx->refcount, &ctx->lock)) {
- fn = ctx->open_f;
- ctx->open_f = NULL;
- spin_unlock(&ctx->lock);
- if (fn)
- symbol_put(nfsd_open_local_fh);
- }
-}
-
/*
* nfs_local_enable - attempt to enable local i/o for an nfs_client
*/
-void
-nfs_local_enable(struct nfs_client *clp)
+void nfs_local_enable(struct nfs_client *clp)
{
- if (nfs_local_get_lookup_ctx()) {
+ if (READ_ONCE(clp->nfsd_open_local_fh)) {
set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
trace_nfs_local_enable(clp);
}
@@ -219,12 +143,12 @@ nfs_local_enable(struct nfs_client *clp)
/*
* nfs_local_disable - disable local i/o for an nfs_client
*/
-void
-nfs_local_disable(struct nfs_client *clp)
+void nfs_local_disable(struct nfs_client *clp)
{
if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
trace_nfs_local_disable(clp);
- nfs_local_put_lookup_ctx();
+ put_nfsd_open_local_fh();
+ clp->nfsd_open_local_fh = NULL;
}
}

@@ -299,14 +223,13 @@ struct file *
nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, const fmode_t mode)
{
- struct nfs_local_open_ctx *ctx = &__local_open_ctx;
struct file *filp;
int status;

if (mode & ~(FMODE_READ | FMODE_WRITE))
return ERR_PTR(-EINVAL);

- status = ctx->open_f(clp->cl_rpcclient, cred, fh, mode, &filp);
+ status = clp->nfsd_open_local_fh(clp->cl_rpcclient, cred, fh, mode, &filp);
if (status < 0) {
dprintk("%s: open local file failed error=%d\n",
__func__, status);
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index f214cc6754a1..c454c4100976 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -40,3 +40,29 @@ bool nfsd_uuid_is_local(const uuid_t *uuid)
return !uuid_is_null(nfsd_uuid);
}
EXPORT_SYMBOL_GPL(nfsd_uuid_is_local);
+
+/*
+ * The nfs localio code needs to call into nfsd to do the filehandle -> struct path
+ * mapping, but cannot be statically linked, because that will make the nfs module
+ * depend on the nfsd module.
+ *
+ * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
+ * nfs_common module will only hold a reference on nfsd when localio is in use.
+ * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
+ */
+
+extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+ const struct cred *cred, const struct nfs_fh *nfs_fh,
+ const fmode_t fmode, struct file **pfilp);
+
+nfs_to_nfsd_open_t get_nfsd_open_local_fh(void)
+{
+ return symbol_request(nfsd_open_local_fh);
+}
+EXPORT_SYMBOL_GPL(get_nfsd_open_local_fh);
+
+void put_nfsd_open_local_fh(void)
+{
+ symbol_put(nfsd_open_local_fh);
+}
+EXPORT_SYMBOL_GPL(put_nfsd_open_local_fh);
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index 80843764fad3..755944b562e9 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -111,10 +111,6 @@ static inline int nfs_stat_to_errno(enum nfs_stat status)
return nfs_common_errtbl[i].errno;
}

-typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
- const struct nfs_fh *, const fmode_t,
- struct file **);
-
#ifdef CONFIG_CRC32
/**
* nfs_fhandle_hash - calculate the crc32 hash for the filehandle
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index efcdb4d8e9de..f5760b05ec87 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -8,6 +8,7 @@
#include <linux/wait.h>
#include <linux/nfs_xdr.h>
#include <linux/sunrpc/xprt.h>
+#include <linux/nfslocalio.h>

#include <linux/atomic.h>
#include <linux/refcount.h>
@@ -131,6 +132,7 @@ struct nfs_client {
struct timespec64 cl_nfssvc_boot;
seqlock_t cl_boot_lock;
struct rpc_clnt * cl_rpcclient_localio; /* localio RPC client handle */
+ nfs_to_nfsd_open_t nfsd_open_local_fh;
};

/*
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index d0bbacd0adcf..b8df1b9f248d 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -7,6 +7,7 @@

#include <linux/list.h>
#include <linux/uuid.h>
+#include <linux/nfs.h>

/*
* Global list of nfsd_uuid_t instances, add/remove
@@ -26,4 +27,11 @@ typedef struct {

bool nfsd_uuid_is_local(const uuid_t *uuid);

+typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
+ const struct nfs_fh *, const fmode_t,
+ struct file **);
+
+nfs_to_nfsd_open_t get_nfsd_open_local_fh(void);
+void put_nfsd_open_local_fh(void);
+
#endif /* __LINUX_NFSLOCALIO_H */
--
2.44.0


2024-06-07 14:49:04

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 09/29] NFS: Manage boot verifier correctly in the case of localio

From: Trond Myklebust <[email protected]>

If there is a localio error, we want to manage the boot verifier in
a similar fashion to how it is done on the server.

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/client.c | 3 +++
include/linux/nfs_fs_sb.h | 4 ++++
2 files changed, 7 insertions(+)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index de77848ae654..dd3278dcfca8 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -178,6 +178,9 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
clp->cl_max_connect = cl_init->max_connect ? cl_init->max_connect : 1;
clp->cl_net = get_net(cl_init->net);

+ seqlock_init(&clp->cl_boot_lock);
+ ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+
clp->cl_principal = "*";
clp->cl_xprtsec = cl_init->xprtsec;
return clp;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 92de074e63b9..82a6f66fe1d0 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -125,6 +125,10 @@ struct nfs_client {
struct net *cl_net;
struct list_head pending_cb_stateids;
struct rcu_head rcu;
+
+ /* localio */
+ struct timespec64 cl_nfssvc_boot;
+ seqlock_t cl_boot_lock;
};

/*
--
2.44.0


2024-06-07 14:49:36

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 10/29] nfs/nfsd: add "local io" support

From: Weston Andros Adamson <[email protected]>

Add client support for bypassing NFS for localhost reads, writes, and commits.

This is only useful when the client and the server are running on the same
host and in the same container.

This has dynamic binding with the nfsd module. Local i/o will only work if
nfsd is already loaded.

[snitm: rebase accounted for commit d8b26071e65e8 ("NFSD: simplify struct nfsfh")
and commit 7c98f7cb8fda ("remove call_{read,write}_iter() functions")]

Signed-off-by: Weston Andros Adamson <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Signed-off-by: Lance Shelton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/Makefile | 2 +-
fs/nfs/client.c | 12 +
fs/nfs/filelayout/filelayout.c | 6 +-
fs/nfs/flexfilelayout/flexfilelayout.c | 6 +-
fs/nfs/inode.c | 5 +
fs/nfs/internal.h | 32 +-
fs/nfs/localio.c | 933 +++++++++++++++++++++++++
fs/nfs/nfstrace.h | 29 +
fs/nfs/pagelist.c | 12 +-
fs/nfs/pnfs_nfs.c | 2 +-
fs/nfs/write.c | 14 +-
fs/nfsd/Makefile | 2 +-
fs/nfsd/filecache.c | 2 +-
fs/nfsd/localio.c | 179 +++++
fs/nfsd/trace.h | 3 +-
fs/nfsd/vfs.h | 8 +
include/linux/nfs.h | 6 +
include/linux/nfs_fs.h | 2 +
include/linux/nfs_fs_sb.h | 2 +
include/linux/nfs_xdr.h | 1 +
20 files changed, 1240 insertions(+), 18 deletions(-)
create mode 100644 fs/nfs/localio.c
create mode 100644 fs/nfsd/localio.c

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 5f6db37f461e..af64cf5ea420 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -9,7 +9,7 @@ CFLAGS_nfstrace.o += -I$(src)
nfs-y := client.o dir.o file.o getroot.o inode.o super.o \
io.o direct.o pagelist.o read.o symlink.o unlink.o \
write.o namespace.o mount_clnt.o nfstrace.o \
- export.o sysfs.o fs_context.o
+ export.o sysfs.o fs_context.o localio.o
nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
nfs-$(CONFIG_SYSCTL) += sysctl.o
nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index dd3278dcfca8..288de750fd3b 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -170,6 +170,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
}

INIT_LIST_HEAD(&clp->cl_superblocks);
+ INIT_LIST_HEAD(&clp->cl_local_addrs);
clp->cl_rpcclient = ERR_PTR(-EINVAL);

clp->cl_flags = cl_init->init_flags;
@@ -183,6 +184,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)

clp->cl_principal = "*";
clp->cl_xprtsec = cl_init->xprtsec;
+ nfs_probe_local_addr(clp);
return clp;

error_cleanup:
@@ -236,10 +238,19 @@ static void pnfs_init_server(struct nfs_server *server)
*/
void nfs_free_client(struct nfs_client *clp)
{
+ struct nfs_local_addr *addr, *tmp;
+
+ nfs_local_disable(clp);
+
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);

+ list_for_each_entry_safe(addr, tmp, &clp->cl_local_addrs, cl_addrs) {
+ list_del(&addr->cl_addrs);
+ kfree(addr);
+ }
+
put_net(clp->cl_net);
put_nfs_version(clp->cl_nfs_mod);
kfree(clp->cl_hostname);
@@ -427,6 +438,7 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
list_add_tail(&new->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
+ nfs_local_probe(new);
return rpc_ops->init_client(new, cl_init);
}

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index d66f2efbd92f..bd8c717c31d2 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -489,7 +489,7 @@ filelayout_read_pagelist(struct nfs_pageio_descriptor *desc,
/* Perform an asynchronous read to ds */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
- 0, RPC_TASK_SOFTCONN);
+ 0, RPC_TASK_SOFTCONN, NULL);
return PNFS_ATTEMPTED;
}

@@ -532,7 +532,7 @@ filelayout_write_pagelist(struct nfs_pageio_descriptor *desc,
/* Perform an asynchronous write */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
- sync, RPC_TASK_SOFTCONN);
+ sync, RPC_TASK_SOFTCONN, NULL);
return PNFS_ATTEMPTED;
}

@@ -1014,7 +1014,7 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
return nfs_initiate_commit(ds->ds_clp, ds_clnt, data,
NFS_PROTO(data->inode),
&filelayout_commit_call_ops, how,
- RPC_TASK_SOFTCONN);
+ RPC_TASK_SOFTCONN, NULL);
out_err:
pnfs_generic_prepare_to_resend_writes(data);
pnfs_generic_commit_release(data);
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index d7e9e5ef4085..ce6cb5d82427 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1808,7 +1808,7 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
- 0, RPC_TASK_SOFTCONN);
+ 0, RPC_TASK_SOFTCONN, NULL);
put_cred(ds_cred);
return PNFS_ATTEMPTED;

@@ -1878,7 +1878,7 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
- sync, RPC_TASK_SOFTCONN);
+ sync, RPC_TASK_SOFTCONN, NULL);
put_cred(ds_cred);
return PNFS_ATTEMPTED;

@@ -1953,7 +1953,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
ret = nfs_initiate_commit(ds->ds_clp, ds_clnt, data, ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_commit_call_ops_v3 :
&ff_layout_commit_call_ops_v4,
- how, RPC_TASK_SOFTCONN);
+ how, RPC_TASK_SOFTCONN, NULL);
put_cred(ds_cred);
return ret;
out_err:
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index acef52ecb1bb..4f88b860494f 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -39,6 +39,7 @@
#include <linux/slab.h>
#include <linux/compat.h>
#include <linux/freezer.h>
+#include <linux/file.h>
#include <linux/uaccess.h>
#include <linux/iversion.h>

@@ -1053,6 +1054,7 @@ struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry,
ctx->lock_context.open_context = ctx;
INIT_LIST_HEAD(&ctx->list);
ctx->mdsthreshold = NULL;
+ ctx->local_filp = NULL;
return ctx;
}
EXPORT_SYMBOL_GPL(alloc_nfs_open_context);
@@ -1084,6 +1086,8 @@ static void __put_nfs_open_context(struct nfs_open_context *ctx, int is_sync)
nfs_sb_deactive(sb);
put_rpccred(rcu_dereference_protected(ctx->ll_cred, 1));
kfree(ctx->mdsthreshold);
+ if (!IS_ERR_OR_NULL(ctx->local_filp))
+ fput(ctx->local_filp);
kfree_rcu(ctx, rcu_head);
}

@@ -2495,6 +2499,7 @@ static int __init init_nfs_fs(void)
if (err)
goto out1;

+ nfs_local_init();
err = register_nfs_fs();
if (err)
goto out0;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 873c2339b78a..67b348447a40 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -204,6 +204,12 @@ struct nfs_mount_request {
struct net *net;
};

+struct nfs_local_addr {
+ struct list_head cl_addrs;
+ struct sockaddr_storage address;
+ size_t addrlen;
+};
+
extern int nfs_mount(struct nfs_mount_request *info, int timeo, int retrans);
extern void nfs_umount(const struct nfs_mount_request *info);

@@ -309,7 +315,8 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
int nfs_initiate_pgio(struct nfs_pageio_descriptor *, struct nfs_client *clp,
struct rpc_clnt *rpc_clnt, struct nfs_pgio_header *hdr,
const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
- const struct rpc_call_ops *call_ops, int how, int flags);
+ const struct rpc_call_ops *call_ops, int how, int flags,
+ struct file *localio);
void nfs_free_request(struct nfs_page *req);
struct nfs_pgio_mirror *
nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc);
@@ -450,6 +457,26 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);

+/* localio.c */
+extern void nfs_local_init(void);
+extern void nfs_local_enable(struct nfs_client *);
+extern void nfs_local_disable(struct nfs_client *);
+extern void nfs_local_probe(struct nfs_client *);
+extern struct file *nfs_local_open_fh(struct nfs_client *, const struct cred *,
+ struct nfs_fh *, const fmode_t);
+extern struct file *nfs_local_file_open(struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ struct nfs_open_context *ctx);
+extern int nfs_local_doio(struct nfs_client *, struct file *,
+ struct nfs_pgio_header *,
+ const struct rpc_call_ops *);
+extern int nfs_local_commit(struct nfs_client *, struct file *,
+ struct nfs_commit_data *,
+ const struct rpc_call_ops *, int);
+extern void nfs_probe_local_addr(struct nfs_client *clnt);
+extern bool nfs_server_is_local(const struct nfs_client *clp);
+
/* super.c */
extern const struct super_operations nfs_sops;
bool nfs_auth_info_match(const struct nfs_auth_info *, rpc_authflavor_t);
@@ -530,7 +557,8 @@ extern int nfs_initiate_commit(struct nfs_client *clp,
struct nfs_commit_data *data,
const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
- int how, int flags);
+ int how, int flags,
+ struct file *localio);
extern void nfs_init_commit(struct nfs_commit_data *data,
struct list_head *head,
struct pnfs_layout_segment *lseg,
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
new file mode 100644
index 000000000000..5c69eb0fe7b6
--- /dev/null
+++ b/fs/nfs/localio.c
@@ -0,0 +1,933 @@
+/*
+ * linux/fs/nfs/localio.c
+ *
+ * Copyright (C) 2014 Weston Andros Adamson <[email protected]>
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/vfs.h>
+#include <linux/file.h>
+#include <linux/inet.h>
+#include <linux/sunrpc/addr.h>
+#include <linux/inetdevice.h>
+#include <net/addrconf.h>
+#include <linux/module.h>
+#include <linux/bvec.h>
+
+#include <linux/nfs.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
+
+#include <uapi/linux/if_arp.h>
+
+#include "internal.h"
+#include "pnfs.h"
+#include "nfstrace.h"
+
+#define NFSDBG_FACILITY NFSDBG_VFS
+
+extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+ const struct cred *cred,
+ const struct nfs_fh *nfs_fh, const fmode_t fmode,
+ struct file **pfilp);
+/*
+ * The localio code needs to call into nfsd to do the filehandle -> struct path
+ * mapping, but cannot be statically linked, because that will make the nfs
+ * module depend on the nfsd module.
+ *
+ * Instead, do dynamic linking to the nfsd module. This way the nfs module
+ * will only hold a reference on nfsd when it's actually in use. This also
+ * allows some sanity checking, like giving up on localio if nfsd isn't loaded.
+ */
+
+struct nfs_local_open_ctx {
+ spinlock_t lock;
+ nfs_to_nfsd_open_t open_f;
+ atomic_t refcount;
+};
+
+struct nfs_local_kiocb {
+ struct kiocb kiocb;
+ struct bio_vec *bvec;
+ struct nfs_pgio_header *hdr;
+ struct work_struct work;
+};
+
+struct nfs_local_fsync_ctx {
+ struct file *filp;
+ struct nfs_commit_data *data;
+ struct work_struct work;
+ struct kref kref;
+};
+static void nfs_local_fsync_work(struct work_struct *work);
+
+/*
+ * We need to translate between nfs status return values and
+ * the local errno values which may not be the same.
+ */
+static struct {
+ __u32 stat;
+ int errno;
+} nfs_errtbl[] = {
+ { NFS4_OK, 0 },
+ { NFS4ERR_PERM, -EPERM },
+ { NFS4ERR_NOENT, -ENOENT },
+ { NFS4ERR_IO, -EIO },
+ { NFS4ERR_NXIO, -ENXIO },
+ { NFS4ERR_FBIG, -E2BIG },
+ { NFS4ERR_STALE, -EBADF },
+ { NFS4ERR_ACCESS, -EACCES },
+ { NFS4ERR_EXIST, -EEXIST },
+ { NFS4ERR_XDEV, -EXDEV },
+ { NFS4ERR_MLINK, -EMLINK },
+ { NFS4ERR_NOTDIR, -ENOTDIR },
+ { NFS4ERR_ISDIR, -EISDIR },
+ { NFS4ERR_INVAL, -EINVAL },
+ { NFS4ERR_FBIG, -EFBIG },
+ { NFS4ERR_NOSPC, -ENOSPC },
+ { NFS4ERR_ROFS, -EROFS },
+ { NFS4ERR_NAMETOOLONG, -ENAMETOOLONG },
+ { NFS4ERR_NOTEMPTY, -ENOTEMPTY },
+ { NFS4ERR_DQUOT, -EDQUOT },
+ { NFS4ERR_STALE, -ESTALE },
+ { NFS4ERR_STALE, -EOPENSTALE },
+ { NFS4ERR_DELAY, -ETIMEDOUT },
+ { NFS4ERR_DELAY, -ERESTARTSYS },
+ { NFS4ERR_DELAY, -EAGAIN },
+ { NFS4ERR_DELAY, -ENOMEM },
+ { NFS4ERR_IO, -ETXTBSY },
+ { NFS4ERR_IO, -EBUSY },
+ { NFS4ERR_BADHANDLE, -EBADHANDLE },
+ { NFS4ERR_BAD_COOKIE, -EBADCOOKIE },
+ { NFS4ERR_NOTSUPP, -EOPNOTSUPP },
+ { NFS4ERR_TOOSMALL, -ETOOSMALL },
+ { NFS4ERR_SERVERFAULT, -ESERVERFAULT },
+ { NFS4ERR_SERVERFAULT, -ENFILE },
+ { NFS4ERR_IO, -EREMOTEIO },
+ { NFS4ERR_IO, -EUCLEAN },
+ { NFS4ERR_PERM, -ENOKEY },
+ { NFS4ERR_BADTYPE, -EBADTYPE },
+ { NFS4ERR_SYMLINK, -ELOOP },
+ { NFS4ERR_DEADLOCK, -EDEADLK },
+};
+
+/*
+ * Convert an NFS error code to a local one.
+ * This one is used jointly by NFSv2 and NFSv3.
+ */
+static __u32
+nfs4errno(int errno)
+{
+ unsigned int i;
+ for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) {
+ if (nfs_errtbl[i].errno == errno)
+ return nfs_errtbl[i].stat;
+ }
+ /* If we cannot translate the error, the recovery routines should
+ * handle it.
+ * Note: remaining NFSv4 error codes have values > 10000, so should
+ * not conflict with native Linux error codes.
+ */
+ return NFS4ERR_SERVERFAULT;
+}
+
+static struct nfs_local_open_ctx __local_open_ctx __read_mostly;
+
+static bool localio_enabled __read_mostly = true;
+module_param(localio_enabled, bool, 0644);
+
+bool nfs_server_is_local(const struct nfs_client *clp)
+{
+ return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
+ localio_enabled;
+}
+EXPORT_SYMBOL_GPL(nfs_server_is_local);
+
+void
+nfs_local_init(void)
+{
+ struct nfs_local_open_ctx *ctx = &__local_open_ctx;
+
+ ctx->open_f = NULL;
+ spin_lock_init(&ctx->lock);
+ atomic_set(&ctx->refcount, 0);
+}
+
+static bool
+nfs_local_get_lookup_ctx(void)
+{
+ struct nfs_local_open_ctx *ctx = &__local_open_ctx;
+ nfs_to_nfsd_open_t fn = NULL;
+
+ spin_lock(&ctx->lock);
+ if (ctx->open_f == NULL) {
+ spin_unlock(&ctx->lock);
+
+ fn = symbol_request(nfsd_open_local_fh);
+ if (!fn)
+ return false;
+
+ spin_lock(&ctx->lock);
+ /* catch race */
+ if (ctx->open_f == NULL) {
+ ctx->open_f = fn;
+ fn = NULL;
+ }
+ }
+ atomic_inc(&ctx->refcount);
+ spin_unlock(&ctx->lock);
+ if (fn)
+ symbol_put(nfsd_open_local_fh);
+ return true;
+}
+
+static void
+nfs_local_put_lookup_ctx(void)
+{
+ struct nfs_local_open_ctx *ctx = &__local_open_ctx;
+ nfs_to_nfsd_open_t fn;
+
+ if (atomic_dec_and_lock(&ctx->refcount, &ctx->lock)) {
+ fn = ctx->open_f;
+ ctx->open_f = NULL;
+ spin_unlock(&ctx->lock);
+ if (fn)
+ symbol_put(nfsd_open_local_fh);
+ dprintk("destroy lookup context\n");
+ }
+}
+
+/*
+ * nfs_local_enable - attempt to enable local i/o for an nfs_client
+ */
+void
+nfs_local_enable(struct nfs_client *clp)
+{
+ if (nfs_local_get_lookup_ctx()) {
+ dprintk("enabled local i/o\n");
+ set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+ }
+}
+EXPORT_SYMBOL_GPL(nfs_local_enable);
+
+/*
+ * nfs_local_disable - disable local i/o for an nfs_client
+ */
+void
+nfs_local_disable(struct nfs_client *clp)
+{
+ if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
+ dprintk("disabled local i/o\n");
+ nfs_local_put_lookup_ctx();
+ }
+}
+
+/*
+ * nfs_local_probe - probe local i/o support for an nfs_client
+ */
+void
+nfs_local_probe(struct nfs_client *clp)
+{
+ struct sockaddr_in *sin;
+ struct sockaddr_in6 *sin6;
+ struct nfs_local_addr *addr;
+ struct sockaddr *sap;
+ bool enable = false;
+
+ switch (clp->cl_addr.ss_family) {
+ case AF_INET:
+ sin = (struct sockaddr_in *)&clp->cl_addr;
+ if (ipv4_is_loopback(sin->sin_addr.s_addr)) {
+ dprintk("%s: detected IPv4 loopback address\n",
+ __func__);
+ enable = true;
+ }
+ break;
+ case AF_INET6:
+ sin6 = (struct sockaddr_in6 *)&clp->cl_addr;
+ if (memcmp(&sin6->sin6_addr, &in6addr_loopback,
+ sizeof(struct in6_addr)) == 0) {
+ dprintk("%s: detected IPv6 loopback address\n",
+ __func__);
+ enable = true;
+ }
+ break;
+ default:
+ break;
+ }
+
+ if (enable)
+ goto out;
+
+ list_for_each_entry(addr, &clp->cl_local_addrs, cl_addrs) {
+ sap = (struct sockaddr *)&addr->address;
+ if (rpc_cmp_addr((struct sockaddr *)&clp->cl_addr, sap)) {
+ dprintk("%s: detected local server.\n", __func__);
+ enable = true;
+ break;
+ }
+ }
+
+out:
+ if (enable)
+ nfs_local_enable(clp);
+}
+
+/*
+ * nfs_local_open_fh - open a local filehandle
+ *
+ * Returns a pointer to a struct file or an ERR_PTR
+ */
+struct file *
+nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, const fmode_t mode)
+{
+ struct nfs_local_open_ctx *ctx = &__local_open_ctx;
+ struct file *filp;
+ int status;
+
+ if (mode & ~(FMODE_READ | FMODE_WRITE))
+ return ERR_PTR(-EINVAL);
+
+ status = ctx->open_f(clp->cl_rpcclient, cred, fh, mode, &filp);
+ if (status < 0) {
+ dprintk("%s: open local file failed error=%d\n",
+ __func__, status);
+ trace_nfs_local_open_fh(fh, mode, status);
+ switch (status) {
+ case -ENXIO:
+ nfs_local_disable(clp);
+ fallthrough;
+ case -ETIMEDOUT:
+ status = -EAGAIN;
+ }
+ filp = ERR_PTR(status);
+ }
+ return filp;
+}
+EXPORT_SYMBOL_GPL(nfs_local_open_fh);
+
+static struct bio_vec *
+nfs_bvec_alloc_and_import_pagevec(struct page **pagevec,
+ unsigned int npages, gfp_t flags)
+{
+ struct bio_vec *bvec, *p;
+
+ bvec = kmalloc_array(npages, sizeof(*bvec), flags);
+ if (bvec != NULL) {
+ for (p = bvec; npages > 0; p++, pagevec++, npages--) {
+ p->bv_page = *pagevec;
+ p->bv_len = PAGE_SIZE;
+ p->bv_offset = 0;
+ }
+ }
+ return bvec;
+}
+
+static void
+nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
+{
+ kfree(iocb->bvec);
+ kfree(iocb);
+}
+
+static struct nfs_local_kiocb *
+nfs_local_iocb_alloc(struct nfs_pgio_header *hdr, struct file *filp,
+ gfp_t flags)
+{
+ struct nfs_local_kiocb *iocb;
+
+ iocb = kmalloc(sizeof(*iocb), flags);
+ if (iocb == NULL)
+ return NULL;
+ iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
+ hdr->page_array.npages, flags);
+ if (iocb->bvec == NULL) {
+ kfree(iocb);
+ return NULL;
+ }
+ init_sync_kiocb(&iocb->kiocb, filp);
+ iocb->kiocb.ki_pos = hdr->args.offset;
+ iocb->hdr = hdr;
+ /* FIXME: NFS_IOHDR_ODIRECT isn't ever set */
+ if (test_bit(NFS_IOHDR_ODIRECT, &hdr->flags))
+ iocb->kiocb.ki_flags |= IOCB_DIRECT|IOCB_DSYNC;
+ iocb->kiocb.ki_flags &= ~IOCB_APPEND;
+ return iocb;
+}
+
+static void
+nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ if (hdr->args.pgbase != 0) {
+ iov_iter_bvec(i, dir, iocb->bvec,
+ hdr->page_array.npages,
+ hdr->args.count + hdr->args.pgbase);
+ iov_iter_advance(i, hdr->args.pgbase);
+ } else
+ iov_iter_bvec(i, dir, iocb->bvec,
+ hdr->page_array.npages, hdr->args.count);
+}
+
+static void
+nfs_local_hdr_release(struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ call_ops->rpc_call_done(&hdr->task, hdr);
+ call_ops->rpc_release(hdr);
+}
+
+static void
+nfs_local_pgio_init(struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ hdr->task.tk_ops = call_ops;
+ if (!hdr->task.tk_start)
+ hdr->task.tk_start = ktime_get();
+}
+
+static void
+nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status)
+{
+ if (status >= 0) {
+ hdr->res.count = status;
+ hdr->res.op_status = NFS4_OK;
+ hdr->task.tk_status = 0;
+ } else {
+ hdr->res.op_status = nfs4errno(status);
+ hdr->task.tk_status = status;
+ }
+}
+
+static void
+nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ fput(iocb->kiocb.ki_filp);
+ nfs_local_iocb_free(iocb);
+ nfs_local_hdr_release(hdr, hdr->task.tk_ops);
+}
+
+static void
+nfs_local_read_aio_complete_work(struct work_struct *work)
+{
+ struct nfs_local_kiocb *iocb = container_of(work,
+ struct nfs_local_kiocb, work);
+
+ nfs_local_pgio_release(iocb);
+}
+
+/*
+ * Complete the I/O from iocb->kiocb.ki_complete()
+ *
+ * Note that this function can be called from a bottom half context,
+ * hence we need to queue the fput() etc to a workqueue
+ */
+static void
+nfs_local_pgio_complete(struct nfs_local_kiocb *iocb)
+{
+ queue_work(nfsiod_workqueue, &iocb->work);
+}
+
+static void
+nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+ struct file *filp = iocb->kiocb.ki_filp;
+
+ nfs_local_pgio_done(hdr, status);
+
+ if (hdr->res.count != hdr->args.count ||
+ hdr->args.offset + hdr->res.count >= i_size_read(file_inode(filp)))
+ hdr->res.eof = true;
+
+ dprintk("%s: read %ld bytes eof %d.\n", __func__,
+ status > 0 ? status : 0, hdr->res.eof);
+}
+
+static void
+nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
+{
+ struct nfs_local_kiocb *iocb = container_of(kiocb,
+ struct nfs_local_kiocb, kiocb);
+
+ nfs_local_read_done(iocb, ret);
+ nfs_local_pgio_complete(iocb);
+}
+
+static int
+nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_kiocb *iocb;
+ struct iov_iter iter;
+ ssize_t status;
+
+ dprintk("%s: vfs_read count=%u pos=%llu\n",
+ __func__, hdr->args.count, hdr->args.offset);
+
+ iocb = nfs_local_iocb_alloc(hdr, filp, GFP_KERNEL);
+ if (iocb == NULL)
+ return -ENOMEM;
+ nfs_local_iter_init(&iter, iocb, READ);
+
+ nfs_local_pgio_init(hdr, call_ops);
+ hdr->res.eof = false;
+
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+ INIT_WORK(&iocb->work, nfs_local_read_aio_complete_work);
+ iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
+ }
+
+ status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+ if (status != -EIOCBQUEUED) {
+ nfs_local_read_done(iocb, status);
+ nfs_local_pgio_release(iocb);
+ }
+ return 0;
+}
+
+static void
+nfs_copy_boot_verifier(struct nfs_write_verifier *verifier, struct inode *inode)
+{
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+ u32 *verf = (u32 *)verifier->data;
+ int seq = 0;
+
+ do {
+ read_seqbegin_or_lock(&clp->cl_boot_lock, &seq);
+ verf[0] = (u32)clp->cl_nfssvc_boot.tv_sec;
+ verf[1] = (u32)clp->cl_nfssvc_boot.tv_nsec;
+ } while (need_seqretry(&clp->cl_boot_lock, seq));
+ done_seqretry(&clp->cl_boot_lock, seq);
+}
+
+static void
+nfs_reset_boot_verifier(struct inode *inode)
+{
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+
+ write_seqlock(&clp->cl_boot_lock);
+ ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+ write_sequnlock(&clp->cl_boot_lock);
+}
+
+static void
+nfs_set_local_verifier(struct inode *inode,
+ struct nfs_writeverf *verf,
+ enum nfs3_stable_how how)
+{
+
+ nfs_copy_boot_verifier(&verf->verifier, inode);
+ verf->committed = how;
+}
+
+static void
+nfs_get_vfs_attr(struct file *filp, struct nfs_fattr *fattr)
+{
+ struct kstat stat;
+
+ if (fattr != NULL && vfs_getattr(&filp->f_path, &stat,
+ STATX_INO |
+ STATX_ATIME |
+ STATX_MTIME |
+ STATX_CTIME |
+ STATX_SIZE |
+ STATX_BLOCKS,
+ AT_STATX_SYNC_AS_STAT) == 0) {
+ fattr->valid = NFS_ATTR_FATTR_FILEID |
+ NFS_ATTR_FATTR_CHANGE |
+ NFS_ATTR_FATTR_SIZE |
+ NFS_ATTR_FATTR_ATIME |
+ NFS_ATTR_FATTR_MTIME |
+ NFS_ATTR_FATTR_CTIME |
+ NFS_ATTR_FATTR_SPACE_USED;
+ fattr->fileid = stat.ino;
+ fattr->size = stat.size;
+ fattr->atime = stat.atime;
+ fattr->mtime = stat.mtime;
+ fattr->ctime = stat.ctime;
+ fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
+ fattr->du.nfs3.used = stat.blocks << 9;
+ }
+}
+
+static void
+nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ dprintk("%s: wrote %ld bytes.\n", __func__, status > 0 ? status : 0);
+
+ /* Handle short writes as if they are ENOSPC */
+ if (status > 0 && status < hdr->args.count) {
+ hdr->mds_offset += status;
+ hdr->args.offset += status;
+ hdr->args.pgbase += status;
+ hdr->args.count -= status;
+ nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset);
+ status = -ENOSPC;
+ }
+ if (status < 0)
+ nfs_reset_boot_verifier(hdr->inode);
+ nfs_local_pgio_done(hdr, status);
+}
+
+static void
+nfs_local_write_aio_complete_work(struct work_struct *work)
+{
+ struct nfs_local_kiocb *iocb = container_of(work,
+ struct nfs_local_kiocb, work);
+
+ nfs_get_vfs_attr(iocb->kiocb.ki_filp, iocb->hdr->res.fattr);
+ nfs_local_pgio_release(iocb);
+}
+
+static void
+nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
+{
+ struct nfs_local_kiocb *iocb = container_of(kiocb,
+ struct nfs_local_kiocb, kiocb);
+
+ nfs_local_write_done(iocb, ret);
+ nfs_local_pgio_complete(iocb);
+}
+
+static int
+nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_kiocb *iocb;
+ struct iov_iter iter;
+ ssize_t status;
+
+ dprintk("%s: vfs_write count=%u pos=%llu %s\n",
+ __func__, hdr->args.count, hdr->args.offset,
+ (hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
+
+ iocb = nfs_local_iocb_alloc(hdr, filp, GFP_NOIO);
+ if (iocb == NULL)
+ return -ENOMEM;
+ nfs_local_iter_init(&iter, iocb, WRITE);
+
+ switch (hdr->args.stable) {
+ default:
+ break;
+ case NFS_DATA_SYNC:
+ iocb->kiocb.ki_flags |= IOCB_DSYNC;
+ break;
+ case NFS_FILE_SYNC:
+ iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
+ }
+ nfs_local_pgio_init(hdr, call_ops);
+
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+ INIT_WORK(&iocb->work, nfs_local_write_aio_complete_work);
+ iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
+ }
+
+ nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
+
+ file_start_write(filp);
+ status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+ file_end_write(filp);
+ if (status != -EIOCBQUEUED) {
+ nfs_local_write_done(iocb, status);
+ nfs_get_vfs_attr(filp, hdr->res.fattr);
+ nfs_local_pgio_release(iocb);
+ }
+ return 0;
+}
+
+static struct file *
+nfs_local_file_open_cached(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, struct nfs_open_context *ctx)
+{
+ struct file *filp = ctx->local_filp;
+
+ if (!filp) {
+ struct file *new = nfs_local_open_fh(clp, cred, fh, ctx->mode);
+ if (IS_ERR_OR_NULL(new))
+ return NULL;
+ /* try to put this one in the slot */
+ filp = cmpxchg(&ctx->local_filp, NULL, new);
+ if (filp != NULL)
+ fput(new);
+ else
+ filp = new;
+ }
+ return get_file(filp);
+}
+
+struct file *
+nfs_local_file_open(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, struct nfs_open_context *ctx)
+{
+ if (!nfs_server_is_local(clp))
+ return NULL;
+ return nfs_local_file_open_cached(clp, cred, fh, ctx);
+}
+
+int
+nfs_local_doio(struct nfs_client *clp, struct file *filp,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ int status = 0;
+
+ if (!hdr->args.count)
+ goto out_fput;
+ /* Don't support filesystems without read_iter/write_iter */
+ if (!filp->f_op->read_iter || !filp->f_op->write_iter) {
+ nfs_local_disable(clp);
+ status = -EAGAIN;
+ goto out_fput;
+ }
+
+ switch (hdr->rw_mode) {
+ case FMODE_READ:
+ status = nfs_do_local_read(hdr, filp, call_ops);
+ break;
+ case FMODE_WRITE:
+ status = nfs_do_local_write(hdr, filp, call_ops);
+ break;
+ default:
+ dprintk("%s: invalid mode: %d\n", __func__,
+ hdr->rw_mode);
+ status = -EINVAL;
+ }
+out_fput:
+ if (status != 0) {
+ fput(filp);
+ hdr->task.tk_status = status;
+ nfs_local_hdr_release(hdr, call_ops);
+ }
+ return status;
+}
+
+static void
+nfs_local_init_commit(struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops)
+{
+ data->task.tk_ops = call_ops;
+}
+
+static int
+nfs_local_run_commit(struct file *filp, struct nfs_commit_data *data)
+{
+ loff_t start = data->args.offset;
+ loff_t end = LLONG_MAX;
+
+ if (data->args.count > 0) {
+ end = start + data->args.count - 1;
+ if (end < start)
+ end = LLONG_MAX;
+ }
+
+ dprintk("%s: commit %llu - %llu\n", __func__, start, end);
+ return vfs_fsync_range(filp, start, end, 0);
+}
+
+static void
+nfs_local_commit_done(struct nfs_commit_data *data, int status)
+{
+ if (status >= 0) {
+ nfs_set_local_verifier(data->inode,
+ data->res.verf,
+ NFS_FILE_SYNC);
+ data->res.op_status = NFS4_OK;
+ data->task.tk_status = 0;
+ } else {
+ nfs_reset_boot_verifier(data->inode);
+ data->res.op_status = nfs4errno(status);
+ data->task.tk_status = status;
+ }
+}
+
+static void
+nfs_local_release_commit_data(struct file *filp,
+ struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops)
+{
+ fput(filp);
+ call_ops->rpc_call_done(&data->task, data);
+ call_ops->rpc_release(data);
+}
+
+static struct nfs_local_fsync_ctx *
+nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data, struct file *filp,
+ gfp_t flags)
+{
+ struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
+
+ if (ctx != NULL) {
+ ctx->filp = filp;
+ ctx->data = data;
+ INIT_WORK(&ctx->work, nfs_local_fsync_work);
+ kref_init(&ctx->kref);
+ }
+ return ctx;
+}
+
+static void
+nfs_local_fsync_ctx_kref_free(struct kref *kref)
+{
+ kfree(container_of(kref, struct nfs_local_fsync_ctx, kref));
+}
+
+static void
+nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx)
+{
+ kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free);
+}
+
+static void
+nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
+{
+ nfs_local_release_commit_data(ctx->filp, ctx->data,
+ ctx->data->task.tk_ops);
+ nfs_local_fsync_ctx_put(ctx);
+}
+
+static void
+nfs_local_fsync_work(struct work_struct *work)
+{
+ struct nfs_local_fsync_ctx *ctx;
+ int status;
+
+ ctx = container_of(work, struct nfs_local_fsync_ctx, work);
+
+ status = nfs_local_run_commit(ctx->filp, ctx->data);
+ nfs_local_commit_done(ctx->data, status);
+ nfs_local_fsync_ctx_free(ctx);
+}
+
+int
+nfs_local_commit(struct nfs_client *clp, struct file *filp,
+ struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops, int how)
+{
+ struct nfs_local_fsync_ctx *ctx;
+
+ ctx = nfs_local_fsync_ctx_alloc(data, filp, GFP_KERNEL);
+ if (!ctx) {
+ nfs_local_commit_done(data, -ENOMEM);
+ nfs_local_release_commit_data(filp, data, call_ops);
+ return -ENOMEM;
+ }
+
+ nfs_local_init_commit(data, call_ops);
+ kref_get(&ctx->kref);
+ queue_work(nfsiod_workqueue, &ctx->work);
+ if (how & FLUSH_SYNC)
+ flush_work(&ctx->work);
+ nfs_local_fsync_ctx_put(ctx);
+ return 0;
+}
+
+static int
+nfs_client_add_addr(struct nfs_client *clnt, char *buf, gfp_t flags)
+{
+ struct nfs_local_addr *addr;
+ struct sockaddr *sap;
+
+ dprintk("%s: adding new local IP %s\n", __func__, buf);
+ addr = kmalloc(sizeof(*addr), flags);
+ if (!addr) {
+ printk(KERN_WARNING "NFS: cannot alloc new addr\n");
+ return -ENOMEM;
+ }
+ sap = (struct sockaddr *)&addr->address;
+ addr->addrlen = rpc_pton(clnt->cl_net, buf, strlen(buf),
+ sap, sizeof(addr->address));
+ if (!addr->addrlen) {
+ printk(KERN_WARNING "NFS: cannot parse new addr %s\n",
+ buf);
+ kfree(addr);
+ return -EINVAL;
+ }
+ list_add(&addr->cl_addrs, &clnt->cl_local_addrs);
+
+ return 0;
+}
+
+static int
+nfs_client_add_v4_addr(struct nfs_client *clnt, struct in_device *indev,
+ char *buf, size_t buflen)
+{
+ struct in_ifaddr *ifa;
+ int ret;
+
+ in_dev_for_each_ifa_rtnl(ifa, indev) {
+ snprintf(buf, buflen, "%pI4", &ifa->ifa_local);
+ ret = nfs_client_add_addr(clnt, buf, GFP_KERNEL);
+ if (ret < 0)
+ return ret;
+ }
+
+ return 0;
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static int
+nfs_client_add_v6_addr(struct nfs_client *clnt, struct inet6_dev *in6dev,
+ char *buf, size_t buflen)
+{
+ struct inet6_ifaddr *ifp;
+ int ret = 0;
+
+ read_lock_bh(&in6dev->lock);
+ list_for_each_entry(ifp, &in6dev->addr_list, if_list) {
+ rpc_ntop6_addr_noscopeid(&ifp->addr, buf, buflen);
+ ret = nfs_client_add_addr(clnt, buf, GFP_ATOMIC);
+ if (ret < 0)
+ goto out;
+ }
+out:
+ read_unlock_bh(&in6dev->lock);
+ return ret;
+}
+#else /* CONFIG_IPV6 */
+static int
+nfs_client_add_v6_addr(struct nfs_client *clnt, struct inet6_dev *in6dev,
+ char *buf, size_t buflen)
+{
+ return 0;
+}
+#endif
+
+/* Find out all local IP addresses. Ignore errors
+ * because local IO can be optional.
+ */
+void
+nfs_probe_local_addr(struct nfs_client *clnt)
+{
+ struct net_device *dev;
+ struct in_device *indev;
+ struct inet6_dev *in6dev;
+ char buf[INET6_ADDRSTRLEN + IPV6_SCOPE_ID_LEN];
+ size_t buflen = sizeof(buf);
+
+ rtnl_lock();
+
+ for_each_netdev(clnt->cl_net, dev) {
+ if (dev->type == ARPHRD_LOOPBACK ||
+ !(dev->flags & IFF_UP))
+ continue;
+ indev = __in_dev_get_rtnl(dev);
+ if (indev &&
+ nfs_client_add_v4_addr(clnt, indev, buf, buflen) < 0)
+ break;
+ in6dev = __in6_dev_get(dev);
+ if (in6dev &&
+ nfs_client_add_v6_addr(clnt, in6dev, buf, buflen) < 0)
+ break;
+ }
+
+ rtnl_unlock();
+}
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 1e710654af11..45d4086cdeb1 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -1681,6 +1681,35 @@ TRACE_EVENT(nfs_mount_path,
TP_printk("path='%s'", __get_str(path))
);

+TRACE_EVENT(nfs_local_open_fh,
+ TP_PROTO(
+ const struct nfs_fh *fh,
+ fmode_t fmode,
+ int error
+ ),
+
+ TP_ARGS(fh, fmode, error),
+
+ TP_STRUCT__entry(
+ __field(int, error)
+ __field(u32, fhandle)
+ __field(unsigned int, fmode)
+ ),
+
+ TP_fast_assign(
+ __entry->error = error;
+ __entry->fhandle = nfs_fhandle_hash(fh);
+ __entry->fmode = (__force unsigned int)fmode;
+ ),
+
+ TP_printk(
+ "error=%d fhandle=0x%08x mode=%s",
+ __entry->error,
+ __entry->fhandle,
+ show_fs_fmode_flags(__entry->fmode)
+ )
+);
+
DECLARE_EVENT_CLASS(nfs_xdr_event,
TP_PROTO(
const struct xdr_stream *xdr,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 3786d767e2ff..9210a1821ec9 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -848,7 +848,8 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
struct nfs_pgio_header *hdr, const struct cred *cred,
const struct nfs_rpc_ops *rpc_ops,
- const struct rpc_call_ops *call_ops, int how, int flags)
+ const struct rpc_call_ops *call_ops, int how, int flags,
+ struct file *localio)
{
struct rpc_task *task;
struct rpc_message msg = {
@@ -878,10 +879,16 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
hdr->args.count,
(unsigned long long)hdr->args.offset);

+ if (localio) {
+ nfs_local_doio(clp, localio, hdr, call_ops);
+ goto out;
+ }
+
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
return PTR_ERR(task);
rpc_put_task(task);
+out:
return 0;
}
EXPORT_SYMBOL_GPL(nfs_initiate_pgio);
@@ -1080,7 +1087,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
NFS_PROTO(hdr->inode),
desc->pg_rpc_callops,
desc->pg_ioflags,
- RPC_TASK_CRED_NOREF | task_flags);
+ RPC_TASK_CRED_NOREF | task_flags,
+ NULL);
}
return ret;
}
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index b29b50c2c933..ac3c5e6d4c5e 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -538,7 +538,7 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
NFS_CLIENT(inode), data,
NFS_PROTO(data->inode),
data->mds_ops, how,
- RPC_TASK_CRED_NOREF);
+ RPC_TASK_CRED_NOREF, NULL);
} else {
nfs_init_commit(data, NULL, data->lseg, cinfo);
initiate_commit(data, how);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index c9cfa1308264..ba0b36b15bc1 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1672,7 +1672,8 @@ int nfs_initiate_commit(struct nfs_client *clp,
struct nfs_commit_data *data,
const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
- int how, int flags)
+ int how, int flags,
+ struct file *localio)
{
struct rpc_task *task;
int priority = flush_task_priority(how);
@@ -1691,6 +1692,7 @@ int nfs_initiate_commit(struct nfs_client *clp,
.flags = RPC_TASK_ASYNC | flags,
.priority = priority,
};
+ int status = 0;

if (nfs_server_capable(data->inode, NFS_CAP_MOVEABLE))
task_setup_data.flags |= RPC_TASK_MOVEABLE;
@@ -1701,13 +1703,19 @@ int nfs_initiate_commit(struct nfs_client *clp,

dprintk("NFS: initiated commit call\n");

+ if (localio) {
+ nfs_local_commit(clp, localio, data, call_ops, how);
+ goto out;
+ }
+
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
return PTR_ERR(task);
if (how & FLUSH_SYNC)
rpc_wait_for_completion_task(task);
rpc_put_task(task);
- return 0;
+out:
+ return status;
}
EXPORT_SYMBOL_GPL(nfs_initiate_commit);

@@ -1819,7 +1827,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
return nfs_initiate_commit(NFS_SERVER(inode)->nfs_client,
NFS_CLIENT(inode), data, NFS_PROTO(inode),
data->mds_ops, how,
- RPC_TASK_CRED_NOREF | task_flags);
+ RPC_TASK_CRED_NOREF | task_flags, NULL);
}

/*
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index b8736a82e57c..702f277394f1 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -13,7 +13,7 @@ nfsd-y += trace.o
nfsd-y += nfssvc.o nfsctl.o nfsfh.o vfs.o \
export.o auth.o lockd.o nfscache.o \
stats.o filecache.o nfs3proc.o nfs3xdr.o \
- netlink.o
+ netlink.o localio.o
nfsd-$(CONFIG_NFSD_V2) += nfsproc.o nfsxdr.o
nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index ad9083ca144b..99631fa56662 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -52,7 +52,7 @@
#define NFSD_FILE_CACHE_UP (0)

/* We only care about NFSD_MAY_READ/WRITE for this cache */
-#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
+#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)

static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
new file mode 100644
index 000000000000..ff68454a4017
--- /dev/null
+++ b/fs/nfsd/localio.c
@@ -0,0 +1,179 @@
+/*
+ * NFS server support for local clients to bypass network stack
+ *
+ * Copyright (C) 2014 Weston Andros Adamson <[email protected]>
+ */
+
+#include <linux/exportfs.h>
+#include <linux/sunrpc/svcauth_gss.h>
+#include <linux/sunrpc/clnt.h>
+#include <linux/nfs.h>
+#include <linux/string.h>
+
+#include "nfsd.h"
+#include "vfs.h"
+#include "netns.h"
+#include "filecache.h"
+
+#define NFSDDBG_FACILITY NFSDDBG_FH
+
+static void
+nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
+{
+ if (rqstp->rq_client)
+ auth_domain_put(rqstp->rq_client);
+ if (rqstp->rq_cred.cr_group_info)
+ put_group_info(rqstp->rq_cred.cr_group_info);
+ kfree(rqstp->rq_cred.cr_principal);
+ kfree(rqstp->rq_xprt);
+ kfree(rqstp);
+}
+
+static struct svc_rqst *
+nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
+{
+ struct svc_rqst *rqstp;
+ struct net *net = rpc_net_ns(rpc_clnt);
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ int status;
+
+ if (!nn->nfsd_serv) {
+ dprintk("%s: localio denied. Server not running\n", __func__);
+ return ERR_PTR(-ENXIO);
+ }
+
+ rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
+ if (!rqstp)
+ return ERR_PTR(-ENOMEM);
+
+ rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
+ if (!rqstp->rq_xprt) {
+ status = -ENOMEM;
+ goto out_err;
+ }
+
+ rqstp->rq_xprt->xpt_net = net;
+ __set_bit(RQ_SECURE, &rqstp->rq_flags);
+ rqstp->rq_proc = 1;
+ rqstp->rq_vers = 3;
+ rqstp->rq_prot = IPPROTO_TCP;
+ rqstp->rq_server = nn->nfsd_serv;
+
+ /* Note: we're connecting to ourself, so source addr == peer addr */
+ rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
+ (struct sockaddr *)&rqstp->rq_addr,
+ sizeof(rqstp->rq_addr));
+
+ if (!rpcauth_map_to_svc_cred(rpc_clnt->cl_auth, cred,
+ &rqstp->rq_cred)) {
+ dprintk("%s :map cred failed\n", __func__);
+ status = -EINVAL;
+ goto out_err;
+ }
+
+ /*
+ * set up enough for svcauth_unix_set_client to be able to wait
+ * for the cache downcall. Note that we do _not_ want to allow the
+ * request to be deferred for later revisit since this rqst and xprt
+ * are not set up to run inside of the normal svc_rqst engine.
+ */
+ INIT_LIST_HEAD(&rqstp->rq_xprt->xpt_deferred);
+ kref_init(&rqstp->rq_xprt->xpt_ref);
+ spin_lock_init(&rqstp->rq_xprt->xpt_lock);
+ rqstp->rq_chandle.thread_wait = 5 * HZ;
+
+ status = svcauth_unix_set_client(rqstp);
+ switch (status) {
+ case SVC_OK:
+ break;
+ case SVC_DENIED:
+ status = -ENXIO;
+ dprintk("%s: client %pISpc denied localio access\n",
+ __func__, (struct sockaddr *)&rqstp->rq_addr);
+ goto out_err;
+ default:
+ status = -ETIMEDOUT;
+ dprintk("%s: client %pISpc temporarily denied localio access\n",
+ __func__, (struct sockaddr *)&rqstp->rq_addr);
+ goto out_err;
+ }
+
+ return rqstp;
+
+out_err:
+ nfsd_local_fakerqst_destroy(rqstp);
+ return ERR_PTR(status);
+}
+
+/*
+ * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to @file
+ *
+ * This function maps a local fh to a path on a local filesystem.
+ * This is useful when the nfs client has the local server mounted - it can
+ * avoid all the NFS overhead with reads, writes and commits.
+ *
+ * on successful return, caller is responsible for calling path_put. Also
+ * note that this is called from nfs.ko via find_symbol() to avoid an explicit
+ * dependency on knfsd. So, there is no forward declaration in a header file
+ * for it.
+ */
+int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+ const struct cred *cred,
+ const struct nfs_fh *nfs_fh,
+ const fmode_t fmode,
+ struct file **pfilp)
+{
+ const struct cred *save_cred;
+ struct svc_rqst *rqstp;
+ struct svc_fh fh;
+ struct nfsd_file *nf;
+ int status = 0;
+ int mayflags = NFSD_MAY_LOCALIO;
+ __be32 beres;
+
+ /* Save creds before calling into nfsd */
+ save_cred = get_current_cred();
+
+ rqstp = nfsd_local_fakerqst_create(rpc_clnt, cred);
+ if (IS_ERR(rqstp)) {
+ status = PTR_ERR(rqstp);
+ goto out_revertcred;
+ }
+
+ /* nfs_fh -> svc_fh */
+ if (nfs_fh->size > NFS4_FHSIZE) {
+ status = -EINVAL;
+ goto out;
+ }
+ fh_init(&fh, NFS4_FHSIZE);
+ fh.fh_handle.fh_size = nfs_fh->size;
+ memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
+
+ if (fmode & FMODE_READ)
+ mayflags |= NFSD_MAY_READ;
+ if (fmode & FMODE_WRITE)
+ mayflags |= NFSD_MAY_WRITE;
+
+ beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
+ if (beres) {
+ status = nfs_stat_to_errno(be32_to_cpu(beres));
+ dprintk("%s: fh_verify failed %d\n", __func__, status);
+ goto out_fh_put;
+ }
+
+ *pfilp = get_file(nf->nf_file);
+
+ nfsd_file_put(nf);
+out_fh_put:
+ fh_put(&fh);
+
+out:
+ nfsd_local_fakerqst_destroy(rqstp);
+out_revertcred:
+ revert_creds(save_cred);
+ return status;
+}
+EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_open_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 77bbd23aa150..9c0610fdd11c 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
{ NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \
{ NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \
{ NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \
- { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" })
+ { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }, \
+ { NFSD_MAY_LOCALIO, "LOCALIO" })

TRACE_EVENT(nfsd_compound,
TP_PROTO(
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 57cd70062048..91c50649a8c7 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -36,6 +36,8 @@
#define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
#define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)

+#define NFSD_MAY_LOCALIO 0x800000
+
struct nfsd_file;

/*
@@ -158,6 +160,12 @@ __be32 nfsd_permission(struct svc_rqst *, struct svc_export *,

void nfsd_filp_close(struct file *fp);

+int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+ const struct cred *cred,
+ const struct nfs_fh *nfs_fh,
+ const fmode_t fmode,
+ struct file **pfilp);
+
static inline int fh_want_write(struct svc_fh *fh)
{
int ret;
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index b94f51d17bc5..80843764fad3 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -8,6 +8,8 @@
#ifndef _LINUX_NFS_H
#define _LINUX_NFS_H

+#include <linux/cred.h>
+#include <linux/sunrpc/auth.h>
#include <linux/sunrpc/msg_prot.h>
#include <linux/string.h>
#include <linux/errno.h>
@@ -109,6 +111,10 @@ static inline int nfs_stat_to_errno(enum nfs_stat status)
return nfs_common_errtbl[i].errno;
}

+typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
+ const struct nfs_fh *, const fmode_t,
+ struct file **);
+
#ifdef CONFIG_CRC32
/**
* nfs_fhandle_hash - calculate the crc32 hash for the filehandle
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 039898d70954..a0bb947fdd1d 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -96,6 +96,8 @@ struct nfs_open_context {
struct list_head list;
struct nfs4_threshold *mdsthreshold;
struct rcu_head rcu_head;
+
+ struct file *local_filp;
};

struct nfs_open_dir_context {
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 82a6f66fe1d0..6b603b0247f1 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -49,12 +49,14 @@ struct nfs_client {
#define NFS_CS_DS 7 /* - Server is a DS */
#define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
#define NFS_CS_PNFS 9 /* - Server used for pnfs */
+#define NFS_CS_LOCAL_IO 10 /* - client is local */
struct sockaddr_storage cl_addr; /* server identifier */
size_t cl_addrlen;
char * cl_hostname; /* hostname of server */
char * cl_acceptor; /* GSSAPI acceptor name */
struct list_head cl_share_link; /* link in global client list */
struct list_head cl_superblocks; /* List of nfs_server structs */
+ struct list_head cl_local_addrs; /* List of local addresses */

struct rpc_clnt * cl_rpcclient;
const struct nfs_rpc_ops *rpc_ops; /* NFS protocol vector */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index d09b9773b20c..764513a61601 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1605,6 +1605,7 @@ enum {
NFS_IOHDR_RESEND_PNFS,
NFS_IOHDR_RESEND_MDS,
NFS_IOHDR_UNSTABLE_WRITES,
+ NFS_IOHDR_ODIRECT,
};

struct nfs_io_completion;
--
2.44.0


2024-06-07 14:50:30

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 15/29] NFS: Don't call filesystem write() routines directly

From: Trond Myklebust <[email protected]>

Some filesystem writeback routines can end up taking up a lot of stack
space (particularly xfs). Instead of risking running over due to the
extra overhead from the NFS stack, we should just call these routines
from a workqueue job.

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/localio.c | 51 ++++++++++++++++++++++++++++++++++++------------
1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 5939ca2216be..2c6811b20dcf 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -63,6 +63,12 @@ struct nfs_local_fsync_ctx {
};
static void nfs_local_fsync_work(struct work_struct *work);

+struct nfs_local_io_args {
+ struct nfs_local_kiocb *iocb;
+ struct work_struct work;
+ struct completion *done;
+};
+
/*
* We need to translate between nfs status return values and
* the local errno values which may not be the same.
@@ -597,14 +603,35 @@ nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
nfs_local_pgio_complete(iocb);
}

-static int
-nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
- const struct rpc_call_ops *call_ops)
+static void nfs_local_call_write(struct work_struct *work)
{
- struct nfs_local_kiocb *iocb;
+ struct nfs_local_io_args *args =
+ container_of(work, struct nfs_local_io_args, work);
+ struct nfs_local_kiocb *iocb = args->iocb;
+ struct file *filp = iocb->kiocb.ki_filp;
struct iov_iter iter;
ssize_t status;

+ nfs_local_iter_init(&iter, iocb, WRITE);
+
+ file_start_write(filp);
+ status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+ file_end_write(filp);
+ if (status != -EIOCBQUEUED) {
+ nfs_local_write_done(iocb, status);
+ nfs_get_vfs_attr(filp, iocb->hdr->res.fattr);
+ nfs_local_pgio_release(iocb);
+ }
+ complete(args->done);
+}
+
+static int nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_io_args args;
+ DECLARE_COMPLETION_ONSTACK(done);
+ struct nfs_local_kiocb *iocb;
+
dprintk("%s: vfs_write count=%u pos=%llu %s\n",
__func__, hdr->args.count, hdr->args.offset,
(hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
@@ -612,7 +639,6 @@ nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
iocb = nfs_local_iocb_alloc(hdr, filp, GFP_NOIO);
if (iocb == NULL)
return -ENOMEM;
- nfs_local_iter_init(&iter, iocb, WRITE);

switch (hdr->args.stable) {
default:
@@ -632,14 +658,13 @@ nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,

nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);

- file_start_write(filp);
- status = filp->f_op->write_iter(&iocb->kiocb, &iter);
- file_end_write(filp);
- if (status != -EIOCBQUEUED) {
- nfs_local_write_done(iocb, status);
- nfs_get_vfs_attr(filp, hdr->res.fattr);
- nfs_local_pgio_release(iocb);
- }
+ args.iocb = iocb;
+ args.done = &done;
+ INIT_WORK_ONSTACK(&args.work, nfs_local_call_write);
+
+ queue_work(nfsiod_workqueue, &args.work);
+ wait_for_completion(&done);
+ destroy_work_on_stack(&args.work);
return 0;
}

--
2.44.0


2024-06-07 14:50:33

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 17/29] NFS: Use completion rather than flush_work() in nfs_local_commit()

From: Trond Myklebust <[email protected]>

Make the code consistent with other routines.

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/localio.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index d997f0a96627..d7918e26aeb6 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -60,6 +60,7 @@ struct nfs_local_fsync_ctx {
struct nfs_commit_data *data;
struct work_struct work;
struct kref kref;
+ struct completion *done;
};
static void nfs_local_fsync_work(struct work_struct *work);

@@ -813,6 +814,7 @@ nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data, struct file *filp,
ctx->data = data;
INIT_WORK(&ctx->work, nfs_local_fsync_work);
kref_init(&ctx->kref);
+ ctx->done = NULL;
}
return ctx;
}
@@ -847,6 +849,8 @@ nfs_local_fsync_work(struct work_struct *work)

status = nfs_local_run_commit(ctx->filp, ctx->data);
nfs_local_commit_done(ctx->data, status);
+ if (ctx->done != NULL)
+ complete(ctx->done);
nfs_local_fsync_ctx_free(ctx);
}

@@ -866,9 +870,13 @@ nfs_local_commit(struct nfs_client *clp, struct file *filp,

nfs_local_init_commit(data, call_ops);
kref_get(&ctx->kref);
- queue_work(nfsiod_workqueue, &ctx->work);
- if (how & FLUSH_SYNC)
- flush_work(&ctx->work);
+ if (how & FLUSH_SYNC) {
+ DECLARE_COMPLETION_ONSTACK(done);
+ ctx->done = &done;
+ queue_work(nfsiod_workqueue, &ctx->work);
+ wait_for_completion(&done);
+ } else
+ queue_work(nfsiod_workqueue, &ctx->work);
nfs_local_fsync_ctx_put(ctx);
return 0;
}
--
2.44.0


2024-06-07 14:50:36

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 18/29] NFS: localio writes need to use a normal workqueue

From: Trond Myklebust <[email protected]>

When we start getting low on space, XFS goes and calls flush_work() on a
non-memreclaim work queue, which causes a priority inversion problem.

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/inode.c | 24 +++++++++++++++++++-----
fs/nfs/internal.h | 1 +
fs/nfs/localio.c | 4 ++--
3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 4f88b860494f..b80469bce8df 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2394,6 +2394,7 @@ static void nfs_destroy_inodecache(void)
kmem_cache_destroy(nfs_inode_cachep);
}

+struct workqueue_struct *nfssync_workqueue;
struct workqueue_struct *nfsiod_workqueue;
EXPORT_SYMBOL_GPL(nfsiod_workqueue);

@@ -2404,9 +2405,17 @@ static int nfsiod_start(void)
{
struct workqueue_struct *wq;
dprintk("RPC: creating workqueue nfsiod\n");
+ wq = alloc_workqueue("nfs-sync", WQ_UNBOUND, 0);
+ if (wq == NULL)
+ return -ENOMEM;
+ nfssync_workqueue = wq;
wq = alloc_workqueue("nfsiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
- if (wq == NULL)
+ if (wq == NULL) {
+ wq = nfssync_workqueue;
+ nfsiod_workqueue = NULL;
+ destroy_workqueue(wq);
return -ENOMEM;
+ }
nfsiod_workqueue = wq;
return 0;
}
@@ -2419,10 +2428,15 @@ static void nfsiod_stop(void)
struct workqueue_struct *wq;

wq = nfsiod_workqueue;
- if (wq == NULL)
- return;
- nfsiod_workqueue = NULL;
- destroy_workqueue(wq);
+ if (wq != NULL) {
+ nfsiod_workqueue = NULL;
+ destroy_workqueue(wq);
+ }
+ wq = nfssync_workqueue;
+ if (wq != NULL) {
+ nfssync_workqueue = NULL;
+ destroy_workqueue(wq);
+ }
}

unsigned int nfs_net_id;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 67b348447a40..0927a1704bbb 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -446,6 +446,7 @@ int nfs_check_flags(int);

/* inode.c */
extern struct workqueue_struct *nfsiod_workqueue;
+extern struct workqueue_struct *nfssync_workqueue;
extern struct inode *nfs_alloc_inode(struct super_block *sb);
extern void nfs_free_inode(struct inode *);
extern int nfs_write_inode(struct inode *, struct writeback_control *);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index d7918e26aeb6..d724f8d4dd65 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -511,7 +511,7 @@ static int nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
args.done = &done;
INIT_WORK_ONSTACK(&args.work, nfs_local_call_read);

- queue_work(nfsiod_workqueue, &args.work);
+ queue_work(nfssync_workqueue, &args.work);
wait_for_completion(&done);
destroy_work_on_stack(&args.work);
return 0;
@@ -682,7 +682,7 @@ static int nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
args.done = &done;
INIT_WORK_ONSTACK(&args.work, nfs_local_call_write);

- queue_work(nfsiod_workqueue, &args.work);
+ queue_work(nfssync_workqueue, &args.work);
wait_for_completion(&done);
destroy_work_on_stack(&args.work);
return 0;
--
2.44.0


2024-06-07 14:50:41

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 20/29] nfs/localio: discontinue network address based localio setup

Prepares for determining localio via client and server handshake.

The primary reason to avoid the current approach of probing all local
network interfaces for the client (during nfs_alloc_client) to use as
the basis for _inferring_ that client's server is local
(in nfs_local_probe) is: matching IP addresses is brittle, especially
so when you have network namespaces (i.e. containers), or when you
play games with NAT or iptables.

This commit also reverts an earlier commit ("sunrpc: add and export
rpc_ntop6_addr_noscopeid") which was only needed/useful in the context
of localio's sockaddr based matching.

Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/client.c | 9 --
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 21 +---
fs/nfs/internal.h | 7 --
fs/nfs/localio.c | 145 +---------------------
include/linux/nfs_fs_sb.h | 1 -
include/linux/sunrpc/addr.h | 9 --
net/sunrpc/addr.c | 19 +--
7 files changed, 8 insertions(+), 203 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 288de750fd3b..c42faaed508c 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -170,7 +170,6 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
}

INIT_LIST_HEAD(&clp->cl_superblocks);
- INIT_LIST_HEAD(&clp->cl_local_addrs);
clp->cl_rpcclient = ERR_PTR(-EINVAL);

clp->cl_flags = cl_init->init_flags;
@@ -184,7 +183,6 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)

clp->cl_principal = "*";
clp->cl_xprtsec = cl_init->xprtsec;
- nfs_probe_local_addr(clp);
return clp;

error_cleanup:
@@ -238,19 +236,12 @@ static void pnfs_init_server(struct nfs_server *server)
*/
void nfs_free_client(struct nfs_client *clp)
{
- struct nfs_local_addr *addr, *tmp;
-
nfs_local_disable(clp);

/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);

- list_for_each_entry_safe(addr, tmp, &clp->cl_local_addrs, cl_addrs) {
- list_del(&addr->cl_addrs);
- kfree(addr);
- }
-
put_net(clp->cl_net);
put_nfs_version(clp->cl_nfs_mod);
kfree(clp->cl_hostname);
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index af329d9b7d1e..e58bedfb1dcc 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -348,22 +348,6 @@ ff_layout_init_mirror_ds(struct pnfs_layout_hdr *lo,
return false;
}

-static bool ff_layout_ds_is_local(struct nfs4_pnfs_ds *ds)
-{
- struct nfs_local_addr *addr;
- struct sockaddr *sap;
- struct nfs4_pnfs_ds_addr *da;
-
- list_for_each_entry(da, &ds->ds_addrs, da_node) {
- sap = (struct sockaddr *)&da->da_addr;
- list_for_each_entry(addr, &ds->ds_clp->cl_local_addrs, cl_addrs)
- if (rpc_cmp_addr((struct sockaddr *)&addr->address, sap))
- return true;
- }
-
- return false;
-}
-
/**
* nfs4_ff_layout_prepare_ds - prepare a DS connection for an RPC call
* @lseg: the layout segment we're operating on
@@ -416,10 +400,7 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
* keep ds_clp even if DS is local, so that if local IO cannot
* proceed somehow, we can fall back to NFS whenever we want.
*/
- if (ff_layout_ds_is_local(ds)) {
- dprintk("%s: found local DS\n", __func__);
- nfs_local_enable(ds->ds_clp);
- }
+ nfs_local_probe(ds->ds_clp);
max_payload =
nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
NULL);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 0927a1704bbb..6d75466ad356 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -204,12 +204,6 @@ struct nfs_mount_request {
struct net *net;
};

-struct nfs_local_addr {
- struct list_head cl_addrs;
- struct sockaddr_storage address;
- size_t addrlen;
-};
-
extern int nfs_mount(struct nfs_mount_request *info, int timeo, int retrans);
extern void nfs_umount(const struct nfs_mount_request *info);

@@ -475,7 +469,6 @@ extern int nfs_local_doio(struct nfs_client *, struct file *,
extern int nfs_local_commit(struct nfs_client *, struct file *,
struct nfs_commit_data *,
const struct rpc_call_ops *, int);
-extern void nfs_probe_local_addr(struct nfs_client *clnt);
extern bool nfs_server_is_local(const struct nfs_client *clp);

/* super.c */
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index d724f8d4dd65..96349b6e7585 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -20,8 +20,6 @@
#include <linux/nfs_fs.h>
#include <linux/nfs_xdr.h>

-#include <uapi/linux/if_arp.h>
-
#include "internal.h"
#include "pnfs.h"
#include "nfstrace.h"
@@ -216,7 +214,6 @@ nfs_local_enable(struct nfs_client *clp)
trace_nfs_local_enable(clp);
}
}
-EXPORT_SYMBOL_GPL(nfs_local_enable);

/*
* nfs_local_disable - disable local i/o for an nfs_client
@@ -236,50 +233,12 @@ nfs_local_disable(struct nfs_client *clp)
void
nfs_local_probe(struct nfs_client *clp)
{
- struct sockaddr_in *sin;
- struct sockaddr_in6 *sin6;
- struct nfs_local_addr *addr;
- struct sockaddr *sap;
bool enable = false;

- switch (clp->cl_addr.ss_family) {
- case AF_INET:
- sin = (struct sockaddr_in *)&clp->cl_addr;
- if (ipv4_is_loopback(sin->sin_addr.s_addr)) {
- dprintk("%s: detected IPv4 loopback address\n",
- __func__);
- enable = true;
- }
- break;
- case AF_INET6:
- sin6 = (struct sockaddr_in6 *)&clp->cl_addr;
- if (memcmp(&sin6->sin6_addr, &in6addr_loopback,
- sizeof(struct in6_addr)) == 0) {
- dprintk("%s: detected IPv6 loopback address\n",
- __func__);
- enable = true;
- }
- break;
- default:
- break;
- }
-
- if (enable)
- goto out;
-
- list_for_each_entry(addr, &clp->cl_local_addrs, cl_addrs) {
- sap = (struct sockaddr *)&addr->address;
- if (rpc_cmp_addr((struct sockaddr *)&clp->cl_addr, sap)) {
- dprintk("%s: detected local server.\n", __func__);
- enable = true;
- break;
- }
- }
-
-out:
if (enable)
nfs_local_enable(clp);
}
+EXPORT_SYMBOL_GPL(nfs_local_probe);

/*
* nfs_local_open_fh - open a local filehandle
@@ -880,105 +839,3 @@ nfs_local_commit(struct nfs_client *clp, struct file *filp,
nfs_local_fsync_ctx_put(ctx);
return 0;
}
-
-static int
-nfs_client_add_addr(struct nfs_client *clnt, char *buf, gfp_t flags)
-{
- struct nfs_local_addr *addr;
- struct sockaddr *sap;
-
- dprintk("%s: adding new local IP %s\n", __func__, buf);
- addr = kmalloc(sizeof(*addr), flags);
- if (!addr) {
- printk(KERN_WARNING "NFS: cannot alloc new addr\n");
- return -ENOMEM;
- }
- sap = (struct sockaddr *)&addr->address;
- addr->addrlen = rpc_pton(clnt->cl_net, buf, strlen(buf),
- sap, sizeof(addr->address));
- if (!addr->addrlen) {
- printk(KERN_WARNING "NFS: cannot parse new addr %s\n",
- buf);
- kfree(addr);
- return -EINVAL;
- }
- list_add(&addr->cl_addrs, &clnt->cl_local_addrs);
-
- return 0;
-}
-
-static int
-nfs_client_add_v4_addr(struct nfs_client *clnt, struct in_device *indev,
- char *buf, size_t buflen)
-{
- struct in_ifaddr *ifa;
- int ret;
-
- in_dev_for_each_ifa_rtnl(ifa, indev) {
- snprintf(buf, buflen, "%pI4", &ifa->ifa_local);
- ret = nfs_client_add_addr(clnt, buf, GFP_KERNEL);
- if (ret < 0)
- return ret;
- }
-
- return 0;
-}
-
-#if IS_ENABLED(CONFIG_IPV6)
-static int
-nfs_client_add_v6_addr(struct nfs_client *clnt, struct inet6_dev *in6dev,
- char *buf, size_t buflen)
-{
- struct inet6_ifaddr *ifp;
- int ret = 0;
-
- read_lock_bh(&in6dev->lock);
- list_for_each_entry(ifp, &in6dev->addr_list, if_list) {
- rpc_ntop6_addr_noscopeid(&ifp->addr, buf, buflen);
- ret = nfs_client_add_addr(clnt, buf, GFP_ATOMIC);
- if (ret < 0)
- goto out;
- }
-out:
- read_unlock_bh(&in6dev->lock);
- return ret;
-}
-#else /* CONFIG_IPV6 */
-static int
-nfs_client_add_v6_addr(struct nfs_client *clnt, struct inet6_dev *in6dev,
- char *buf, size_t buflen)
-{
- return 0;
-}
-#endif
-
-/* Find out all local IP addresses. Ignore errors
- * because local IO can be optional.
- */
-void
-nfs_probe_local_addr(struct nfs_client *clnt)
-{
- struct net_device *dev;
- struct in_device *indev;
- struct inet6_dev *in6dev;
- char buf[INET6_ADDRSTRLEN + IPV6_SCOPE_ID_LEN];
- size_t buflen = sizeof(buf);
-
- rtnl_lock();
-
- for_each_netdev(clnt->cl_net, dev) {
- if (dev->type == ARPHRD_LOOPBACK ||
- !(dev->flags & IFF_UP))
- continue;
- indev = __in_dev_get_rtnl(dev);
- if (indev &&
- nfs_client_add_v4_addr(clnt, indev, buf, buflen) < 0)
- break;
- in6dev = __in6_dev_get(dev);
- if (in6dev &&
- nfs_client_add_v6_addr(clnt, in6dev, buf, buflen) < 0)
- break;
- }
-
- rtnl_unlock();
-}
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 6b603b0247f1..00fe469bc72e 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -56,7 +56,6 @@ struct nfs_client {
char * cl_acceptor; /* GSSAPI acceptor name */
struct list_head cl_share_link; /* link in global client list */
struct list_head cl_superblocks; /* List of nfs_server structs */
- struct list_head cl_local_addrs; /* List of local addresses */

struct rpc_clnt * cl_rpcclient;
const struct nfs_rpc_ops *rpc_ops; /* NFS protocol vector */
diff --git a/include/linux/sunrpc/addr.h b/include/linux/sunrpc/addr.h
index e1007bddc3c4..07d454873b6d 100644
--- a/include/linux/sunrpc/addr.h
+++ b/include/linux/sunrpc/addr.h
@@ -68,9 +68,6 @@ static inline bool __rpc_copy_addr4(struct sockaddr *dst,
}

#if IS_ENABLED(CONFIG_IPV6)
-extern size_t rpc_ntop6_addr_noscopeid(const struct in6_addr *addr,
- char *buf, const int buflen);
-
static inline bool rpc_cmp_addr6(const struct sockaddr *sap1,
const struct sockaddr *sap2)
{
@@ -97,12 +94,6 @@ static inline bool __rpc_copy_addr6(struct sockaddr *dst,
return true;
}
#else /* !(IS_ENABLED(CONFIG_IPV6) */
-static size_t rpc_ntop6_addr_noscopeid(const struct in6_addr *addr,
- char *buf, const int buflen)
-{
- return 0;
-}
-
static inline bool rpc_cmp_addr6(const struct sockaddr *sap1,
const struct sockaddr *sap2)
{
diff --git a/net/sunrpc/addr.c b/net/sunrpc/addr.c
index 78a123a7c39b..97ff11973c49 100644
--- a/net/sunrpc/addr.c
+++ b/net/sunrpc/addr.c
@@ -25,9 +25,12 @@

#if IS_ENABLED(CONFIG_IPV6)

-size_t rpc_ntop6_addr_noscopeid(const struct in6_addr *addr,
- char *buf, const int buflen)
+static size_t rpc_ntop6_noscopeid(const struct sockaddr *sap,
+ char *buf, const int buflen)
{
+ const struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sap;
+ const struct in6_addr *addr = &sin6->sin6_addr;
+
/*
* RFC 4291, Section 2.2.2
*
@@ -52,23 +55,13 @@ size_t rpc_ntop6_addr_noscopeid(const struct in6_addr *addr,
*/
if (ipv6_addr_v4mapped(addr))
return snprintf(buf, buflen, "::ffff:%pI4",
- &addr->s6_addr32[3]);
+ &addr->s6_addr32[3]);

/*
* RFC 4291, Section 2.2.1
*/
return snprintf(buf, buflen, "%pI6c", addr);
}
-EXPORT_SYMBOL_GPL(rpc_ntop6_addr_noscopeid);
-
-static size_t rpc_ntop6_noscopeid(const struct sockaddr *sap,
- char *buf, const int buflen)
-{
- const struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sap;
- const struct in6_addr *addr = &sin6->sin6_addr;
-
- return rpc_ntop6_addr_noscopeid(addr, buf, buflen);
-}

static size_t rpc_ntop6(const struct sockaddr *sap,
char *buf, const size_t buflen)
--
2.44.0


2024-06-07 14:50:46

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 22/29] nfs: implement v3 client support for NFS_LOCALIO_PROGRAM

LOCALIOPROC_GETUUID allows client to discover server's uuid.

nfs_local_probe() will retrieve server's uuid via LOCALIO protocol and
verify the server with that uuid it is known to be local. This ensures
client and server 1: support localio 2: are local to each other.

Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/client.c | 10 +++++--
fs/nfs/localio.c | 62 +++++++++++++++++++++++++++++++++++----
fs/nfs/nfs3_fs.h | 1 +
fs/nfs/nfs3client.c | 42 ++++++++++++++++++++++++++
fs/nfs/nfs3proc.c | 3 ++
fs/nfs/nfs3xdr.c | 58 ++++++++++++++++++++++++++++++++++++
include/linux/nfs_fs_sb.h | 1 +
include/linux/nfs_xdr.h | 9 ++++++
include/uapi/linux/nfs.h | 4 +++
9 files changed, 181 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index c42faaed508c..589aeba8ccbb 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -170,7 +170,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
}

INIT_LIST_HEAD(&clp->cl_superblocks);
- clp->cl_rpcclient = ERR_PTR(-EINVAL);
+ clp->cl_rpcclient = clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);

clp->cl_flags = cl_init->init_flags;
clp->cl_proto = cl_init->proto;
@@ -241,6 +241,8 @@ void nfs_free_client(struct nfs_client *clp)
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);
+ if (!IS_ERR(clp->cl_rpcclient_localio))
+ rpc_shutdown_client(clp->cl_rpcclient_localio);

put_net(clp->cl_net);
put_nfs_version(clp->cl_nfs_mod);
@@ -429,8 +431,10 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
list_add_tail(&new->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
- nfs_local_probe(new);
- return rpc_ops->init_client(new, cl_init);
+ new = rpc_ops->init_client(new, cl_init);
+ if (!IS_ERR(new))
+ nfs_local_probe(new);
+ return new;
}

spin_unlock(&nn->nfs_client_lock);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 96349b6e7585..145708444998 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -13,6 +13,7 @@
#include <linux/sunrpc/addr.h>
#include <linux/inetdevice.h>
#include <net/addrconf.h>
+#include <linux/nfslocalio.h>
#include <linux/module.h>
#include <linux/bvec.h>

@@ -227,16 +228,65 @@ nfs_local_disable(struct nfs_client *clp)
}
}

+static bool nfs_local_server_getuuid(struct nfs_client *clp, uuid_t *nfsd_uuid)
+{
+ u8 uuid[UUID_SIZE];
+ struct nfs_getuuidres res = {
+ uuid,
+ };
+ struct rpc_message msg = {
+ .rpc_resp = &res,
+ };
+ int status;
+
+ clp->rpc_ops->init_localioclient(clp);
+ if (IS_ERR(clp->cl_rpcclient_localio))
+ return false;
+
+ dprintk("%s: NFS issuing getuuid\n", __func__);
+ msg.rpc_proc = &clp->cl_rpcclient_localio->cl_procinfo[LOCALIOPROC_GETUUID];
+ status = rpc_call_sync(clp->cl_rpcclient_localio, &msg, 0);
+ dprintk("%s: NFS reply getuuid: status=%d uuid=%pU uuid_len=%u\n",
+ __func__, status, res.uuid, res.len);
+ if (status || res.len != UUID_SIZE)
+ return false;
+
+ import_uuid(nfsd_uuid, res.uuid);
+
+ return true;
+}
+
/*
- * nfs_local_probe - probe local i/o support for an nfs_client
+ * nfs_local_probe - probe local i/o support for an nfs_server and nfs_client
+ * - called after alloc_client and init_client (so cl_rpcclient exists)
*/
-void
-nfs_local_probe(struct nfs_client *clp)
+void nfs_local_probe(struct nfs_client *clp)
{
- bool enable = false;
+ uuid_t uuid;

- if (enable)
- nfs_local_enable(clp);
+ if (!localio_enabled)
+ return;
+
+ switch (clp->cl_rpcclient->cl_vers) {
+ case 3:
+ /*
+ * Retrieve server's uuid via LOCALIO protocol and verify the
+ * server with that uuid it is known to be local. This ensures
+ * client and server 1: support localio 2: are local to each other.
+ */
+ if (!nfs_local_server_getuuid(clp, &uuid))
+ return;
+ /* Verify client's nfsd, with specififed uuid, is local */
+ if (!nfsd_uuid_is_local(&uuid))
+ return;
+ break;
+ case 4:
+ default:
+ return; /* localio not supported */
+ }
+
+ dprintk("%s: detected local server.\n", __func__);
+ nfs_local_enable(clp);
}
EXPORT_SYMBOL_GPL(nfs_local_probe);

diff --git a/fs/nfs/nfs3_fs.h b/fs/nfs/nfs3_fs.h
index b333ea119ef5..efdf2b6519e9 100644
--- a/fs/nfs/nfs3_fs.h
+++ b/fs/nfs/nfs3_fs.h
@@ -30,6 +30,7 @@ static inline int nfs3_proc_setacls(struct inode *inode, struct posix_acl *acl,
struct nfs_server *nfs3_create_server(struct fs_context *);
struct nfs_server *nfs3_clone_server(struct nfs_server *, struct nfs_fh *,
struct nfs_fattr *, rpc_authflavor_t);
+void nfs3_init_localioclient(struct nfs_client *);

/* nfs3super.c */
extern struct nfs_subversion nfs_v3;
diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index b0c8a39c2bbd..c41122ee808c 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -7,6 +7,8 @@
#include "netns.h"
#include "sysfs.h"

+#define NFSDBG_FACILITY NFSDBG_CLIENT
+
#ifdef CONFIG_NFS_V3_ACL
static struct rpc_stat nfsacl_rpcstat = { &nfsacl_program };
static const struct rpc_version *nfsacl_version[] = {
@@ -130,3 +132,43 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
return clp;
}
EXPORT_SYMBOL_GPL(nfs3_set_ds_client);
+
+#if defined(CONFIG_NFS_V3_LOCALIO)
+static struct rpc_stat nfslocalio_rpcstat = { &nfslocalio_program3 };
+static const struct rpc_version *nfslocalio_version[] = {
+ [3] = &nfslocalio_version3,
+};
+
+const struct rpc_program nfslocalio_program3 = {
+ .name = "nfslocalio",
+ .number = NFS_LOCALIO_PROGRAM,
+ .nrvers = ARRAY_SIZE(nfslocalio_version),
+ .version = nfslocalio_version,
+ .stats = &nfslocalio_rpcstat,
+};
+
+/*
+ * Initialise an NFSv3 localio client connection
+ */
+void nfs3_init_localioclient(struct nfs_client *clp)
+{
+ if (unlikely(!IS_ERR(clp->cl_rpcclient_localio)))
+ goto out;
+
+ clp->cl_rpcclient_localio = rpc_bind_new_program(clp->cl_rpcclient,
+ &nfslocalio_program3, 3);
+ if (IS_ERR(clp->cl_rpcclient_localio)) {
+ dprintk_rcu("%s: server (%s) does not support NFS v3 LOCALIO\n", __func__,
+ rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR));
+ return;
+ }
+out:
+ /* No errors! Assume that localio is supported */
+ dprintk_rcu("%s: server (%s) supports NFS v3 LOCALIO\n", __func__,
+ rpc_peeraddr2str(clp->cl_rpcclient_localio, RPC_DISPLAY_ADDR));
+}
+#else
+void nfs3_init_localioclient(struct nfs_client *clp)
+{
+}
+#endif /* CONFIG_NFS_V3_LOCALIO */
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 74bda639a7cf..40b6e4d1e7be 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -1067,4 +1067,7 @@ const struct nfs_rpc_ops nfs_v3_clientops = {
.free_client = nfs_free_client,
.create_server = nfs3_create_server,
.clone_server = nfs3_clone_server,
+#if defined(CONFIG_NFS_V3_LOCALIO)
+ .init_localioclient = nfs3_init_localioclient,
+#endif
};
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index 60f032be805a..49689a9a2111 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -2579,3 +2579,61 @@ const struct rpc_version nfsacl_version3 = {
.counts = nfs3_acl_counts,
};
#endif /* CONFIG_NFS_V3_ACL */
+
+#if defined(CONFIG_NFS_V3_LOCALIO)
+
+#define LOCALIO3_getuuidres_sz (1+NFS3_filename_sz)
+
+static void nfs3_xdr_enc_getuuidargs(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ const void *data)
+{
+ /* void function */
+}
+
+static inline int nfs3_decode_getuuidresok(struct xdr_stream *xdr,
+ struct nfs_getuuidres *result)
+{
+ return decode_inline_filename3(xdr, &result->uuid, &result->len);
+}
+
+static int nfs3_xdr_dec_getuuidres(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ void *result)
+{
+ enum nfs_stat status;
+ int error;
+
+ error = decode_nfsstat3(xdr, &status);
+ if (unlikely(error))
+ goto out;
+ if (status != NFS3_OK)
+ goto out_default;
+ error = nfs3_decode_getuuidresok(xdr, result);
+out:
+ return error;
+out_default:
+ return nfs3_stat_to_errno(status);
+}
+
+static const struct rpc_procinfo nfs3_localio_procedures[] = {
+ [LOCALIOPROC_GETUUID] = {
+ .p_proc = LOCALIOPROC_GETUUID,
+ .p_encode = nfs3_xdr_enc_getuuidargs,
+ .p_decode = nfs3_xdr_dec_getuuidres,
+ .p_arglen = 1,
+ .p_replen = LOCALIO3_getuuidres_sz,
+ .p_timer = 0,
+ .p_name = "GETUUID",
+ },
+};
+
+static unsigned int nfs3_localio_counts[ARRAY_SIZE(nfs3_localio_procedures)];
+const struct rpc_version nfslocalio_version3 = {
+ .number = 3,
+ .nrprocs = ARRAY_SIZE(nfs3_localio_procedures),
+ .procs = nfs3_localio_procedures,
+ .counts = nfs3_localio_counts,
+};
+
+#endif /* CONFIG_NFS_V3_LOCALIO */
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 00fe469bc72e..efcdb4d8e9de 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -130,6 +130,7 @@ struct nfs_client {
/* localio */
struct timespec64 cl_nfssvc_boot;
seqlock_t cl_boot_lock;
+ struct rpc_clnt * cl_rpcclient_localio; /* localio RPC client handle */
};

/*
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 764513a61601..9a030e9bd9cf 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1002,6 +1002,11 @@ struct nfs3_getaclres {
struct posix_acl * acl_default;
};

+struct nfs_getuuidres {
+ const char * uuid;
+ unsigned int len;
+};
+
#if IS_ENABLED(CONFIG_NFS_V4)

typedef u64 clientid4;
@@ -1819,6 +1824,7 @@ struct nfs_rpc_ops {
int (*discover_trunking)(struct nfs_server *, struct nfs_fh *);
void (*enable_swap)(struct inode *inode);
void (*disable_swap)(struct inode *inode);
+ void (*init_localioclient)(struct nfs_client *);
};

/*
@@ -1834,4 +1840,7 @@ extern const struct rpc_version nfs_version4;
extern const struct rpc_version nfsacl_version3;
extern const struct rpc_program nfsacl_program;

+extern const struct rpc_version nfslocalio_version3;
+extern const struct rpc_program nfslocalio_program3;
+
#endif
diff --git a/include/uapi/linux/nfs.h b/include/uapi/linux/nfs.h
index f356f2ba3814..e72f5564bdc0 100644
--- a/include/uapi/linux/nfs.h
+++ b/include/uapi/linux/nfs.h
@@ -33,6 +33,10 @@
#define NFS_MNT_VERSION 1
#define NFS_MNT3_VERSION 3

+#define NFS_LOCALIO_PROGRAM 100229
+#define LOCALIOPROC_NULL 0
+#define LOCALIOPROC_GETUUID 1
+
#define NFS_PIPE_DIRNAME "nfs"

/*
--
2.44.0


2024-06-07 14:50:56

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 26/29] nfsd: implement v4 server support for NFS_LOCALIO_PROGRAM

Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfsd/localio.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfsd.h | 5 ++++
fs/nfsd/nfssvc.c | 25 ++++++++++-------
3 files changed, 90 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index eda4fa49b316..e4d2adf9531f 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -16,6 +16,7 @@
#include "filecache.h"
#include "cache.h"
#include "xdr3.h"
+#include "xdr4.h"

#define NFSDDBG_FACILITY NFSDDBG_FH

@@ -267,3 +268,72 @@ const struct svc_version nfsd_localio_version3 = {
.vs_xdrsize = NFS3_SVC_XDRSIZE,
};
#endif /* CONFIG_NFSD_V3_LOCALIO */
+
+#if defined(CONFIG_NFSD_V4_LOCALIO)
+static bool nfs4svc_encode_getuuidres(struct svc_rqst *rqstp,
+ struct xdr_stream *xdr)
+{
+ struct nfsd_getuuidres *resp = rqstp->rq_resp;
+ __be32 *p;
+
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
+ return 0;
+ *p++ = cpu_to_be32(LOCALIOPROC_GETUUID);
+ *p++ = resp->status;
+
+ if (resp->status == nfs_ok) {
+ u8 uuid[UUID_SIZE];
+
+ export_uuid(uuid, &resp->uuid);
+ p = xdr_reserve_space(xdr, 4 + UUID_SIZE);
+ if (!p)
+ return 0;
+ xdr_encode_opaque(p, uuid, UUID_SIZE);
+ dprintk("%s: nfs_ok uuid=%pU uuid_len=%lu\n",
+ __func__, uuid, sizeof(uuid));
+ }
+
+ return 1;
+}
+
+#define ST 1 /* status */
+#define NFS4_filename_sz (1+(NFS4_MAXNAMLEN>>2))
+
+static const struct svc_procedure nfsd_localio_procedures4[2] = {
+ [LOCALIOPROC_NULL] = {
+ .pc_func = nfsd_proc_null,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = nfssvc_encode_voidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct nfsd_voidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = ST,
+ .pc_name = "NULL",
+ },
+ [LOCALIOPROC_GETUUID] = {
+ .pc_func = nfsd_proc_getuuid,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = nfs4svc_encode_getuuidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct nfsd_getuuidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = ST+NFS4_filename_sz,
+ .pc_name = "GETUUID",
+ },
+};
+
+static DEFINE_PER_CPU_ALIGNED(unsigned long,
+ nfsd_localio_count4[ARRAY_SIZE(nfsd_localio_procedures4)]);
+const struct svc_version nfsd_localio_version4 = {
+ .vs_vers = 4,
+ .vs_nproc = 2,
+ .vs_proc = nfsd_localio_procedures4,
+ .vs_dispatch = nfsd_dispatch,
+ .vs_count = nfsd_localio_count4,
+ .vs_xdrsize = NFS4_SVC_XDRSIZE,
+ .vs_rpcb_optnl = true,
+ .vs_need_cong_ctrl = true,
+
+};
+#endif /* CONFIG_NFSD_V4_LOCALIO */
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index d6771669531d..dd225330837f 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -147,6 +147,11 @@ extern const struct svc_version nfsd_localio_version3;
#else
#define nfsd_localio_version3 NULL
#endif
+#if defined(CONFIG_NFSD_V4_LOCALIO)
+extern const struct svc_version nfsd_localio_version4;
+#else
+#define nfsd_localio_version4 NULL
+#endif

struct nfsd_net;

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index fab699699869..72ed4ed11c95 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -38,7 +38,7 @@
atomic_t nfsd_th_cnt = ATOMIC_INIT(0);
extern struct svc_program nfsd_program;
static int nfsd(void *vrqstp);
-#if defined(CONFIG_NFSD_V3_LOCALIO)
+#if defined(CONFIG_NFSD_V3_LOCALIO) || defined(CONFIG_NFSD_V4_LOCALIO)
static int nfsd_localio_rpcbind_set(struct net *,
const struct svc_program *,
u32, int,
@@ -47,7 +47,7 @@ static int nfsd_localio_rpcbind_set(struct net *,
static __be32 nfsd_localio_init_request(struct svc_rqst *,
const struct svc_program *,
struct svc_process_info *);
-#endif /* CONFIG_NFSD_V3_LOCALIO */
+#endif /* CONFIG_NFSD_V3_LOCALIO || CONFIG_NFSD_V4_LOCALIO */
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static int nfsd_acl_rpcbind_set(struct net *,
const struct svc_program *,
@@ -91,9 +91,14 @@ DEFINE_SPINLOCK(nfsd_drc_lock);
unsigned long nfsd_drc_max_mem;
unsigned long nfsd_drc_mem_used;

-#if defined(CONFIG_NFSD_V3_LOCALIO)
+#if defined(CONFIG_NFSD_V3_LOCALIO) || defined(CONFIG_NFSD_V4_LOCALIO)
static const struct svc_version *nfsd_localio_version[] = {
+#if defined(CONFIG_NFSD_V3_LOCALIO)
[3] = &nfsd_localio_version3,
+#endif
+#if defined(CONFIG_NFSD_V4_LOCALIO)
+ [4] = &nfsd_localio_version4,
+#endif
};

#define NFSD_LOCALIO_MINVERS 3
@@ -109,7 +114,7 @@ static struct svc_program nfsd_localio_program = {
.pg_init_request = nfsd_localio_init_request,
.pg_rpcbind_set = nfsd_localio_rpcbind_set,
};
-#endif /* CONFIG_NFSD_V3_LOCALIO */
+#endif /* CONFIG_NFSD_V3_LOCALIO || CONFIG_NFSD_V4_LOCALIO */

#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static const struct svc_version *nfsd_acl_version[] = {
@@ -125,9 +130,9 @@ static const struct svc_version *nfsd_acl_version[] = {
#define NFSD_ACL_NRVERS ARRAY_SIZE(nfsd_acl_version)

static struct svc_program nfsd_acl_program = {
-#if defined(CONFIG_NFSD_V3_LOCALIO)
+#if defined(CONFIG_NFSD_V3_LOCALIO) || defined(CONFIG_NFSD_V4_LOCALIO)
.pg_next = &nfsd_localio_program,
-#endif /* CONFIG_NFSD_V3_LOCALIO */
+#endif /* CONFIG_NFSD_V3_LOCALIO || CONFIG_NFSD_V4_LOCALIO */
.pg_prog = NFS_ACL_PROGRAM,
.pg_nvers = NFSD_ACL_NRVERS,
.pg_vers = nfsd_acl_version,
@@ -157,9 +162,9 @@ struct svc_program nfsd_program = {
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
.pg_next = &nfsd_acl_program,
#else
-#if defined(CONFIG_NFSD_V3_LOCALIO)
+#if defined(CONFIG_NFSD_V3_LOCALIO) || defined(CONFIG_NFSD_V4_LOCALIO)
.pg_next = &nfsd_localio_program,
-#endif /* CONFIG_NFSD_V3_LOCALIO */
+#endif /* CONFIG_NFSD_V3_LOCALIO || CONFIG_NFSD_V4_LOCALIO */
#endif
.pg_prog = NFS_PROGRAM, /* program number */
.pg_nvers = NFSD_NRVERS, /* nr of entries in nfsd_version */
@@ -855,7 +860,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred, const char *scop
return error;
}

-#if defined(CONFIG_NFSD_V3_LOCALIO)
+#if defined(CONFIG_NFSD_V3_LOCALIO) || defined(CONFIG_NFSD_V4_LOCALIO)
static bool
nfsd_support_localio_version(int vers)
{
@@ -889,7 +894,7 @@ nfsd_localio_init_request(struct svc_rqst *rqstp,

return rpc_prog_unavail;
}
-#endif /* CONFIG_NFSD_V3_LOCALIO */
+#endif /* CONFIG_NFSD_V3_LOCALIO || CONFIG_NFSD_V4_LOCALIO */

#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static bool
--
2.44.0


2024-06-07 18:06:11

by Mike Snitzer

[permalink] [raw]
Subject: [for-6.11 PATCH 30/29] nfs/nfsd: ensure localio server always uses its network namespace

Pass the stored cl_nfssvc_net from the client to the server as first
argument to nfsd_open_local_fh() to ensure the proper network
namespace is used for localio.

Otherwise, before this commit, the nfs_client's network namespace was
used (as extracted from the client's cl_rpcclient). This is clearly
not going to allow proper functionality if the client and server
happen to have disjoint network namespaces.

Elected to not rename the nfsd_uuid_t structure despite it growing a
non-uuid member. Can revisit later.

Signed-off-by: Mike Snitzer <[email protected]>
---
fs/nfs/client.c | 1 +
fs/nfs/localio.c | 7 +++++--
fs/nfs_common/nfslocalio.c | 15 +++++++++------
fs/nfsd/localio.c | 9 +++++----
fs/nfsd/nfssvc.c | 1 +
fs/nfsd/vfs.h | 3 ++-
include/linux/nfs_fs_sb.h | 1 +
include/linux/nfslocalio.h | 10 ++++++----
8 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 3d356fb05aee..16636c68148f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -171,6 +171,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)

INIT_LIST_HEAD(&clp->cl_superblocks);
clp->cl_rpcclient = clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ clp->cl_nfssvc_net = NULL;
clp->nfsd_open_local_fh = NULL;

clp->cl_flags = cl_init->init_flags;
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index fb1ebc9715ff..1c970763bcc5 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -187,6 +187,7 @@ static bool nfs_local_server_getuuid(struct nfs_client *clp, uuid_t *nfsd_uuid)
void nfs_local_probe(struct nfs_client *clp)
{
uuid_t uuid;
+ struct net *net = NULL;

if (!localio_enabled)
return;
@@ -202,8 +203,9 @@ void nfs_local_probe(struct nfs_client *clp)
if (!nfs_local_server_getuuid(clp, &uuid))
return;
/* Verify client's nfsd, with specififed uuid, is local */
- if (!nfsd_uuid_is_local(&uuid))
+ if (!nfsd_uuid_is_local(&uuid, &net))
return;
+ clp->cl_nfssvc_net = net;
break;
default:
return; /* localio not supported */
@@ -229,7 +231,8 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
if (mode & ~(FMODE_READ | FMODE_WRITE))
return ERR_PTR(-EINVAL);

- status = clp->nfsd_open_local_fh(clp->cl_rpcclient, cred, fh, mode, &filp);
+ status = clp->nfsd_open_local_fh(clp->cl_nfssvc_net, clp->cl_rpcclient,
+ cred, fh, mode, &filp);
if (status < 0) {
dprintk("%s: open local file failed error=%d\n",
__func__, status);
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index c454c4100976..086e09b3ec38 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -12,29 +12,32 @@ MODULE_LICENSE("GPL");
/*
* Global list of nfsd_uuid_t instances, add/remove
* is protected by fs/nfsd/nfssvc.c:nfsd_mutex.
- * Reads are protected RCU read lock (see below).
+ * Reads are protected by RCU read lock (see below).
*/
LIST_HEAD(nfsd_uuids);
EXPORT_SYMBOL(nfsd_uuids);

/* Must be called with RCU read lock held. */
-static const uuid_t * nfsd_uuid_lookup(const uuid_t *uuid)
+static const uuid_t * nfsd_uuid_lookup(const uuid_t *uuid,
+ struct net **netp)
{
nfsd_uuid_t *nfsd_uuid;

list_for_each_entry_rcu(nfsd_uuid, &nfsd_uuids, list)
- if (uuid_equal(&nfsd_uuid->uuid, uuid))
+ if (uuid_equal(&nfsd_uuid->uuid, uuid)) {
+ *netp = nfsd_uuid->net;
return &nfsd_uuid->uuid;
+ }

return &uuid_null;
}

-bool nfsd_uuid_is_local(const uuid_t *uuid)
+bool nfsd_uuid_is_local(const uuid_t *uuid, struct net **netp)
{
const uuid_t *nfsd_uuid;

rcu_read_lock();
- nfsd_uuid = nfsd_uuid_lookup(uuid);
+ nfsd_uuid = nfsd_uuid_lookup(uuid, netp);
rcu_read_unlock();

return !uuid_is_null(nfsd_uuid);
@@ -51,7 +54,7 @@ EXPORT_SYMBOL_GPL(nfsd_uuid_is_local);
* This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
*/

-extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+extern int nfsd_open_local_fh(struct net *, struct rpc_clnt *rpc_clnt,
const struct cred *cred, const struct nfs_fh *nfs_fh,
const fmode_t fmode, struct file **pfilp);

diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index c4324a0fff57..0ff9ea6b8944 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -35,10 +35,10 @@ nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
}

static struct svc_rqst *
-nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
+nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
+ const struct cred *cred)
{
struct svc_rqst *rqstp;
- struct net *net = rpc_net_ns(rpc_clnt);
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
int status;

@@ -122,7 +122,8 @@ nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
* dependency on knfsd. So, there is no forward declaration in a header file
* for it.
*/
-int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+int nfsd_open_local_fh(struct net *net,
+ struct rpc_clnt *rpc_clnt,
const struct cred *cred,
const struct nfs_fh *nfs_fh,
const fmode_t fmode,
@@ -139,7 +140,7 @@ int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
/* Save creds before calling into nfsd */
save_cred = get_current_cred();

- rqstp = nfsd_local_fakerqst_create(rpc_clnt, cred);
+ rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred);
if (IS_ERR(rqstp)) {
status = PTR_ERR(rqstp);
goto out_revertcred;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 72ed4ed11c95..f63cdeef9c64 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -473,6 +473,7 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
#endif
#if defined(CONFIG_NFSD_V3_LOCALIO) || defined(CONFIG_NFSD_V4_LOCALIO)
INIT_LIST_HEAD(&nn->nfsd_uuid.list);
+ nn->nfsd_uuid.net = net;
list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
#endif
nn->nfsd_net_up = true;
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 91c50649a8c7..af07bb146e81 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -160,7 +160,8 @@ __be32 nfsd_permission(struct svc_rqst *, struct svc_export *,

void nfsd_filp_close(struct file *fp);

-int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+int nfsd_open_local_fh(struct net *net,
+ struct rpc_clnt *rpc_clnt,
const struct cred *cred,
const struct nfs_fh *nfs_fh,
const fmode_t fmode,
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index f5760b05ec87..f47ea512eb0a 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -132,6 +132,7 @@ struct nfs_client {
struct timespec64 cl_nfssvc_boot;
seqlock_t cl_boot_lock;
struct rpc_clnt * cl_rpcclient_localio; /* localio RPC client handle */
+ struct net * cl_nfssvc_net;
nfs_to_nfsd_open_t nfsd_open_local_fh;
};

diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index b8df1b9f248d..c9592ad0afe2 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -8,6 +8,7 @@
#include <linux/list.h>
#include <linux/uuid.h>
#include <linux/nfs.h>
+#include <net/net_namespace.h>

/*
* Global list of nfsd_uuid_t instances, add/remove
@@ -23,13 +24,14 @@ extern struct list_head nfsd_uuids;
typedef struct {
uuid_t uuid;
struct list_head list;
+ struct net *net; /* nfsd's network namespace */
} nfsd_uuid_t;

-bool nfsd_uuid_is_local(const uuid_t *uuid);
+bool nfsd_uuid_is_local(const uuid_t *uuid, struct net **netp);

-typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
- const struct nfs_fh *, const fmode_t,
- struct file **);
+typedef int (*nfs_to_nfsd_open_t)(struct net *, struct rpc_clnt *,
+ const struct cred *, const struct nfs_fh *,
+ const fmode_t, struct file **);

nfs_to_nfsd_open_t get_nfsd_open_local_fh(void);
void put_nfsd_open_local_fh(void);
--
2.44.0


2024-06-07 18:09:51

by Mike Snitzer

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 00/29] nfs/nfsd: add support for localio bypass

On Fri, Jun 07, 2024 at 10:26:17AM -0400, Mike Snitzer wrote:
>
> My container testing was done in terms of podman managed containers.
> I'd appreciate additional review relative to network namespaces.
> fs/nfsd/localio.c:nfsd_local_fakerqst_create() in particular is simply
> using the client's network namespace with rpc_net_ns(rpc_clnt). I have
> an extra patch that updates nfsd_open_local_fh()'s first argument to
> be the server's 'struct net' -- but I stopped short of formally
> including that change in this series because it hasn't proven needed
> (but more exotic hypothetical scenarios could easily expose the need
> for it). I can append it to the series as an "RFC PATCH 30/29" as
> needed.

I did just post that 30/29 patch to this thread.

And here is my git tree for these changes:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=nfs-localio-for-6.11

2024-06-09 12:36:45

by Jeff Layton

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 07/29] sunrpc: add and export rpc_ntop6_addr_noscopeid

On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> From: Peng Tao <[email protected]>
>

Still looking over the set, but this could use some justification.

> Signed-off-by: Peng Tao <[email protected]>
> Signed-off-by: Lance Shelton <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>
> Signed-off-by: Mike Snitzer <[email protected]>
> ---
> include/linux/sunrpc/addr.h | 9 +++++++++
> net/sunrpc/addr.c | 19 +++++++++++++------
> 2 files changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/sunrpc/addr.h b/include/linux/sunrpc/addr.h
> index 07d454873b6d..e1007bddc3c4 100644
> --- a/include/linux/sunrpc/addr.h
> +++ b/include/linux/sunrpc/addr.h
> @@ -68,6 +68,9 @@ static inline bool __rpc_copy_addr4(struct sockaddr *dst,
> }
>
> #if IS_ENABLED(CONFIG_IPV6)
> +extern size_t rpc_ntop6_addr_noscopeid(const struct in6_addr *addr,
> + char *buf, const int buflen);
> +
> static inline bool rpc_cmp_addr6(const struct sockaddr *sap1,
> const struct sockaddr *sap2)
> {
> @@ -94,6 +97,12 @@ static inline bool __rpc_copy_addr6(struct sockaddr *dst,
> return true;
> }
> #else /* !(IS_ENABLED(CONFIG_IPV6) */
> +static size_t rpc_ntop6_addr_noscopeid(const struct in6_addr *addr,
> + char *buf, const int buflen)
> +{
> + return 0;
> +}
> +
> static inline bool rpc_cmp_addr6(const struct sockaddr *sap1,
> const struct sockaddr *sap2)
> {
> diff --git a/net/sunrpc/addr.c b/net/sunrpc/addr.c
> index 97ff11973c49..78a123a7c39b 100644
> --- a/net/sunrpc/addr.c
> +++ b/net/sunrpc/addr.c
> @@ -25,12 +25,9 @@
>
> #if IS_ENABLED(CONFIG_IPV6)
>
> -static size_t rpc_ntop6_noscopeid(const struct sockaddr *sap,
> - char *buf, const int buflen)
> +size_t rpc_ntop6_addr_noscopeid(const struct in6_addr *addr,
> + char *buf, const int buflen)
> {
> - const struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sap;
> - const struct in6_addr *addr = &sin6->sin6_addr;
> -
> /*
> * RFC 4291, Section 2.2.2
> *
> @@ -55,13 +52,23 @@ static size_t rpc_ntop6_noscopeid(const struct sockaddr *sap,
> */
> if (ipv6_addr_v4mapped(addr))
> return snprintf(buf, buflen, "::ffff:%pI4",
> - &addr->s6_addr32[3]);
> + &addr->s6_addr32[3]);
>
> /*
> * RFC 4291, Section 2.2.1
> */
> return snprintf(buf, buflen, "%pI6c", addr);
> }
> +EXPORT_SYMBOL_GPL(rpc_ntop6_addr_noscopeid);
> +
> +static size_t rpc_ntop6_noscopeid(const struct sockaddr *sap,
> + char *buf, const int buflen)
> +{
> + const struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sap;
> + const struct in6_addr *addr = &sin6->sin6_addr;
> +
> + return rpc_ntop6_addr_noscopeid(addr, buf, buflen);
> +}
>
> static size_t rpc_ntop6(const struct sockaddr *sap,
> char *buf, const size_t buflen)

--
Jeff Layton <[email protected]>

2024-06-09 15:44:44

by Chuck Lever III

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 30/29] nfs/nfsd: ensure localio server always uses its network namespace

On Fri, Jun 07, 2024 at 02:06:01PM -0400, Mike Snitzer wrote:
> Pass the stored cl_nfssvc_net from the client to the server as first
> argument to nfsd_open_local_fh() to ensure the proper network
> namespace is used for localio.
>
> Otherwise, before this commit, the nfs_client's network namespace was
> used (as extracted from the client's cl_rpcclient). This is clearly
> not going to allow proper functionality if the client and server
> happen to have disjoint network namespaces.
>
> Elected to not rename the nfsd_uuid_t structure despite it growing a
> non-uuid member. Can revisit later.
>
> Signed-off-by: Mike Snitzer <[email protected]>
> ---
> fs/nfs/client.c | 1 +
> fs/nfs/localio.c | 7 +++++--
> fs/nfs_common/nfslocalio.c | 15 +++++++++------
> fs/nfsd/localio.c | 9 +++++----
> fs/nfsd/nfssvc.c | 1 +
> fs/nfsd/vfs.h | 3 ++-
> include/linux/nfs_fs_sb.h | 1 +
> include/linux/nfslocalio.h | 10 ++++++----
> 8 files changed, 30 insertions(+), 17 deletions(-)
>
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 3d356fb05aee..16636c68148f 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -171,6 +171,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
>
> INIT_LIST_HEAD(&clp->cl_superblocks);
> clp->cl_rpcclient = clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
> + clp->cl_nfssvc_net = NULL;
> clp->nfsd_open_local_fh = NULL;
>
> clp->cl_flags = cl_init->init_flags;
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index fb1ebc9715ff..1c970763bcc5 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -187,6 +187,7 @@ static bool nfs_local_server_getuuid(struct nfs_client *clp, uuid_t *nfsd_uuid)
> void nfs_local_probe(struct nfs_client *clp)
> {
> uuid_t uuid;
> + struct net *net = NULL;
>
> if (!localio_enabled)
> return;
> @@ -202,8 +203,9 @@ void nfs_local_probe(struct nfs_client *clp)
> if (!nfs_local_server_getuuid(clp, &uuid))
> return;
> /* Verify client's nfsd, with specififed uuid, is local */
> - if (!nfsd_uuid_is_local(&uuid))
> + if (!nfsd_uuid_is_local(&uuid, &net))
> return;
> + clp->cl_nfssvc_net = net;
> break;
> default:
> return; /* localio not supported */
> @@ -229,7 +231,8 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> if (mode & ~(FMODE_READ | FMODE_WRITE))
> return ERR_PTR(-EINVAL);
>
> - status = clp->nfsd_open_local_fh(clp->cl_rpcclient, cred, fh, mode, &filp);
> + status = clp->nfsd_open_local_fh(clp->cl_nfssvc_net, clp->cl_rpcclient,
> + cred, fh, mode, &filp);
> if (status < 0) {
> dprintk("%s: open local file failed error=%d\n",
> __func__, status);
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index c454c4100976..086e09b3ec38 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -12,29 +12,32 @@ MODULE_LICENSE("GPL");
> /*
> * Global list of nfsd_uuid_t instances, add/remove
> * is protected by fs/nfsd/nfssvc.c:nfsd_mutex.
> - * Reads are protected RCU read lock (see below).
> + * Reads are protected by RCU read lock (see below).
> */
> LIST_HEAD(nfsd_uuids);
> EXPORT_SYMBOL(nfsd_uuids);
>
> /* Must be called with RCU read lock held. */
> -static const uuid_t * nfsd_uuid_lookup(const uuid_t *uuid)
> +static const uuid_t * nfsd_uuid_lookup(const uuid_t *uuid,
> + struct net **netp)
> {
> nfsd_uuid_t *nfsd_uuid;
>
> list_for_each_entry_rcu(nfsd_uuid, &nfsd_uuids, list)
> - if (uuid_equal(&nfsd_uuid->uuid, uuid))
> + if (uuid_equal(&nfsd_uuid->uuid, uuid)) {
> + *netp = nfsd_uuid->net;
> return &nfsd_uuid->uuid;
> + }
>
> return &uuid_null;
> }
>
> -bool nfsd_uuid_is_local(const uuid_t *uuid)
> +bool nfsd_uuid_is_local(const uuid_t *uuid, struct net **netp)
> {
> const uuid_t *nfsd_uuid;
>
> rcu_read_lock();
> - nfsd_uuid = nfsd_uuid_lookup(uuid);
> + nfsd_uuid = nfsd_uuid_lookup(uuid, netp);
> rcu_read_unlock();
>
> return !uuid_is_null(nfsd_uuid);
> @@ -51,7 +54,7 @@ EXPORT_SYMBOL_GPL(nfsd_uuid_is_local);
> * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
> */
>
> -extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> +extern int nfsd_open_local_fh(struct net *, struct rpc_clnt *rpc_clnt,
> const struct cred *cred, const struct nfs_fh *nfs_fh,
> const fmode_t fmode, struct file **pfilp);
>
> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> index c4324a0fff57..0ff9ea6b8944 100644
> --- a/fs/nfsd/localio.c
> +++ b/fs/nfsd/localio.c
> @@ -35,10 +35,10 @@ nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> }
>
> static struct svc_rqst *
> -nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
> +nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
> + const struct cred *cred)
> {
> struct svc_rqst *rqstp;
> - struct net *net = rpc_net_ns(rpc_clnt);
> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> int status;
>
> @@ -122,7 +122,8 @@ nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
> * dependency on knfsd. So, there is no forward declaration in a header file
> * for it.
> */
> -int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> +int nfsd_open_local_fh(struct net *net,
> + struct rpc_clnt *rpc_clnt,
> const struct cred *cred,
> const struct nfs_fh *nfs_fh,
> const fmode_t fmode,
> @@ -139,7 +140,7 @@ int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> /* Save creds before calling into nfsd */
> save_cred = get_current_cred();
>
> - rqstp = nfsd_local_fakerqst_create(rpc_clnt, cred);
> + rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred);
> if (IS_ERR(rqstp)) {
> status = PTR_ERR(rqstp);
> goto out_revertcred;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 72ed4ed11c95..f63cdeef9c64 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -473,6 +473,7 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
> #endif
> #if defined(CONFIG_NFSD_V3_LOCALIO) || defined(CONFIG_NFSD_V4_LOCALIO)
> INIT_LIST_HEAD(&nn->nfsd_uuid.list);
> + nn->nfsd_uuid.net = net;
> list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
> #endif
> nn->nfsd_net_up = true;
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 91c50649a8c7..af07bb146e81 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -160,7 +160,8 @@ __be32 nfsd_permission(struct svc_rqst *, struct svc_export *,
>
> void nfsd_filp_close(struct file *fp);
>
> -int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> +int nfsd_open_local_fh(struct net *net,
> + struct rpc_clnt *rpc_clnt,
> const struct cred *cred,
> const struct nfs_fh *nfs_fh,
> const fmode_t fmode,
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index f5760b05ec87..f47ea512eb0a 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -132,6 +132,7 @@ struct nfs_client {
> struct timespec64 cl_nfssvc_boot;
> seqlock_t cl_boot_lock;
> struct rpc_clnt * cl_rpcclient_localio; /* localio RPC client handle */
> + struct net * cl_nfssvc_net;
> nfs_to_nfsd_open_t nfsd_open_local_fh;
> };
>
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index b8df1b9f248d..c9592ad0afe2 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -8,6 +8,7 @@
> #include <linux/list.h>
> #include <linux/uuid.h>
> #include <linux/nfs.h>
> +#include <net/net_namespace.h>
>
> /*
> * Global list of nfsd_uuid_t instances, add/remove
> @@ -23,13 +24,14 @@ extern struct list_head nfsd_uuids;
> typedef struct {
> uuid_t uuid;
> struct list_head list;
> + struct net *net; /* nfsd's network namespace */
> } nfsd_uuid_t;
>
> -bool nfsd_uuid_is_local(const uuid_t *uuid);
> +bool nfsd_uuid_is_local(const uuid_t *uuid, struct net **netp);
>
> -typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
> - const struct nfs_fh *, const fmode_t,
> - struct file **);
> +typedef int (*nfs_to_nfsd_open_t)(struct net *, struct rpc_clnt *,
> + const struct cred *, const struct nfs_fh *,
> + const fmode_t, struct file **);
>
> nfs_to_nfsd_open_t get_nfsd_open_local_fh(void);
> void put_nfsd_open_local_fh(void);
> --
> 2.44.0
>

For some reason, I received only patch 30/29.

--
Chuck Lever

2024-06-10 12:02:23

by Jeff Layton

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 01/29] nfs: pass nfs_client to nfs_initiate_pgio

On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> From: Weston Andros Adamson <[email protected]>
>
> The nfs_client is needed for localio support.
>
> Signed-off-by: Weston Andros Adamson <[email protected]>
> Signed-off-by: Lance Shelton <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>
> Signed-off-by: Mike Snitzer <[email protected]>
> ---
>  fs/nfs/filelayout/filelayout.c         |  4 ++--
>  fs/nfs/flexfilelayout/flexfilelayout.c |  6 ++++--
>  fs/nfs/internal.h                      |  5 +++--
>  fs/nfs/pagelist.c                      | 10 ++++++----
>  4 files changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/fs/nfs/filelayout/filelayout.c
> b/fs/nfs/filelayout/filelayout.c
> index 29d84dc66ca3..43e16e9e0176 100644
> --- a/fs/nfs/filelayout/filelayout.c
> +++ b/fs/nfs/filelayout/filelayout.c
> @@ -486,7 +486,7 @@ filelayout_read_pagelist(struct nfs_pgio_header
> *hdr)
>   hdr->mds_offset = offset;
>  
>   /* Perform an asynchronous read to ds */
> - nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
> + nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
>     NFS_PROTO(hdr->inode),
> &filelayout_read_call_ops,
>     0, RPC_TASK_SOFTCONN);
>   return PNFS_ATTEMPTED;
> @@ -528,7 +528,7 @@ filelayout_write_pagelist(struct nfs_pgio_header
> *hdr, int sync)
>   hdr->args.offset = filelayout_get_dserver_offset(lseg,
> offset);
>  
>   /* Perform an asynchronous write */
> - nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
> + nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
>     NFS_PROTO(hdr->inode),
> &filelayout_write_call_ops,
>     sync, RPC_TASK_SOFTCONN);
>   return PNFS_ATTEMPTED;
> diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c
> b/fs/nfs/flexfilelayout/flexfilelayout.c
> index 24188af56d5b..327f1a5c9fbe 100644
> --- a/fs/nfs/flexfilelayout/flexfilelayout.c
> +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> @@ -1803,7 +1803,8 @@ ff_layout_read_pagelist(struct nfs_pgio_header
> *hdr)
>   hdr->mds_offset = offset;
>  
>   /* Perform an asynchronous read to ds */
> - nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp-
> >rpc_ops,
> + nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
> +   ds->ds_clp->rpc_ops,
>     vers == 3 ? &ff_layout_read_call_ops_v3 :
>         &ff_layout_read_call_ops_v4,
>     0, RPC_TASK_SOFTCONN);
> @@ -1871,7 +1872,8 @@ ff_layout_write_pagelist(struct nfs_pgio_header
> *hdr, int sync)
>   hdr->args.offset = offset;
>  
>   /* Perform an asynchronous write */
> - nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp-
> >rpc_ops,
> + nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
> +   ds->ds_clp->rpc_ops,
>     vers == 3 ? &ff_layout_write_call_ops_v3 :
>         &ff_layout_write_call_ops_v4,
>     sync, RPC_TASK_SOFTCONN);
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 9f0f4534744b..a9c0c29f7804 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -306,8 +306,9 @@ extern const struct nfs_pageio_ops
> nfs_pgio_rw_ops;
>  struct nfs_pgio_header *nfs_pgio_header_alloc(const struct
> nfs_rw_ops *);
>  void nfs_pgio_header_free(struct nfs_pgio_header *);
>  int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct
> nfs_pgio_header *);
> -int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header
> *hdr,
> -       const struct cred *cred, const struct
> nfs_rpc_ops *rpc_ops,
> +int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt
> *rpc_clnt,
> +       struct nfs_pgio_header *hdr, const struct cred
> *cred,
> +       const struct nfs_rpc_ops *rpc_ops,
>         const struct rpc_call_ops *call_ops, int how,
> int flags);
>  void nfs_free_request(struct nfs_page *req);
>  struct nfs_pgio_mirror *
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 6efb5068c116..d9b795c538cd 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -844,8 +844,9 @@ static void nfs_pgio_prepare(struct rpc_task
> *task, void *calldata)
>   rpc_exit(task, err);
>  }
>  
> -int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header
> *hdr,
> -       const struct cred *cred, const struct
> nfs_rpc_ops *rpc_ops,
> +int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt
> *rpc_clnt,
> +       struct nfs_pgio_header *hdr, const struct cred
> *cred,
> +       const struct nfs_rpc_ops *rpc_ops,
>         const struct rpc_call_ops *call_ops, int how,
> int flags)
>  {
>   struct rpc_task *task;
> @@ -855,7 +856,7 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt,
> struct nfs_pgio_header *hdr,
>   .rpc_cred = cred,
>   };
>   struct rpc_task_setup task_setup_data = {
> - .rpc_client = clnt,
> + .rpc_client = rpc_clnt,
>   .task = &hdr->task,
>   .rpc_message = &msg,
>   .callback_ops = call_ops,
> @@ -1070,7 +1071,8 @@ static int nfs_generic_pg_pgios(struct
> nfs_pageio_descriptor *desc)
>   if (ret == 0) {
>   if (NFS_SERVER(hdr->inode)->nfs_client-
> >cl_minorversion)
>   task_flags = RPC_TASK_MOVEABLE;
> - ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
> + ret = nfs_initiate_pgio(NFS_SERVER(hdr->inode)-
> >nfs_client,
> + NFS_CLIENT(hdr->inode),
>   hdr,
>   hdr->cred,
>   NFS_PROTO(hdr->inode),

My first inclination was that this is redundant since the nfs_client
has a pointer to the rpc_client in it, but in the pNFS situation I
suppose they aren't necessary the same thing, so I guess you have to do
it this way.

Reviewed-by: Jeff Layton <[email protected]>

2024-06-10 12:20:16

by Jeff Layton

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 06/29] sunrpc: add rpcauth_map_to_svc_cred

On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> From: Weston Andros Adamson <[email protected]>
>
> Add new funtion rpcauth_map_to_svc_cred which maps a generic rpc_cred
> to an
> svc_cred suitable for use in nfsd.
>
> This is needed by the localio code to map nfs client creds to nfs
> server
> credentials.
>
> Signed-off-by: Weston Andros Adamson <[email protected]>
> Signed-off-by: Lance Shelton <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>
> Signed-off-by: Mike Snitzer <[email protected]>
> ---
>  include/linux/sunrpc/auth.h |  4 ++++
>  net/sunrpc/auth.c           | 16 ++++++++++++++++
>  2 files changed, 20 insertions(+)
>
> diff --git a/include/linux/sunrpc/auth.h
> b/include/linux/sunrpc/auth.h
> index 61e58327b1aa..5ebf031361a1 100644
> --- a/include/linux/sunrpc/auth.h
> +++ b/include/linux/sunrpc/auth.h
> @@ -11,6 +11,7 @@
>  #define _LINUX_SUNRPC_AUTH_H
>  
>  #include <linux/sunrpc/sched.h>
> +#include <linux/sunrpc/svcauth.h>
>  #include <linux/sunrpc/msg_prot.h>
>  #include <linux/sunrpc/xdr.h>
>  
> @@ -184,6 +185,9 @@ int rpcauth_uptodatecred(struct
> rpc_task *);
>  int rpcauth_init_credcache(struct rpc_auth *);
>  void rpcauth_destroy_credcache(struct rpc_auth
> *);
>  void rpcauth_clear_credcache(struct
> rpc_cred_cache *);
> +bool rpcauth_map_to_svc_cred(struct rpc_auth *,
> + const struct cred *,
> + struct svc_cred *);
>  char * rpcauth_stringify_acceptor(struct rpc_cred
> *);
>  
>  static inline
> diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
> index 04534ea537c8..55a03a3bcac2 100644
> --- a/net/sunrpc/auth.c
> +++ b/net/sunrpc/auth.c
> @@ -308,6 +308,22 @@ rpcauth_init_credcache(struct rpc_auth *auth)
>  }
>  EXPORT_SYMBOL_GPL(rpcauth_init_credcache);
>  
> +bool
> +rpcauth_map_to_svc_cred(struct rpc_auth *auth, const struct cred
> *cred,
> + struct svc_cred *svc)
> +{
> + svc->cr_uid = cred->uid;
> + svc->cr_gid = cred->gid;
> + svc->cr_flavor = auth->au_flavor;
> + svc->cr_principal = NULL;
> + svc->cr_gss_mech = NULL;

Setting the above to NULLs makes me a little nervous, but these values
are usually handled at the RPC layer during svc_authenticate. The
localio code hooks in below that, so this should hopefully not be a
problem.


> + if (cred->group_info)
> + svc->cr_group_info = get_group_info(cred-
> >group_info);
> +
> + return true;
> +}
> +EXPORT_SYMBOL_GPL(rpcauth_map_to_svc_cred);
> +
>  char *
>  rpcauth_stringify_acceptor(struct rpc_cred *cred)
>  {

--
Jeff Layton <[email protected]>

2024-06-10 12:21:57

by Jeff Layton

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 04/29] sunrpc: handle NULL req->defer in cache_defer_req

On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> From: Weston Andros Adamson <[email protected]>
>
> Dont crash with a NULL pointer dereference when req->defer isn't
> set. This is needed for the localio path.
>
> Signed-off-by: Weston Andros Adamson <[email protected]>
> Signed-off-by: Lance Shelton <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>
> Signed-off-by: Mike Snitzer <[email protected]>
> ---
>  net/sunrpc/cache.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index 95ff74706104..b757b891382c 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -714,6 +714,8 @@ static bool cache_defer_req(struct cache_req
> *req, struct cache_head *item)
>   return false;
>   }
>  
> + if (!req->defer)
> + return false;
>   dreq = req->defer(req);
>   if (dreq == NULL)
>   return false;

I've gone over it many times, but I still don't quite "get" the
deferral handling code. I think the above is probably safe, but please
do Cc Neil Brown on later postings of this series since he has a better
grasp of that code.
--
Jeff Layton <[email protected]>

2024-06-10 12:43:39

by Jeff Layton

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 10/29] nfs/nfsd: add "local io" support

On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> From: Weston Andros Adamson <[email protected]>
>
> Add client support for bypassing NFS for localhost reads, writes, and commits.
>
> This is only useful when the client and the server are running on the same
> host and in the same container.
>
> This has dynamic binding with the nfsd module. Local i/o will only work if
> nfsd is already loaded.
>
> [snitm: rebase accounted for commit d8b26071e65e8 ("NFSD: simplify struct nfsfh")
>  and commit 7c98f7cb8fda ("remove call_{read,write}_iter() functions")]
>
> Signed-off-by: Weston Andros Adamson <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> Signed-off-by: Peng Tao <[email protected]>
> Signed-off-by: Lance Shelton <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>
> Signed-off-by: Mike Snitzer <[email protected]>
> ---
>  fs/nfs/Makefile                        |   2 +-
>  fs/nfs/client.c                        |  12 +
>  fs/nfs/filelayout/filelayout.c         |   6 +-
>  fs/nfs/flexfilelayout/flexfilelayout.c |   6 +-
>  fs/nfs/inode.c                         |   5 +
>  fs/nfs/internal.h                      |  32 +-
>  fs/nfs/localio.c                       | 933 +++++++++++++++++++++++++
>  fs/nfs/nfstrace.h                      |  29 +
>  fs/nfs/pagelist.c                      |  12 +-
>  fs/nfs/pnfs_nfs.c                      |   2 +-
>  fs/nfs/write.c                         |  14 +-
>  fs/nfsd/Makefile                       |   2 +-
>  fs/nfsd/filecache.c                    |   2 +-
>  fs/nfsd/localio.c                      | 179 +++++
>  fs/nfsd/trace.h                        |   3 +-
>  fs/nfsd/vfs.h                          |   8 +
>  include/linux/nfs.h                    |   6 +
>  include/linux/nfs_fs.h                 |   2 +
>  include/linux/nfs_fs_sb.h              |   2 +
>  include/linux/nfs_xdr.h                |   1 +
>  20 files changed, 1240 insertions(+), 18 deletions(-)
>  create mode 100644 fs/nfs/localio.c
>  create mode 100644 fs/nfsd/localio.c
>
> diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
> index 5f6db37f461e..af64cf5ea420 100644
> --- a/fs/nfs/Makefile
> +++ b/fs/nfs/Makefile
> @@ -9,7 +9,7 @@ CFLAGS_nfstrace.o += -I$(src)
>  nfs-y  := client.o dir.o file.o getroot.o inode.o super.o \
>      io.o direct.o pagelist.o read.o symlink.o unlink.o \
>      write.o namespace.o mount_clnt.o nfstrace.o \
> -    export.o sysfs.o fs_context.o
> +    export.o sysfs.o fs_context.o localio.o
>  nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
>  nfs-$(CONFIG_SYSCTL) += sysctl.o
>  nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index dd3278dcfca8..288de750fd3b 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -170,6 +170,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
>   }
>  
>   INIT_LIST_HEAD(&clp->cl_superblocks);
> + INIT_LIST_HEAD(&clp->cl_local_addrs);
>   clp->cl_rpcclient = ERR_PTR(-EINVAL);
>  
>   clp->cl_flags = cl_init->init_flags;
> @@ -183,6 +184,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
>  
>   clp->cl_principal = "*";
>   clp->cl_xprtsec = cl_init->xprtsec;
> + nfs_probe_local_addr(clp);
>   return clp;
>  
>  error_cleanup:
> @@ -236,10 +238,19 @@ static void pnfs_init_server(struct nfs_server *server)
>   */
>  void nfs_free_client(struct nfs_client *clp)
>  {
> + struct nfs_local_addr *addr, *tmp;
> +
> + nfs_local_disable(clp);
> +
>   /* -EIO all pending I/O */
>   if (!IS_ERR(clp->cl_rpcclient))
>   rpc_shutdown_client(clp->cl_rpcclient);
>  
> + list_for_each_entry_safe(addr, tmp, &clp->cl_local_addrs, cl_addrs) {
> + list_del(&addr->cl_addrs);
> + kfree(addr);
> + }
> +
>   put_net(clp->cl_net);
>   put_nfs_version(clp->cl_nfs_mod);
>   kfree(clp->cl_hostname);
> @@ -427,6 +438,7 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
>   list_add_tail(&new->cl_share_link,
>   &nn->nfs_client_list);
>   spin_unlock(&nn->nfs_client_lock);
> + nfs_local_probe(new);
>   return rpc_ops->init_client(new, cl_init);
>   }
>  
> diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
> index d66f2efbd92f..bd8c717c31d2 100644
> --- a/fs/nfs/filelayout/filelayout.c
> +++ b/fs/nfs/filelayout/filelayout.c
> @@ -489,7 +489,7 @@ filelayout_read_pagelist(struct nfs_pageio_descriptor *desc,
>   /* Perform an asynchronous read to ds */
>   nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
>     NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
> -   0, RPC_TASK_SOFTCONN);
> +   0, RPC_TASK_SOFTCONN, NULL);
>   return PNFS_ATTEMPTED;
>  }
>  
> @@ -532,7 +532,7 @@ filelayout_write_pagelist(struct nfs_pageio_descriptor *desc,
>   /* Perform an asynchronous write */
>   nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
>     NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
> -   sync, RPC_TASK_SOFTCONN);
> +   sync, RPC_TASK_SOFTCONN, NULL);
>   return PNFS_ATTEMPTED;
>  }
>  
> @@ -1014,7 +1014,7 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
>   return nfs_initiate_commit(ds->ds_clp, ds_clnt, data,
>      NFS_PROTO(data->inode),
>      &filelayout_commit_call_ops, how,
> -    RPC_TASK_SOFTCONN);
> +    RPC_TASK_SOFTCONN, NULL);
>  out_err:
>   pnfs_generic_prepare_to_resend_writes(data);
>   pnfs_generic_commit_release(data);
> diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
> index d7e9e5ef4085..ce6cb5d82427 100644
> --- a/fs/nfs/flexfilelayout/flexfilelayout.c
> +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> @@ -1808,7 +1808,7 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
>     ds->ds_clp->rpc_ops,
>     vers == 3 ? &ff_layout_read_call_ops_v3 :
>         &ff_layout_read_call_ops_v4,
> -   0, RPC_TASK_SOFTCONN);
> +   0, RPC_TASK_SOFTCONN, NULL);
>   put_cred(ds_cred);
>   return PNFS_ATTEMPTED;
>  
> @@ -1878,7 +1878,7 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
>     ds->ds_clp->rpc_ops,
>     vers == 3 ? &ff_layout_write_call_ops_v3 :
>         &ff_layout_write_call_ops_v4,
> -   sync, RPC_TASK_SOFTCONN);
> +   sync, RPC_TASK_SOFTCONN, NULL);
>   put_cred(ds_cred);
>   return PNFS_ATTEMPTED;
>  
> @@ -1953,7 +1953,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
>   ret = nfs_initiate_commit(ds->ds_clp, ds_clnt, data, ds->ds_clp->rpc_ops,
>      vers == 3 ? &ff_layout_commit_call_ops_v3 :
>          &ff_layout_commit_call_ops_v4,
> -    how, RPC_TASK_SOFTCONN);
> +    how, RPC_TASK_SOFTCONN, NULL);
>   put_cred(ds_cred);
>   return ret;
>  out_err:
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index acef52ecb1bb..4f88b860494f 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -39,6 +39,7 @@
>  #include <linux/slab.h>
>  #include <linux/compat.h>
>  #include <linux/freezer.h>
> +#include <linux/file.h>
>  #include <linux/uaccess.h>
>  #include <linux/iversion.h>
>  
> @@ -1053,6 +1054,7 @@ struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry,
>   ctx->lock_context.open_context = ctx;
>   INIT_LIST_HEAD(&ctx->list);
>   ctx->mdsthreshold = NULL;
> + ctx->local_filp = NULL;
>   return ctx;
>  }
>  EXPORT_SYMBOL_GPL(alloc_nfs_open_context);
> @@ -1084,6 +1086,8 @@ static void __put_nfs_open_context(struct nfs_open_context *ctx, int is_sync)
>   nfs_sb_deactive(sb);
>   put_rpccred(rcu_dereference_protected(ctx->ll_cred, 1));
>   kfree(ctx->mdsthreshold);
> + if (!IS_ERR_OR_NULL(ctx->local_filp))
> + fput(ctx->local_filp);
>   kfree_rcu(ctx, rcu_head);
>  }
>  
> @@ -2495,6 +2499,7 @@ static int __init init_nfs_fs(void)
>   if (err)
>   goto out1;
>  
> + nfs_local_init();
>   err = register_nfs_fs();
>   if (err)
>   goto out0;
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 873c2339b78a..67b348447a40 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -204,6 +204,12 @@ struct nfs_mount_request {
>   struct net *net;
>  };
>  
> +struct nfs_local_addr {
> + struct list_head cl_addrs;
> + struct sockaddr_storage address;
> + size_t addrlen;
> +};
> +
>  extern int nfs_mount(struct nfs_mount_request *info, int timeo, int retrans);
>  extern void nfs_umount(const struct nfs_mount_request *info);
>  
> @@ -309,7 +315,8 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
>  int nfs_initiate_pgio(struct nfs_pageio_descriptor *, struct nfs_client *clp,
>         struct rpc_clnt *rpc_clnt, struct nfs_pgio_header *hdr,
>         const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
> -       const struct rpc_call_ops *call_ops, int how, int flags);
> +       const struct rpc_call_ops *call_ops, int how, int flags,
> +       struct file *localio);
>  void nfs_free_request(struct nfs_page *req);
>  struct nfs_pgio_mirror *
>  nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc);
> @@ -450,6 +457,26 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
>  extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
>  extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
>  
> +/* localio.c */
> +extern void nfs_local_init(void);
> +extern void nfs_local_enable(struct nfs_client *);
> +extern void nfs_local_disable(struct nfs_client *);
> +extern void nfs_local_probe(struct nfs_client *);
> +extern struct file *nfs_local_open_fh(struct nfs_client *, const struct cred *,
> +       struct nfs_fh *, const fmode_t);
> +extern struct file *nfs_local_file_open(struct nfs_client *clp,
> + const struct cred *cred,
> + struct nfs_fh *fh,
> + struct nfs_open_context *ctx);
> +extern int nfs_local_doio(struct nfs_client *, struct file *,
> +   struct nfs_pgio_header *,
> +   const struct rpc_call_ops *);
> +extern int nfs_local_commit(struct nfs_client *, struct file *,
> +     struct nfs_commit_data *,
> +     const struct rpc_call_ops *, int);
> +extern void nfs_probe_local_addr(struct nfs_client *clnt);
> +extern bool nfs_server_is_local(const struct nfs_client *clp);
> +
>  /* super.c */
>  extern const struct super_operations nfs_sops;
>  bool nfs_auth_info_match(const struct nfs_auth_info *, rpc_authflavor_t);
> @@ -530,7 +557,8 @@ extern int nfs_initiate_commit(struct nfs_client *clp,
>          struct nfs_commit_data *data,
>          const struct nfs_rpc_ops *nfs_ops,
>          const struct rpc_call_ops *call_ops,
> -        int how, int flags);
> +        int how, int flags,
> +        struct file *localio);
>  extern void nfs_init_commit(struct nfs_commit_data *data,
>       struct list_head *head,
>       struct pnfs_layout_segment *lseg,
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> new file mode 100644
> index 000000000000..5c69eb0fe7b6
> --- /dev/null
> +++ b/fs/nfs/localio.c
> @@ -0,0 +1,933 @@
> +/*
> + *  linux/fs/nfs/localio.c
> + *
> + *  Copyright (C) 2014  Weston Andros Adamson <[email protected]>
> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/errno.h>
> +#include <linux/vfs.h>
> +#include <linux/file.h>
> +#include <linux/inet.h>
> +#include <linux/sunrpc/addr.h>
> +#include <linux/inetdevice.h>
> +#include <net/addrconf.h>
> +#include <linux/module.h>
> +#include <linux/bvec.h>
> +
> +#include <linux/nfs.h>
> +#include <linux/nfs_fs.h>
> +#include <linux/nfs_xdr.h>
> +
> +#include <uapi/linux/if_arp.h>
> +
> +#include "internal.h"
> +#include "pnfs.h"
> +#include "nfstrace.h"
> +
> +#define NFSDBG_FACILITY NFSDBG_VFS
> +
> +extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> +       const struct cred *cred,
> +       const struct nfs_fh *nfs_fh, const fmode_t fmode,
> +       struct file **pfilp);
> +/*
> + * The localio code needs to call into nfsd to do the filehandle -> struct path
> + * mapping, but cannot be statically linked, because that will make the nfs
> + * module depend on the nfsd module.
> + *
> + * Instead, do dynamic linking to the nfsd module. This way the nfs module
> + * will only hold a reference on nfsd when it's actually in use. This also
> + * allows some sanity checking, like giving up on localio if nfsd isn't loaded.
> + */
> +
> +struct nfs_local_open_ctx {
> + spinlock_t lock;
> + nfs_to_nfsd_open_t open_f;
> + atomic_t refcount;
> +};
> +
> +struct nfs_local_kiocb {
> + struct kiocb kiocb;
> + struct bio_vec *bvec;
> + struct nfs_pgio_header *hdr;
> + struct work_struct work;
> +};
> +
> +struct nfs_local_fsync_ctx {
> + struct file *filp;
> + struct nfs_commit_data *data;
> + struct work_struct work;
> + struct kref kref;
> +};
> +static void nfs_local_fsync_work(struct work_struct *work);
> +
> +/*
> + * We need to translate between nfs status return values and
> + * the local errno values which may not be the same.
> + */
> +static struct {
> + __u32 stat;
> + int errno;
> +} nfs_errtbl[] = {
> + { NFS4_OK, 0 },
> + { NFS4ERR_PERM, -EPERM },
> + { NFS4ERR_NOENT, -ENOENT },
> + { NFS4ERR_IO, -EIO },
> + { NFS4ERR_NXIO, -ENXIO },
> + { NFS4ERR_FBIG, -E2BIG },
> + { NFS4ERR_STALE, -EBADF },
> + { NFS4ERR_ACCESS, -EACCES },
> + { NFS4ERR_EXIST, -EEXIST },
> + { NFS4ERR_XDEV, -EXDEV },
> + { NFS4ERR_MLINK, -EMLINK },
> + { NFS4ERR_NOTDIR, -ENOTDIR },
> + { NFS4ERR_ISDIR, -EISDIR },
> + { NFS4ERR_INVAL, -EINVAL },
> + { NFS4ERR_FBIG, -EFBIG },
> + { NFS4ERR_NOSPC, -ENOSPC },
> + { NFS4ERR_ROFS, -EROFS },
> + { NFS4ERR_NAMETOOLONG, -ENAMETOOLONG },
> + { NFS4ERR_NOTEMPTY, -ENOTEMPTY },
> + { NFS4ERR_DQUOT, -EDQUOT },
> + { NFS4ERR_STALE, -ESTALE },
> + { NFS4ERR_STALE, -EOPENSTALE },
> + { NFS4ERR_DELAY, -ETIMEDOUT },
> + { NFS4ERR_DELAY, -ERESTARTSYS },
> + { NFS4ERR_DELAY, -EAGAIN },
> + { NFS4ERR_DELAY, -ENOMEM },
> + { NFS4ERR_IO, -ETXTBSY },
> + { NFS4ERR_IO, -EBUSY },
> + { NFS4ERR_BADHANDLE, -EBADHANDLE },
> + { NFS4ERR_BAD_COOKIE, -EBADCOOKIE },
> + { NFS4ERR_NOTSUPP, -EOPNOTSUPP },
> + { NFS4ERR_TOOSMALL, -ETOOSMALL },
> + { NFS4ERR_SERVERFAULT, -ESERVERFAULT },
> + { NFS4ERR_SERVERFAULT, -ENFILE },
> + { NFS4ERR_IO, -EREMOTEIO },
> + { NFS4ERR_IO, -EUCLEAN },
> + { NFS4ERR_PERM, -ENOKEY },
> + { NFS4ERR_BADTYPE, -EBADTYPE },
> + { NFS4ERR_SYMLINK, -ELOOP },
> + { NFS4ERR_DEADLOCK, -EDEADLK },
> +};
> +
> +/*
> + * Convert an NFS error code to a local one.
> + * This one is used jointly by NFSv2 and NFSv3.
> + */
> +static __u32
> +nfs4errno(int errno)
> +{
> + unsigned int i;
> + for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) {
> + if (nfs_errtbl[i].errno == errno)
> + return nfs_errtbl[i].stat;
> + }
> + /* If we cannot translate the error, the recovery routines should
> + * handle it.
> + * Note: remaining NFSv4 error codes have values > 10000, so should
> + * not conflict with native Linux error codes.
> + */
> + return NFS4ERR_SERVERFAULT;
> +}
> +
> +static struct nfs_local_open_ctx __local_open_ctx __read_mostly;
> +
> +static bool localio_enabled __read_mostly = true;
> +module_param(localio_enabled, bool, 0644);
> +
> +bool nfs_server_is_local(const struct nfs_client *clp)
> +{
> + return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
> + localio_enabled;
> +}
> +EXPORT_SYMBOL_GPL(nfs_server_is_local);
> +
> +void
> +nfs_local_init(void)
> +{
> + struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> +
> + ctx->open_f = NULL;
> + spin_lock_init(&ctx->lock);
> + atomic_set(&ctx->refcount, 0);
> +}
> +
> +static bool
> +nfs_local_get_lookup_ctx(void)
> +{
> + struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> + nfs_to_nfsd_open_t fn = NULL;
> +
> + spin_lock(&ctx->lock);
> + if (ctx->open_f == NULL) {
> + spin_unlock(&ctx->lock);
> +
> + fn = symbol_request(nfsd_open_local_fh);
> + if (!fn)
> + return false;
> +
> + spin_lock(&ctx->lock);
> + /* catch race */
> + if (ctx->open_f == NULL) {
> + ctx->open_f = fn;
> + fn = NULL;
> + }
> + }
> + atomic_inc(&ctx->refcount);
> + spin_unlock(&ctx->lock);
> + if (fn)
> + symbol_put(nfsd_open_local_fh);
> + return true;
> +}
> +
> +static void
> +nfs_local_put_lookup_ctx(void)
> +{
> + struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> + nfs_to_nfsd_open_t fn;
> +
> + if (atomic_dec_and_lock(&ctx->refcount, &ctx->lock)) {
> + fn = ctx->open_f;
> + ctx->open_f = NULL;
> + spin_unlock(&ctx->lock);
> + if (fn)
> + symbol_put(nfsd_open_local_fh);
> + dprintk("destroy lookup context\n");
> + }
> +}
> +
> +/*
> + * nfs_local_enable - attempt to enable local i/o for an nfs_client
> + */
> +void
> +nfs_local_enable(struct nfs_client *clp)
> +{
> + if (nfs_local_get_lookup_ctx()) {
> + dprintk("enabled local i/o\n");
> + set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> + }
> +}
> +EXPORT_SYMBOL_GPL(nfs_local_enable);
> +
> +/*
> + * nfs_local_disable - disable local i/o for an nfs_client
> + */
> +void
> +nfs_local_disable(struct nfs_client *clp)
> +{
> + if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
> + dprintk("disabled local i/o\n");
> + nfs_local_put_lookup_ctx();
> + }
> +}
> +
> +/*
> + * nfs_local_probe - probe local i/o support for an nfs_client
> + */
> +void
> +nfs_local_probe(struct nfs_client *clp)
> +{
> + struct sockaddr_in *sin;
> + struct sockaddr_in6 *sin6;
> + struct nfs_local_addr *addr;
> + struct sockaddr *sap;
> + bool enable = false;
> +
> + switch (clp->cl_addr.ss_family) {
> + case AF_INET:
> + sin = (struct sockaddr_in *)&clp->cl_addr;
> + if (ipv4_is_loopback(sin->sin_addr.s_addr)) {
> + dprintk("%s: detected IPv4 loopback address\n",
> + __func__);
> + enable = true;
> + }
> + break;
> + case AF_INET6:
> + sin6 = (struct sockaddr_in6 *)&clp->cl_addr;
> + if (memcmp(&sin6->sin6_addr, &in6addr_loopback,
> +     sizeof(struct in6_addr)) == 0) {
> + dprintk("%s: detected IPv6 loopback address\n",
> + __func__);
> + enable = true;
> + }
> + break;
> + default:
> + break;
> + }
> +
> + if (enable)
> + goto out;
> +
> + list_for_each_entry(addr, &clp->cl_local_addrs, cl_addrs) {
> + sap = (struct sockaddr *)&addr->address;
> + if (rpc_cmp_addr((struct sockaddr *)&clp->cl_addr, sap)) {
> + dprintk("%s: detected local server.\n", __func__);
> + enable = true;
> + break;
> + }
> + }
> +
> +out:
> + if (enable)
> + nfs_local_enable(clp);
> +}
> +
> +/*
> + * nfs_local_open_fh - open a local filehandle
> + *
> + * Returns a pointer to a struct file or an ERR_PTR
> + */
> +struct file *
> +nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> +   struct nfs_fh *fh, const fmode_t mode)
> +{
> + struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> + struct file *filp;
> + int status;
> +
> + if (mode & ~(FMODE_READ | FMODE_WRITE))
> + return ERR_PTR(-EINVAL);
> +
> + status = ctx->open_f(clp->cl_rpcclient, cred, fh, mode, &filp);
> + if (status < 0) {
> + dprintk("%s: open local file failed error=%d\n",
> + __func__, status);
> + trace_nfs_local_open_fh(fh, mode, status);
> + switch (status) {
> + case -ENXIO:
> + nfs_local_disable(clp);
> + fallthrough;
> + case -ETIMEDOUT:
> + status = -EAGAIN;
> + }
> + filp = ERR_PTR(status);
> + }
> + return filp;
> +}
> +EXPORT_SYMBOL_GPL(nfs_local_open_fh);
> +
> +static struct bio_vec *
> +nfs_bvec_alloc_and_import_pagevec(struct page **pagevec,
> + unsigned int npages, gfp_t flags)
> +{
> + struct bio_vec *bvec, *p;
> +
> + bvec = kmalloc_array(npages, sizeof(*bvec), flags);
> + if (bvec != NULL) {
> + for (p = bvec; npages > 0; p++, pagevec++, npages--) {
> + p->bv_page = *pagevec;
> + p->bv_len = PAGE_SIZE;
> + p->bv_offset = 0;
> + }
> + }
> + return bvec;
> +}
> +
> +static void
> +nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
> +{
> + kfree(iocb->bvec);
> + kfree(iocb);
> +}
> +
> +static struct nfs_local_kiocb *
> +nfs_local_iocb_alloc(struct nfs_pgio_header *hdr, struct file *filp,
> + gfp_t flags)
> +{
> + struct nfs_local_kiocb *iocb;
> +
> + iocb = kmalloc(sizeof(*iocb), flags);
> + if (iocb == NULL)
> + return NULL;
> + iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
> + hdr->page_array.npages, flags);
> + if (iocb->bvec == NULL) {
> + kfree(iocb);
> + return NULL;
> + }
> + init_sync_kiocb(&iocb->kiocb, filp);
> + iocb->kiocb.ki_pos = hdr->args.offset;
> + iocb->hdr = hdr;
> + /* FIXME: NFS_IOHDR_ODIRECT isn't ever set */
> + if (test_bit(NFS_IOHDR_ODIRECT, &hdr->flags))
> + iocb->kiocb.ki_flags |= IOCB_DIRECT|IOCB_DSYNC;
> + iocb->kiocb.ki_flags &= ~IOCB_APPEND;
> + return iocb;
> +}
> +
> +static void
> +nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
> +{
> + struct nfs_pgio_header *hdr = iocb->hdr;
> +
> + if (hdr->args.pgbase != 0) {
> + iov_iter_bvec(i, dir, iocb->bvec,
> + hdr->page_array.npages,
> + hdr->args.count + hdr->args.pgbase);
> + iov_iter_advance(i, hdr->args.pgbase);
> + } else
> + iov_iter_bvec(i, dir, iocb->bvec,
> + hdr->page_array.npages, hdr->args.count);
> +}
> +
> +static void
> +nfs_local_hdr_release(struct nfs_pgio_header *hdr,
> + const struct rpc_call_ops *call_ops)
> +{
> + call_ops->rpc_call_done(&hdr->task, hdr);
> + call_ops->rpc_release(hdr);
> +}
> +
> +static void
> +nfs_local_pgio_init(struct nfs_pgio_header *hdr,
> + const struct rpc_call_ops *call_ops)
> +{
> + hdr->task.tk_ops = call_ops;
> + if (!hdr->task.tk_start)
> + hdr->task.tk_start = ktime_get();
> +}
> +
> +static void
> +nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status)
> +{
> + if (status >= 0) {
> + hdr->res.count = status;
> + hdr->res.op_status = NFS4_OK;
> + hdr->task.tk_status = 0;
> + } else {
> + hdr->res.op_status = nfs4errno(status);
> + hdr->task.tk_status = status;
> + }
> +}
> +
> +static void
> +nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
> +{
> + struct nfs_pgio_header *hdr = iocb->hdr;
> +
> + fput(iocb->kiocb.ki_filp);
> + nfs_local_iocb_free(iocb);
> + nfs_local_hdr_release(hdr, hdr->task.tk_ops);
> +}
> +
> +static void
> +nfs_local_read_aio_complete_work(struct work_struct *work)
> +{
> + struct nfs_local_kiocb *iocb = container_of(work,
> + struct nfs_local_kiocb, work);
> +
> + nfs_local_pgio_release(iocb);
> +}
> +
> +/*
> + * Complete the I/O from iocb->kiocb.ki_complete()
> + *
> + * Note that this function can be called from a bottom half context,
> + * hence we need to queue the fput() etc to a workqueue
> + */
> +static void
> +nfs_local_pgio_complete(struct nfs_local_kiocb *iocb)
> +{
> + queue_work(nfsiod_workqueue, &iocb->work);
> +}
> +
> +static void
> +nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
> +{
> + struct nfs_pgio_header *hdr = iocb->hdr;
> + struct file *filp = iocb->kiocb.ki_filp;
> +
> + nfs_local_pgio_done(hdr, status);
> +
> + if (hdr->res.count != hdr->args.count ||
> +     hdr->args.offset + hdr->res.count >= i_size_read(file_inode(filp)))
> + hdr->res.eof = true;
> +
> + dprintk("%s: read %ld bytes eof %d.\n", __func__,
> + status > 0 ? status : 0, hdr->res.eof);
> +}
> +
> +static void
> +nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
> +{
> + struct nfs_local_kiocb *iocb = container_of(kiocb,
> + struct nfs_local_kiocb, kiocb);
> +
> + nfs_local_read_done(iocb, ret);
> + nfs_local_pgio_complete(iocb);
> +}
> +
> +static int
> +nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
> + const struct rpc_call_ops *call_ops)
> +{
> + struct nfs_local_kiocb *iocb;
> + struct iov_iter iter;
> + ssize_t status;
> +
> + dprintk("%s: vfs_read count=%u pos=%llu\n",
> + __func__, hdr->args.count, hdr->args.offset);
> +
> + iocb = nfs_local_iocb_alloc(hdr, filp, GFP_KERNEL);
> + if (iocb == NULL)
> + return -ENOMEM;
> + nfs_local_iter_init(&iter, iocb, READ);
> +
> + nfs_local_pgio_init(hdr, call_ops);
> + hdr->res.eof = false;
> +
> + if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
> + INIT_WORK(&iocb->work, nfs_local_read_aio_complete_work);
> + iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
> + }
> +
> + status = filp->f_op->read_iter(&iocb->kiocb, &iter);
> + if (status != -EIOCBQUEUED) {
> + nfs_local_read_done(iocb, status);
> + nfs_local_pgio_release(iocb);
> + }
> + return 0;
> +}
> +
> +static void
> +nfs_copy_boot_verifier(struct nfs_write_verifier *verifier, struct inode *inode)
> +{
> + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
> + u32 *verf = (u32 *)verifier->data;
> + int seq = 0;
> +
> + do {
> + read_seqbegin_or_lock(&clp->cl_boot_lock, &seq);
> + verf[0] = (u32)clp->cl_nfssvc_boot.tv_sec;
> + verf[1] = (u32)clp->cl_nfssvc_boot.tv_nsec;
> + } while (need_seqretry(&clp->cl_boot_lock, seq));
> + done_seqretry(&clp->cl_boot_lock, seq);
> +}
> +
> +static void
> +nfs_reset_boot_verifier(struct inode *inode)
> +{
> + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
> +
> + write_seqlock(&clp->cl_boot_lock);
> + ktime_get_real_ts64(&clp->cl_nfssvc_boot);
> + write_sequnlock(&clp->cl_boot_lock);
> +}
> +
> +static void
> +nfs_set_local_verifier(struct inode *inode,
> + struct nfs_writeverf *verf,
> + enum nfs3_stable_how how)
> +{
> +
> + nfs_copy_boot_verifier(&verf->verifier, inode);
> + verf->committed = how;
> +}
> +
> +static void
> +nfs_get_vfs_attr(struct file *filp, struct nfs_fattr *fattr)
> +{
> + struct kstat stat;
> +
> + if (fattr != NULL && vfs_getattr(&filp->f_path, &stat,
> + STATX_INO |
> + STATX_ATIME |
> + STATX_MTIME |
> + STATX_CTIME |
> + STATX_SIZE |
> + STATX_BLOCKS,
> + AT_STATX_SYNC_AS_STAT) == 0) {
> + fattr->valid = NFS_ATTR_FATTR_FILEID |
> + NFS_ATTR_FATTR_CHANGE |
> + NFS_ATTR_FATTR_SIZE |
> + NFS_ATTR_FATTR_ATIME |
> + NFS_ATTR_FATTR_MTIME |
> + NFS_ATTR_FATTR_CTIME |
> + NFS_ATTR_FATTR_SPACE_USED;
> + fattr->fileid = stat.ino;
> + fattr->size = stat.size;
> + fattr->atime = stat.atime;
> + fattr->mtime = stat.mtime;
> + fattr->ctime = stat.ctime;
> + fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
> + fattr->du.nfs3.used = stat.blocks << 9;
> + }
> +}
> +
> +static void
> +nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
> +{
> + struct nfs_pgio_header *hdr = iocb->hdr;
> +
> + dprintk("%s: wrote %ld bytes.\n", __func__, status > 0 ? status : 0);
> +
> + /* Handle short writes as if they are ENOSPC */
> + if (status > 0 && status < hdr->args.count) {
> + hdr->mds_offset += status;
> + hdr->args.offset += status;
> + hdr->args.pgbase += status;
> + hdr->args.count -= status;
> + nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset);
> + status = -ENOSPC;
> + }
> + if (status < 0)
> + nfs_reset_boot_verifier(hdr->inode);
> + nfs_local_pgio_done(hdr, status);
> +}
> +
> +static void
> +nfs_local_write_aio_complete_work(struct work_struct *work)
> +{
> + struct nfs_local_kiocb *iocb = container_of(work,
> + struct nfs_local_kiocb, work);
> +
> + nfs_get_vfs_attr(iocb->kiocb.ki_filp, iocb->hdr->res.fattr);
> + nfs_local_pgio_release(iocb);
> +}
> +
> +static void
> +nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
> +{
> + struct nfs_local_kiocb *iocb = container_of(kiocb,
> + struct nfs_local_kiocb, kiocb);
> +
> + nfs_local_write_done(iocb, ret);
> + nfs_local_pgio_complete(iocb);
> +}
> +
> +static int
> +nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
> + const struct rpc_call_ops *call_ops)
> +{
> + struct nfs_local_kiocb *iocb;
> + struct iov_iter iter;
> + ssize_t status;
> +
> + dprintk("%s: vfs_write count=%u pos=%llu %s\n",
> + __func__, hdr->args.count, hdr->args.offset,
> + (hdr->args.stable == NFS_UNSTABLE) ?  "unstable" : "stable");
> +
> + iocb = nfs_local_iocb_alloc(hdr, filp, GFP_NOIO);
> + if (iocb == NULL)
> + return -ENOMEM;
> + nfs_local_iter_init(&iter, iocb, WRITE);
> +
> + switch (hdr->args.stable) {
> + default:
> + break;
> + case NFS_DATA_SYNC:
> + iocb->kiocb.ki_flags |= IOCB_DSYNC;
> + break;
> + case NFS_FILE_SYNC:
> + iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
> + }
> + nfs_local_pgio_init(hdr, call_ops);
> +
> + if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
> + INIT_WORK(&iocb->work, nfs_local_write_aio_complete_work);
> + iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
> + }
> +
> + nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
> +
> + file_start_write(filp);
> + status = filp->f_op->write_iter(&iocb->kiocb, &iter);
> + file_end_write(filp);
> + if (status != -EIOCBQUEUED) {
> + nfs_local_write_done(iocb, status);
> + nfs_get_vfs_attr(filp, hdr->res.fattr);
> + nfs_local_pgio_release(iocb);
> + }
> + return 0;
> +}
> +
> +static struct file *
> +nfs_local_file_open_cached(struct nfs_client *clp, const struct cred *cred,
> +    struct nfs_fh *fh, struct nfs_open_context *ctx)
> +{
> + struct file *filp = ctx->local_filp;
> +
> + if (!filp) {
> + struct file *new = nfs_local_open_fh(clp, cred, fh, ctx->mode);
> + if (IS_ERR_OR_NULL(new))
> + return NULL;
> + /* try to put this one in the slot */
> + filp = cmpxchg(&ctx->local_filp, NULL, new);
> + if (filp != NULL)
> + fput(new);
> + else
> + filp = new;
> + }
> + return get_file(filp);
> +}
> +
> +struct file *
> +nfs_local_file_open(struct nfs_client *clp, const struct cred *cred,
> +     struct nfs_fh *fh, struct nfs_open_context *ctx)
> +{
> + if (!nfs_server_is_local(clp))
> + return NULL;
> + return nfs_local_file_open_cached(clp, cred, fh, ctx);
> +}
> +
> +int
> +nfs_local_doio(struct nfs_client *clp, struct file *filp,
> +        struct nfs_pgio_header *hdr,
> +        const struct rpc_call_ops *call_ops)
> +{
> + int status = 0;
> +
> + if (!hdr->args.count)
> + goto out_fput;
> + /* Don't support filesystems without read_iter/write_iter */
> + if (!filp->f_op->read_iter || !filp->f_op->write_iter) {
> + nfs_local_disable(clp);
> + status = -EAGAIN;
> + goto out_fput;
> + }
> +
> + switch (hdr->rw_mode) {
> + case FMODE_READ:
> + status = nfs_do_local_read(hdr, filp, call_ops);
> + break;
> + case FMODE_WRITE:
> + status = nfs_do_local_write(hdr, filp, call_ops);
> + break;
> + default:
> + dprintk("%s: invalid mode: %d\n", __func__,
> + hdr->rw_mode);
> + status = -EINVAL;
> + }
> +out_fput:
> + if (status != 0) {
> + fput(filp);
> + hdr->task.tk_status = status;
> + nfs_local_hdr_release(hdr, call_ops);
> + }
> + return status;
> +}
> +
> +static void
> +nfs_local_init_commit(struct nfs_commit_data *data,
> + const struct rpc_call_ops *call_ops)
> +{
> + data->task.tk_ops = call_ops;
> +}
> +
> +static int
> +nfs_local_run_commit(struct file *filp, struct nfs_commit_data *data)
> +{
> + loff_t start = data->args.offset;
> + loff_t end = LLONG_MAX;
> +
> + if (data->args.count > 0) {
> + end = start + data->args.count - 1;
> + if (end < start)
> + end = LLONG_MAX;
> + }
> +
> + dprintk("%s: commit %llu - %llu\n", __func__, start, end);
> + return vfs_fsync_range(filp, start, end, 0);
> +}
> +
> +static void
> +nfs_local_commit_done(struct nfs_commit_data *data, int status)
> +{
> + if (status >= 0) {
> + nfs_set_local_verifier(data->inode,
> + data->res.verf,
> + NFS_FILE_SYNC);
> + data->res.op_status = NFS4_OK;
> + data->task.tk_status = 0;
> + } else {
> + nfs_reset_boot_verifier(data->inode);
> + data->res.op_status = nfs4errno(status);
> + data->task.tk_status = status;
> + }
> +}
> +
> +static void
> +nfs_local_release_commit_data(struct file *filp,
> + struct nfs_commit_data *data,
> + const struct rpc_call_ops *call_ops)
> +{
> + fput(filp);
> + call_ops->rpc_call_done(&data->task, data);
> + call_ops->rpc_release(data);
> +}
> +
> +static struct nfs_local_fsync_ctx *
> +nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data, struct file *filp,
> + gfp_t flags)
> +{
> + struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
> +
> + if (ctx != NULL) {
> + ctx->filp = filp;
> + ctx->data = data;
> + INIT_WORK(&ctx->work, nfs_local_fsync_work);
> + kref_init(&ctx->kref);
> + }
> + return ctx;
> +}
> +
> +static void
> +nfs_local_fsync_ctx_kref_free(struct kref *kref)
> +{
> + kfree(container_of(kref, struct nfs_local_fsync_ctx, kref));
> +}
> +
> +static void
> +nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx)
> +{
> + kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free);
> +}
> +
> +static void
> +nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
> +{
> + nfs_local_release_commit_data(ctx->filp, ctx->data,
> + ctx->data->task.tk_ops);
> + nfs_local_fsync_ctx_put(ctx);
> +}
> +
> +static void
> +nfs_local_fsync_work(struct work_struct *work)
> +{
> + struct nfs_local_fsync_ctx *ctx;
> + int status;
> +
> + ctx = container_of(work, struct nfs_local_fsync_ctx, work);
> +
> + status = nfs_local_run_commit(ctx->filp, ctx->data);
> + nfs_local_commit_done(ctx->data, status);
> + nfs_local_fsync_ctx_free(ctx);
> +}
> +
> +int
> +nfs_local_commit(struct nfs_client *clp, struct file *filp,
> + struct nfs_commit_data *data,
> + const struct rpc_call_ops *call_ops, int how)
> +{
> + struct nfs_local_fsync_ctx *ctx;
> +
> + ctx = nfs_local_fsync_ctx_alloc(data, filp, GFP_KERNEL);
> + if (!ctx) {
> + nfs_local_commit_done(data, -ENOMEM);
> + nfs_local_release_commit_data(filp, data, call_ops);
> + return -ENOMEM;
> + }
> +
> + nfs_local_init_commit(data, call_ops);
> + kref_get(&ctx->kref);
> + queue_work(nfsiod_workqueue, &ctx->work);
> + if (how & FLUSH_SYNC)
> + flush_work(&ctx->work);
> + nfs_local_fsync_ctx_put(ctx);
> + return 0;
> +}
> +
> +static int
> +nfs_client_add_addr(struct nfs_client *clnt, char *buf, gfp_t flags)
> +{
> + struct nfs_local_addr *addr;
> + struct sockaddr *sap;
> +
> + dprintk("%s: adding new local IP %s\n", __func__, buf);
> + addr = kmalloc(sizeof(*addr), flags);
> + if (!addr) {
> + printk(KERN_WARNING "NFS: cannot alloc new addr\n");
> + return -ENOMEM;
> + }
> + sap = (struct sockaddr *)&addr->address;
> + addr->addrlen = rpc_pton(clnt->cl_net, buf, strlen(buf),
> + sap, sizeof(addr->address));
> + if (!addr->addrlen) {
> + printk(KERN_WARNING "NFS: cannot parse new addr %s\n",
> + buf);
> + kfree(addr);
> + return -EINVAL;
> + }
> + list_add(&addr->cl_addrs, &clnt->cl_local_addrs);
> +
> + return 0;
> +}
> +
> +static int
> +nfs_client_add_v4_addr(struct nfs_client *clnt, struct in_device *indev,
> +        char *buf, size_t buflen)
> +{
> + struct in_ifaddr *ifa;
> + int ret;
> +
> + in_dev_for_each_ifa_rtnl(ifa, indev) {
> + snprintf(buf, buflen, "%pI4", &ifa->ifa_local);
> + ret = nfs_client_add_addr(clnt, buf, GFP_KERNEL);
> + if (ret < 0)
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +static int
> +nfs_client_add_v6_addr(struct nfs_client *clnt, struct inet6_dev *in6dev,
> +        char *buf, size_t buflen)
> +{
> + struct inet6_ifaddr *ifp;
> + int ret = 0;
> +
> + read_lock_bh(&in6dev->lock);
> + list_for_each_entry(ifp, &in6dev->addr_list, if_list) {
> + rpc_ntop6_addr_noscopeid(&ifp->addr, buf, buflen);
> + ret = nfs_client_add_addr(clnt, buf, GFP_ATOMIC);
> + if (ret < 0)
> + goto out;
> + }
> +out:
> + read_unlock_bh(&in6dev->lock);
> + return ret;
> +}
> +#else /* CONFIG_IPV6 */
> +static int
> +nfs_client_add_v6_addr(struct nfs_client *clnt, struct inet6_dev *in6dev,
> +        char *buf, size_t buflen)
> +{
> + return 0;
> +}
> +#endif
> +
> +/* Find out all local IP addresses. Ignore errors
> + * because local IO can be optional.
> + */
> +void
> +nfs_probe_local_addr(struct nfs_client *clnt)
> +{
> + struct net_device *dev;
> + struct in_device *indev;
> + struct inet6_dev *in6dev;
> + char buf[INET6_ADDRSTRLEN + IPV6_SCOPE_ID_LEN];
> + size_t buflen = sizeof(buf);
> +
> + rtnl_lock();
> +
> + for_each_netdev(clnt->cl_net, dev) {
> + if (dev->type == ARPHRD_LOOPBACK ||
> +     !(dev->flags & IFF_UP))
> + continue;
> + indev = __in_dev_get_rtnl(dev);
> + if (indev &&
> +     nfs_client_add_v4_addr(clnt, indev, buf, buflen) < 0)
> + break;
> + in6dev = __in6_dev_get(dev);
> + if (in6dev &&
> +     nfs_client_add_v6_addr(clnt, in6dev, buf, buflen) < 0)
> + break;
> + }
> +
> + rtnl_unlock();
> +}
> diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
> index 1e710654af11..45d4086cdeb1 100644
> --- a/fs/nfs/nfstrace.h
> +++ b/fs/nfs/nfstrace.h
> @@ -1681,6 +1681,35 @@ TRACE_EVENT(nfs_mount_path,
>   TP_printk("path='%s'", __get_str(path))
>  );
>  
> +TRACE_EVENT(nfs_local_open_fh,
> + TP_PROTO(
> + const struct nfs_fh *fh,
> + fmode_t fmode,
> + int error
> + ),
> +
> + TP_ARGS(fh, fmode, error),
> +
> + TP_STRUCT__entry(
> + __field(int, error)
> + __field(u32, fhandle)
> + __field(unsigned int, fmode)
> + ),
> +
> + TP_fast_assign(
> + __entry->error = error;
> + __entry->fhandle = nfs_fhandle_hash(fh);
> + __entry->fmode = (__force unsigned int)fmode;
> + ),
> +
> + TP_printk(
> + "error=%d fhandle=0x%08x mode=%s",
> + __entry->error,
> + __entry->fhandle,
> + show_fs_fmode_flags(__entry->fmode)
> + )
> +);
> +
>  DECLARE_EVENT_CLASS(nfs_xdr_event,
>   TP_PROTO(
>   const struct xdr_stream *xdr,
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 3786d767e2ff..9210a1821ec9 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -848,7 +848,8 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
>         struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
>         struct nfs_pgio_header *hdr, const struct cred *cred,
>         const struct nfs_rpc_ops *rpc_ops,
> -       const struct rpc_call_ops *call_ops, int how, int flags)
> +       const struct rpc_call_ops *call_ops, int how, int flags,
> +       struct file *localio)
>  {
>   struct rpc_task *task;
>   struct rpc_message msg = {
> @@ -878,10 +879,16 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
>   hdr->args.count,
>   (unsigned long long)hdr->args.offset);
>  
> + if (localio) {
> + nfs_local_doio(clp, localio, hdr, call_ops);
> + goto out;
> + }
> +
>   task = rpc_run_task(&task_setup_data);
>   if (IS_ERR(task))
>   return PTR_ERR(task);
>   rpc_put_task(task);
> +out:
>   return 0;
>  }
>  EXPORT_SYMBOL_GPL(nfs_initiate_pgio);
> @@ -1080,7 +1087,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
>   NFS_PROTO(hdr->inode),
>   desc->pg_rpc_callops,
>   desc->pg_ioflags,
> - RPC_TASK_CRED_NOREF | task_flags);
> + RPC_TASK_CRED_NOREF | task_flags,
> + NULL);
>   }
>   return ret;
>  }
> diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
> index b29b50c2c933..ac3c5e6d4c5e 100644
> --- a/fs/nfs/pnfs_nfs.c
> +++ b/fs/nfs/pnfs_nfs.c
> @@ -538,7 +538,7 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
>       NFS_CLIENT(inode), data,
>       NFS_PROTO(data->inode),
>       data->mds_ops, how,
> -     RPC_TASK_CRED_NOREF);
> +     RPC_TASK_CRED_NOREF, NULL);
>   } else {
>   nfs_init_commit(data, NULL, data->lseg, cinfo);
>   initiate_commit(data, how);
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index c9cfa1308264..ba0b36b15bc1 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1672,7 +1672,8 @@ int nfs_initiate_commit(struct nfs_client *clp,
>   struct nfs_commit_data *data,
>   const struct nfs_rpc_ops *nfs_ops,
>   const struct rpc_call_ops *call_ops,
> - int how, int flags)
> + int how, int flags,
> + struct file *localio)
>  {
>   struct rpc_task *task;
>   int priority = flush_task_priority(how);
> @@ -1691,6 +1692,7 @@ int nfs_initiate_commit(struct nfs_client *clp,
>   .flags = RPC_TASK_ASYNC | flags,
>   .priority = priority,
>   };
> + int status = 0;
>  
>   if (nfs_server_capable(data->inode, NFS_CAP_MOVEABLE))
>   task_setup_data.flags |= RPC_TASK_MOVEABLE;
> @@ -1701,13 +1703,19 @@ int nfs_initiate_commit(struct nfs_client *clp,
>  
>   dprintk("NFS: initiated commit call\n");
>  
> + if (localio) {
> + nfs_local_commit(clp, localio, data, call_ops, how);
> + goto out;
> + }
> +
>   task = rpc_run_task(&task_setup_data);
>   if (IS_ERR(task))
>   return PTR_ERR(task);
>   if (how & FLUSH_SYNC)
>   rpc_wait_for_completion_task(task);
>   rpc_put_task(task);
> - return 0;
> +out:
> + return status;
>  }
>  EXPORT_SYMBOL_GPL(nfs_initiate_commit);
>  
> @@ -1819,7 +1827,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
>   return nfs_initiate_commit(NFS_SERVER(inode)->nfs_client,
>      NFS_CLIENT(inode), data, NFS_PROTO(inode),
>      data->mds_ops, how,
> -    RPC_TASK_CRED_NOREF | task_flags);
> +    RPC_TASK_CRED_NOREF | task_flags, NULL);
>  }
>  
>  /*
> diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> index b8736a82e57c..702f277394f1 100644
> --- a/fs/nfsd/Makefile
> +++ b/fs/nfsd/Makefile
> @@ -13,7 +13,7 @@ nfsd-y += trace.o
>  nfsd-y  += nfssvc.o nfsctl.o nfsfh.o vfs.o \
>      export.o auth.o lockd.o nfscache.o \
>      stats.o filecache.o nfs3proc.o nfs3xdr.o \
> -    netlink.o
> +    netlink.o localio.o

Isn't there a Kconfig option this should be behind?

>  nfsd-$(CONFIG_NFSD_V2) += nfsproc.o nfsxdr.o
>  nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
>  nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index ad9083ca144b..99631fa56662 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -52,7 +52,7 @@
>  #define NFSD_FILE_CACHE_UP      (0)
>  
>  /* We only care about NFSD_MAY_READ/WRITE for this cache */
> -#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
> +#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
>  
>  static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
>  static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> new file mode 100644
> index 000000000000..ff68454a4017
> --- /dev/null
> +++ b/fs/nfsd/localio.c
> @@ -0,0 +1,179 @@
> +/*
> + * NFS server support for local clients to bypass network stack
> + *
> + * Copyright (C) 2014 Weston Andros Adamson <[email protected]>
> + */
> +
> +#include <linux/exportfs.h>
> +#include <linux/sunrpc/svcauth_gss.h>
> +#include <linux/sunrpc/clnt.h>
> +#include <linux/nfs.h>
> +#include <linux/string.h>
> +
> +#include "nfsd.h"
> +#include "vfs.h"
> +#include "netns.h"
> +#include "filecache.h"
> +
> +#define NFSDDBG_FACILITY NFSDDBG_FH
> +
> +static void
> +nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> +{
> + if (rqstp->rq_client)
> + auth_domain_put(rqstp->rq_client);
> + if (rqstp->rq_cred.cr_group_info)
> + put_group_info(rqstp->rq_cred.cr_group_info);
> + kfree(rqstp->rq_cred.cr_principal);
> + kfree(rqstp->rq_xprt);
> + kfree(rqstp);
> +}
> +
> +static struct svc_rqst *
> +nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
> +{
> + struct svc_rqst *rqstp;
> + struct net *net = rpc_net_ns(rpc_clnt);
> + struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> + int status;
> +
> + if (!nn->nfsd_serv) {
> + dprintk("%s: localio denied. Server not running\n", __func__);
> + return ERR_PTR(-ENXIO);
> + }
> +

Note that the above check is racy. The nfsd_serv can go away at any
time since you're not holding the (global) nfsd_mutex (I assume?).

> + rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
> + if (!rqstp)
> + return ERR_PTR(-ENOMEM);
> +
> + rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
> + if (!rqstp->rq_xprt) {
> + status = -ENOMEM;
> + goto out_err;
> + }
> +
> + rqstp->rq_xprt->xpt_net = net;
> + __set_bit(RQ_SECURE, &rqstp->rq_flags);
> + rqstp->rq_proc = 1;
> + rqstp->rq_vers = 3;
> + rqstp->rq_prot = IPPROTO_TCP;
> + rqstp->rq_server = nn->nfsd_serv;
> +

I suspect you need to carry a reference of some sort so that the
nfsd_serv doesn't go away out from under you while this is running,
since this is not operating in nfsd thread context.

Typically, every nfsd thread holds a reference to the serv (in serv-
>sv_nrthreads), so that when you shut down all of the threads, it goes
away. The catch is that that refcount is currently under the protection
of the global nfsd_mutex and I doubt you want to take that in this
codepath.



> + /* Note: we're connecting to ourself, so source addr == peer addr */
> + rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
> + (struct sockaddr *)&rqstp->rq_addr,
> + sizeof(rqstp->rq_addr));
> +
> + if (!rpcauth_map_to_svc_cred(rpc_clnt->cl_auth, cred,
> +      &rqstp->rq_cred)) {
> + dprintk("%s :map cred failed\n", __func__);
> + status = -EINVAL;
> + goto out_err;
> + }
> +
> + /*
> + * set up enough for svcauth_unix_set_client to be able to wait
> + * for the cache downcall. Note that we do _not_ want to allow the
> + * request to be deferred for later revisit since this rqst and xprt
> + * are not set up to run inside of the normal svc_rqst engine.
> + */
> + INIT_LIST_HEAD(&rqstp->rq_xprt->xpt_deferred);
> + kref_init(&rqstp->rq_xprt->xpt_ref);
> + spin_lock_init(&rqstp->rq_xprt->xpt_lock);
> + rqstp->rq_chandle.thread_wait = 5 * HZ;
> +
> + status = svcauth_unix_set_client(rqstp);
> + switch (status) {
> + case SVC_OK:
> + break;
> + case SVC_DENIED:
> + status = -ENXIO;
> + dprintk("%s: client %pISpc denied localio access\n",
> + __func__, (struct sockaddr *)&rqstp->rq_addr);
> + goto out_err;
> + default:
> + status = -ETIMEDOUT;
> + dprintk("%s: client %pISpc temporarily denied localio access\n",
> + __func__, (struct sockaddr *)&rqstp->rq_addr);
> + goto out_err;
> + }
> +
> + return rqstp;
> +
> +out_err:
> + nfsd_local_fakerqst_destroy(rqstp);
> + return ERR_PTR(status);
> +}
> +
> +/*
> + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to @file
> + *
> + * This function maps a local fh to a path on a local filesystem.
> + * This is useful when the nfs client has the local server mounted - it can
> + * avoid all the NFS overhead with reads, writes and commits.
> + *
> + * on successful return, caller is responsible for calling path_put. Also
> + * note that this is called from nfs.ko via find_symbol() to avoid an explicit
> + * dependency on knfsd. So, there is no forward declaration in a header file
> + * for it.
> + */
> +int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> + const struct cred *cred,
> + const struct nfs_fh *nfs_fh,
> + const fmode_t fmode,
> + struct file **pfilp)
> +{
> + const struct cred *save_cred;
> + struct svc_rqst *rqstp;
> + struct svc_fh fh;
> + struct nfsd_file *nf;
> + int status = 0;
> + int mayflags = NFSD_MAY_LOCALIO;
> + __be32 beres;
> +
> + /* Save creds before calling into nfsd */
> + save_cred = get_current_cred();
> +
> + rqstp = nfsd_local_fakerqst_create(rpc_clnt, cred);
> + if (IS_ERR(rqstp)) {
> + status = PTR_ERR(rqstp);
> + goto out_revertcred;
> + }
> +
> + /* nfs_fh -> svc_fh */
> + if (nfs_fh->size > NFS4_FHSIZE) {
> + status = -EINVAL;
> + goto out;
> + }
> + fh_init(&fh, NFS4_FHSIZE);
> + fh.fh_handle.fh_size = nfs_fh->size;
> + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> +
> + if (fmode & FMODE_READ)
> + mayflags |= NFSD_MAY_READ;
> + if (fmode & FMODE_WRITE)
> + mayflags |= NFSD_MAY_WRITE;
> +
> + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> + if (beres) {
> + status = nfs_stat_to_errno(be32_to_cpu(beres));
> + dprintk("%s: fh_verify failed %d\n", __func__, status);
> + goto out_fh_put;
> + }
> +
> + *pfilp = get_file(nf->nf_file);
> +
> + nfsd_file_put(nf);
> +out_fh_put:
> + fh_put(&fh);
> +
> +out:
> + nfsd_local_fakerqst_destroy(rqstp);
> +out_revertcred:
> + revert_creds(save_cred);
> + return status;
> +}
> +EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
> +
> +/* Compile time type checking, not used by anything */
> +static nfs_to_nfsd_open_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 77bbd23aa150..9c0610fdd11c 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
>   { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \
>   { NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \
>   { NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \
> - { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" })
> + { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }, \
> + { NFSD_MAY_LOCALIO, "LOCALIO" })
>  
>  TRACE_EVENT(nfsd_compound,
>   TP_PROTO(
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 57cd70062048..91c50649a8c7 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -36,6 +36,8 @@
>  #define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
>  #define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
>  
> +#define NFSD_MAY_LOCALIO 0x800000
> +
>  struct nfsd_file;
>  
>  /*
> @@ -158,6 +160,12 @@ __be32 nfsd_permission(struct svc_rqst *, struct svc_export *,
>  
>  void nfsd_filp_close(struct file *fp);
>  
> +int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> +    const struct cred *cred,
> +    const struct nfs_fh *nfs_fh,
> +    const fmode_t fmode,
> +    struct file **pfilp);
> +
>  static inline int fh_want_write(struct svc_fh *fh)
>  {
>   int ret;
> diff --git a/include/linux/nfs.h b/include/linux/nfs.h
> index b94f51d17bc5..80843764fad3 100644
> --- a/include/linux/nfs.h
> +++ b/include/linux/nfs.h
> @@ -8,6 +8,8 @@
>  #ifndef _LINUX_NFS_H
>  #define _LINUX_NFS_H
>  
> +#include <linux/cred.h>
> +#include <linux/sunrpc/auth.h>
>  #include <linux/sunrpc/msg_prot.h>
>  #include <linux/string.h>
>  #include <linux/errno.h>
> @@ -109,6 +111,10 @@ static inline int nfs_stat_to_errno(enum nfs_stat status)
>   return nfs_common_errtbl[i].errno;
>  }
>  
> +typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
> +   const struct nfs_fh *, const fmode_t,
> +   struct file **);
> +
>  #ifdef CONFIG_CRC32
>  /**
>   * nfs_fhandle_hash - calculate the crc32 hash for the filehandle
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 039898d70954..a0bb947fdd1d 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -96,6 +96,8 @@ struct nfs_open_context {
>   struct list_head list;
>   struct nfs4_threshold *mdsthreshold;
>   struct rcu_head rcu_head;
> +
> + struct file *local_filp;
>  };
>  
>  struct nfs_open_dir_context {
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 82a6f66fe1d0..6b603b0247f1 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -49,12 +49,14 @@ struct nfs_client {
>  #define NFS_CS_DS 7 /* - Server is a DS */
>  #define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
>  #define NFS_CS_PNFS 9 /* - Server used for pnfs */
> +#define NFS_CS_LOCAL_IO 10 /* - client is local */
>   struct sockaddr_storage cl_addr; /* server identifier */
>   size_t cl_addrlen;
>   char * cl_hostname; /* hostname of server */
>   char * cl_acceptor; /* GSSAPI acceptor name */
>   struct list_head cl_share_link; /* link in global client list */
>   struct list_head cl_superblocks; /* List of nfs_server structs */
> + struct list_head cl_local_addrs; /* List of local addresses */
>  

Is the above needed? I thought you weren't tracking addresses now and
were using the new RPC protocol to determine locality?

OIC, this goes away in patch #20...



>   struct rpc_clnt * cl_rpcclient;
>   const struct nfs_rpc_ops *rpc_ops; /* NFS protocol vector */
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index d09b9773b20c..764513a61601 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1605,6 +1605,7 @@ enum {
>   NFS_IOHDR_RESEND_PNFS,
>   NFS_IOHDR_RESEND_MDS,
>   NFS_IOHDR_UNSTABLE_WRITES,
> + NFS_IOHDR_ODIRECT,
>  };
>  
>  struct nfs_io_completion;

--
Jeff Layton <[email protected]>

2024-06-10 12:47:51

by Jeff Layton

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 00/29] nfs/nfsd: add support for localio bypass

On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> Hi,
>
> This patch series rebases "localio" changes that Hammerspace (and
> Primary Data before it) has been carrying since 2014. The reason they
> weren't proposed for upstream inclusion until now was the handshake
> for whether or not a client and server are local was brittle. Please
> see the commit header of "nfs/localio: discontinue network address
> based localio setup" (patch 20) for more context.
>
> Aside from rebasing the original changes (patches 1 - 18) from a
> 5.15.-130-stable kernel, my contribution to this series was to make
> the localio handshake more robust. To do so a new LOCALIO protocol
> extension has been added to both NFS v3 and v4. It follows the
> well-worn pattern established by the ACL protocol extension.
>
> These changes have proven stable against various test scenarios:
> 1) client and server both on localhost (for both v3 and v4.2)
> 2) various permutations of client and server support enablement for
>    both local and remote client and server.
> 3) client on host, server within a container (for both v3 and v4.2)
>
> I've preserved all established author and Signed-off-by attribution
> despite Andy, Peng and Jeff no longer working for Primary Data (or
> Hammerspace). I've confirmed with Trond that its best to keep it all
> despite those email addresses no longer being active. My Signed-off-
> by
> and that of reviewers and maintainer(s) to follow will build on the
> established development provenance.
>
> I also made sure to preserve the original work done by others (rather
> than fold changes that I add to this work, to avoid tainting the long
> established development and sequence of changes).
>

Honestly, I don't give a fig about the historical changes here. I'd
_much_ rather see a more logical folded patchset that avoids a lot of
the "churn". Given the long timescale of this series, the history is
just not terribly useful.

For instance, you're adding in the old network address tracking in the
earlier patches and then remove that in patch #20, which just means I
have to review a bunch of stuff that is ultimately going away. I'll
still review the set you've posted, but I think folding down the
changes would be best.

> My container testing was done in terms of podman managed containers.
> I'd appreciate additional review relative to network namespaces.
> fs/nfsd/localio.c:nfsd_local_fakerqst_create() in particular is
> simply
> using the client's network namespace with rpc_net_ns(rpc_clnt). I
> have
> an extra patch that updates nfsd_open_local_fh()'s first argument to
> be the server's 'struct net' -- but I stopped short of formally
> including that change in this series because it hasn't proven needed
> (but more exotic hypothetical scenarios could easily expose the need
> for it). I can append it to the series as an "RFC PATCH 30/29" as
> needed.
>
> All review and comments are welcome!
>
> Thanks,
> Mike
>
> Mike Snitzer (11):
>   nfs/write: fix nfs_initiate_commit to return error from
> nfs_local_commit
>   nfs/localio: discontinue network address based localio setup
>   nfs_common: add NFS v3 LOCALIO protocol extension enablement
>   nfs: implement v3 client support for NFS_LOCALIO_PROGRAM
>   nfsd: implement v3 server support for NFS_LOCALIO_PROGRAM
>   nfs_common: add NFS v4 LOCALIO protocol extension enablement
>   nfs: implement v4 client support for NFS_LOCALIO_PROGRAM
>   nfsd: implement v4 server support for NFS_LOCALIO_PROGRAM
>   nfs/nfsd: switch GETUUID to using {encode,decode}_opaque_fixed
>   nfs/nfsd: consolidate {encode,decode}_opaque_fixed in nfs_xdr.h
>   nfs/localio: move managing nfsd_open_local_fh symbol to nfs_common
>
> Peng Tao (3):
>   sunrpc: add and export rpc_ntop6_addr_noscopeid
>   nfs: move nfs_stat_to_errno to nfs.h
>   nfs/flexfiles: check local DS when making DS connections
>
> Trond Myklebust (8):
>   NFS: Manage boot verifier correctly in the case of localio
>   NFS: Enable localio for non-pNFS I/O
>   pnfs/flexfiles: Enable localio for flexfiles I/O
>   NFS: Add tracepoints for nfs_local_enable and nfs_local_disable
>   NFS: Don't call filesystem write() routines directly
>   NFS: Don't call filesystem read() routines directly
>   NFS: Use completion rather than flush_work() in nfs_local_commit()
>   NFS: localio writes need to use a normal workqueue
>
> Weston Andros Adamson (7):
>   nfs: pass nfs_client to nfs_initiate_pgio
>   nfs: pass nfs_client to nfs_initiate_commit
>   nfs: pass descriptor thru nfs_initiate_pgio path
>   sunrpc: handle NULL req->defer in cache_defer_req
>   sunrpc: export svc_defer
>   sunrpc: add rpcauth_map_to_svc_cred
>   nfs/nfsd: add "local io" support
>
>  fs/Kconfig                                |   3 +
>  fs/nfs/Kconfig                            |  25 +
>  fs/nfs/Makefile                           |   2 +
>  fs/nfs/blocklayout/blocklayout.c          |   6 +-
>  fs/nfs/client.c                           |  15 +-
>  fs/nfs/filelayout/filelayout.c            |  19 +-
>  fs/nfs/flexfilelayout/flexfilelayout.c    | 129 +++-
>  fs/nfs/flexfilelayout/flexfilelayout.h    |   2 +
>  fs/nfs/flexfilelayout/flexfilelayoutdev.c |   6 +
>  fs/nfs/inode.c                            |  28 +-
>  fs/nfs/internal.h                         | 101 ++-
>  fs/nfs/localio.c                          | 814
> ++++++++++++++++++++++
>  fs/nfs/nfs2xdr.c                          |  69 --
>  fs/nfs/nfs3_fs.h                          |   1 +
>  fs/nfs/nfs3client.c                       |  25 +
>  fs/nfs/nfs3proc.c                         |   3 +
>  fs/nfs/nfs3xdr.c                          |  58 ++
>  fs/nfs/nfs4_fs.h                          |   2 +
>  fs/nfs/nfs4client.c                       |  23 +
>  fs/nfs/nfs4proc.c                         |   3 +
>  fs/nfs/nfs4xdr.c                          |  65 +-
>  fs/nfs/nfstrace.h                         |  61 ++
>  fs/nfs/pagelist.c                         |  35 +-
>  fs/nfs/pnfs.c                             |  24 +-
>  fs/nfs/pnfs.h                             |   6 +-
>  fs/nfs/pnfs_nfs.c                         |   5 +-
>  fs/nfs/write.c                            |  28 +-
>  fs/nfs_common/Makefile                    |   3 +
>  fs/nfs_common/nfslocalio.c                |  68 ++
>  fs/nfsd/Kconfig                           |  25 +
>  fs/nfsd/Makefile                          |   2 +
>  fs/nfsd/filecache.c                       |   2 +-
>  fs/nfsd/localio.c                         | 324 +++++++++
>  fs/nfsd/netns.h                           |   4 +
>  fs/nfsd/nfsd.h                            |  11 +
>  fs/nfsd/nfssvc.c                          |  91 ++-
>  fs/nfsd/trace.h                           |   3 +-
>  fs/nfsd/vfs.h                             |   8 +
>  fs/nfsd/xdr.h                             |   6 +
>  include/linux/nfs.h                       |  65 ++
>  include/linux/nfs_fs.h                    |   2 +
>  include/linux/nfs_fs_sb.h                 |   8 +
>  include/linux/nfs_xdr.h                   |  31 +-
>  include/linux/nfslocalio.h                |  37 +
>  include/linux/sunrpc/auth.h               |   4 +
>  include/linux/sunrpc/svc_xprt.h           |   1 +
>  include/uapi/linux/nfs.h                  |   4 +
>  net/sunrpc/auth.c                         |  16 +
>  net/sunrpc/cache.c                        |   2 +
>  net/sunrpc/svc_xprt.c                     |   4 +-
>  50 files changed, 2120 insertions(+), 159 deletions(-)
>  create mode 100644 fs/nfs/localio.c
>  create mode 100644 fs/nfs_common/nfslocalio.c
>  create mode 100644 fs/nfsd/localio.c
>  create mode 100644 include/linux/nfslocalio.h
>

--
Jeff Layton <[email protected]>

2024-06-10 16:33:26

by Mike Snitzer

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 07/29] sunrpc: add and export rpc_ntop6_addr_noscopeid

On Sun, Jun 09, 2024 at 08:36:40AM -0400, Jeff Layton wrote:
> On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> > From: Peng Tao <[email protected]>
> >
>
> Still looking over the set, but this could use some justification.

OK, note that it gets reverted by patch 20. It was introduced for the
benefit of sockadd based matching of "local" network interfaces.

2024-06-10 16:51:20

by Mike Snitzer

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 10/29] nfs/nfsd: add "local io" support

On Mon, Jun 10, 2024 at 08:43:34AM -0400, Jeff Layton wrote:
> On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> > From: Weston Andros Adamson <[email protected]>
> >
> > Add client support for bypassing NFS for localhost reads, writes, and commits.
> >
> > This is only useful when the client and the server are running on the same
> > host and in the same container.
> >
> > This has dynamic binding with the nfsd module. Local i/o will only work if
> > nfsd is already loaded.
> >
> > [snitm: rebase accounted for commit d8b26071e65e8 ("NFSD: simplify struct nfsfh")
> > ?and commit 7c98f7cb8fda ("remove call_{read,write}_iter() functions")]
> >
> > Signed-off-by: Weston Andros Adamson <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > Signed-off-by: Peng Tao <[email protected]>
> > Signed-off-by: Lance Shelton <[email protected]>
> > Signed-off-by: Trond Myklebust <[email protected]>
> > Signed-off-by: Mike Snitzer <[email protected]>
> > ---
> > ?fs/nfs/Makefile??????????????????????? |?? 2 +-
> > ?fs/nfs/client.c??????????????????????? |? 12 +
> > ?fs/nfs/filelayout/filelayout.c???????? |?? 6 +-
> > ?fs/nfs/flexfilelayout/flexfilelayout.c |?? 6 +-
> > ?fs/nfs/inode.c???????????????????????? |?? 5 +
> > ?fs/nfs/internal.h????????????????????? |? 32 +-
> > ?fs/nfs/localio.c?????????????????????? | 933 +++++++++++++++++++++++++
> > ?fs/nfs/nfstrace.h????????????????????? |? 29 +
> > ?fs/nfs/pagelist.c????????????????????? |? 12 +-
> > ?fs/nfs/pnfs_nfs.c????????????????????? |?? 2 +-
> > ?fs/nfs/write.c???????????????????????? |? 14 +-
> > ?fs/nfsd/Makefile?????????????????????? |?? 2 +-
> > ?fs/nfsd/filecache.c??????????????????? |?? 2 +-
> > ?fs/nfsd/localio.c????????????????????? | 179 +++++
> > ?fs/nfsd/trace.h??????????????????????? |?? 3 +-
> > ?fs/nfsd/vfs.h????????????????????????? |?? 8 +
> > ?include/linux/nfs.h??????????????????? |?? 6 +
> > ?include/linux/nfs_fs.h???????????????? |?? 2 +
> > ?include/linux/nfs_fs_sb.h????????????? |?? 2 +
> > ?include/linux/nfs_xdr.h??????????????? |?? 1 +
> > ?20 files changed, 1240 insertions(+), 18 deletions(-)
> > ?create mode 100644 fs/nfs/localio.c
> > ?create mode 100644 fs/nfsd/localio.c
> >
> > diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
> > index 5f6db37f461e..af64cf5ea420 100644
> > --- a/fs/nfs/Makefile
> > +++ b/fs/nfs/Makefile
> > @@ -9,7 +9,7 @@ CFLAGS_nfstrace.o += -I$(src)
> > ?nfs-y? := client.o dir.o file.o getroot.o inode.o super.o \
> > ? ?? io.o direct.o pagelist.o read.o symlink.o unlink.o \
> > ? ?? write.o namespace.o mount_clnt.o nfstrace.o \
> > - ?? export.o sysfs.o fs_context.o
> > + ?? export.o sysfs.o fs_context.o localio.o
> > ?nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
> > ?nfs-$(CONFIG_SYSCTL) += sysctl.o
> > ?nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
> > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> > index dd3278dcfca8..288de750fd3b 100644
> > --- a/fs/nfs/client.c
> > +++ b/fs/nfs/client.c
> > @@ -170,6 +170,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
> > ? }
> > ?
> > ? INIT_LIST_HEAD(&clp->cl_superblocks);
> > + INIT_LIST_HEAD(&clp->cl_local_addrs);
> > ? clp->cl_rpcclient = ERR_PTR(-EINVAL);
> > ?
> > ? clp->cl_flags = cl_init->init_flags;
> > @@ -183,6 +184,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
> > ?
> > ? clp->cl_principal = "*";
> > ? clp->cl_xprtsec = cl_init->xprtsec;
> > + nfs_probe_local_addr(clp);
> > ? return clp;
> > ?
> > ?error_cleanup:
> > @@ -236,10 +238,19 @@ static void pnfs_init_server(struct nfs_server *server)
> > ? */
> > ?void nfs_free_client(struct nfs_client *clp)
> > ?{
> > + struct nfs_local_addr *addr, *tmp;
> > +
> > + nfs_local_disable(clp);
> > +
> > ? /* -EIO all pending I/O */
> > ? if (!IS_ERR(clp->cl_rpcclient))
> > ? rpc_shutdown_client(clp->cl_rpcclient);
> > ?
> > + list_for_each_entry_safe(addr, tmp, &clp->cl_local_addrs, cl_addrs) {
> > + list_del(&addr->cl_addrs);
> > + kfree(addr);
> > + }
> > +
> > ? put_net(clp->cl_net);
> > ? put_nfs_version(clp->cl_nfs_mod);
> > ? kfree(clp->cl_hostname);
> > @@ -427,6 +438,7 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
> > ? list_add_tail(&new->cl_share_link,
> > ? &nn->nfs_client_list);
> > ? spin_unlock(&nn->nfs_client_lock);
> > + nfs_local_probe(new);
> > ? return rpc_ops->init_client(new, cl_init);
> > ? }
> > ?
> > diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
> > index d66f2efbd92f..bd8c717c31d2 100644
> > --- a/fs/nfs/filelayout/filelayout.c
> > +++ b/fs/nfs/filelayout/filelayout.c
> > @@ -489,7 +489,7 @@ filelayout_read_pagelist(struct nfs_pageio_descriptor *desc,
> > ? /* Perform an asynchronous read to ds */
> > ? nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
> > ? ? NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
> > - ? 0, RPC_TASK_SOFTCONN);
> > + ? 0, RPC_TASK_SOFTCONN, NULL);
> > ? return PNFS_ATTEMPTED;
> > ?}
> > ?
> > @@ -532,7 +532,7 @@ filelayout_write_pagelist(struct nfs_pageio_descriptor *desc,
> > ? /* Perform an asynchronous write */
> > ? nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
> > ? ? NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
> > - ? sync, RPC_TASK_SOFTCONN);
> > + ? sync, RPC_TASK_SOFTCONN, NULL);
> > ? return PNFS_ATTEMPTED;
> > ?}
> > ?
> > @@ -1014,7 +1014,7 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
> > ? return nfs_initiate_commit(ds->ds_clp, ds_clnt, data,
> > ? ?? NFS_PROTO(data->inode),
> > ? ?? &filelayout_commit_call_ops, how,
> > - ?? RPC_TASK_SOFTCONN);
> > + ?? RPC_TASK_SOFTCONN, NULL);
> > ?out_err:
> > ? pnfs_generic_prepare_to_resend_writes(data);
> > ? pnfs_generic_commit_release(data);
> > diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
> > index d7e9e5ef4085..ce6cb5d82427 100644
> > --- a/fs/nfs/flexfilelayout/flexfilelayout.c
> > +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> > @@ -1808,7 +1808,7 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
> > ? ? ds->ds_clp->rpc_ops,
> > ? ? vers == 3 ? &ff_layout_read_call_ops_v3 :
> > ? ????? &ff_layout_read_call_ops_v4,
> > - ? 0, RPC_TASK_SOFTCONN);
> > + ? 0, RPC_TASK_SOFTCONN, NULL);
> > ? put_cred(ds_cred);
> > ? return PNFS_ATTEMPTED;
> > ?
> > @@ -1878,7 +1878,7 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
> > ? ? ds->ds_clp->rpc_ops,
> > ? ? vers == 3 ? &ff_layout_write_call_ops_v3 :
> > ? ????? &ff_layout_write_call_ops_v4,
> > - ? sync, RPC_TASK_SOFTCONN);
> > + ? sync, RPC_TASK_SOFTCONN, NULL);
> > ? put_cred(ds_cred);
> > ? return PNFS_ATTEMPTED;
> > ?
> > @@ -1953,7 +1953,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
> > ? ret = nfs_initiate_commit(ds->ds_clp, ds_clnt, data, ds->ds_clp->rpc_ops,
> > ? ?? vers == 3 ? &ff_layout_commit_call_ops_v3 :
> > ? ?????? &ff_layout_commit_call_ops_v4,
> > - ?? how, RPC_TASK_SOFTCONN);
> > + ?? how, RPC_TASK_SOFTCONN, NULL);
> > ? put_cred(ds_cred);
> > ? return ret;
> > ?out_err:
> > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > index acef52ecb1bb..4f88b860494f 100644
> > --- a/fs/nfs/inode.c
> > +++ b/fs/nfs/inode.c
> > @@ -39,6 +39,7 @@
> > ?#include <linux/slab.h>
> > ?#include <linux/compat.h>
> > ?#include <linux/freezer.h>
> > +#include <linux/file.h>
> > ?#include <linux/uaccess.h>
> > ?#include <linux/iversion.h>
> > ?
> > @@ -1053,6 +1054,7 @@ struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry,
> > ? ctx->lock_context.open_context = ctx;
> > ? INIT_LIST_HEAD(&ctx->list);
> > ? ctx->mdsthreshold = NULL;
> > + ctx->local_filp = NULL;
> > ? return ctx;
> > ?}
> > ?EXPORT_SYMBOL_GPL(alloc_nfs_open_context);
> > @@ -1084,6 +1086,8 @@ static void __put_nfs_open_context(struct nfs_open_context *ctx, int is_sync)
> > ? nfs_sb_deactive(sb);
> > ? put_rpccred(rcu_dereference_protected(ctx->ll_cred, 1));
> > ? kfree(ctx->mdsthreshold);
> > + if (!IS_ERR_OR_NULL(ctx->local_filp))
> > + fput(ctx->local_filp);
> > ? kfree_rcu(ctx, rcu_head);
> > ?}
> > ?
> > @@ -2495,6 +2499,7 @@ static int __init init_nfs_fs(void)
> > ? if (err)
> > ? goto out1;
> > ?
> > + nfs_local_init();
> > ? err = register_nfs_fs();
> > ? if (err)
> > ? goto out0;
> > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> > index 873c2339b78a..67b348447a40 100644
> > --- a/fs/nfs/internal.h
> > +++ b/fs/nfs/internal.h
> > @@ -204,6 +204,12 @@ struct nfs_mount_request {
> > ? struct net *net;
> > ?};
> > ?
> > +struct nfs_local_addr {
> > + struct list_head cl_addrs;
> > + struct sockaddr_storage address;
> > + size_t addrlen;
> > +};
> > +
> > ?extern int nfs_mount(struct nfs_mount_request *info, int timeo, int retrans);
> > ?extern void nfs_umount(const struct nfs_mount_request *info);
> > ?
> > @@ -309,7 +315,8 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
> > ?int nfs_initiate_pgio(struct nfs_pageio_descriptor *, struct nfs_client *clp,
> > ? ????? struct rpc_clnt *rpc_clnt, struct nfs_pgio_header *hdr,
> > ? ????? const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
> > - ????? const struct rpc_call_ops *call_ops, int how, int flags);
> > + ????? const struct rpc_call_ops *call_ops, int how, int flags,
> > + ????? struct file *localio);
> > ?void nfs_free_request(struct nfs_page *req);
> > ?struct nfs_pgio_mirror *
> > ?nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc);
> > @@ -450,6 +457,26 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
> > ?extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
> > ?extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
> > ?
> > +/* localio.c */
> > +extern void nfs_local_init(void);
> > +extern void nfs_local_enable(struct nfs_client *);
> > +extern void nfs_local_disable(struct nfs_client *);
> > +extern void nfs_local_probe(struct nfs_client *);
> > +extern struct file *nfs_local_open_fh(struct nfs_client *, const struct cred *,
> > + ????? struct nfs_fh *, const fmode_t);
> > +extern struct file *nfs_local_file_open(struct nfs_client *clp,
> > + const struct cred *cred,
> > + struct nfs_fh *fh,
> > + struct nfs_open_context *ctx);
> > +extern int nfs_local_doio(struct nfs_client *, struct file *,
> > + ? struct nfs_pgio_header *,
> > + ? const struct rpc_call_ops *);
> > +extern int nfs_local_commit(struct nfs_client *, struct file *,
> > + ??? struct nfs_commit_data *,
> > + ??? const struct rpc_call_ops *, int);
> > +extern void nfs_probe_local_addr(struct nfs_client *clnt);
> > +extern bool nfs_server_is_local(const struct nfs_client *clp);
> > +
> > ?/* super.c */
> > ?extern const struct super_operations nfs_sops;
> > ?bool nfs_auth_info_match(const struct nfs_auth_info *, rpc_authflavor_t);
> > @@ -530,7 +557,8 @@ extern int nfs_initiate_commit(struct nfs_client *clp,
> > ? ?????? struct nfs_commit_data *data,
> > ? ?????? const struct nfs_rpc_ops *nfs_ops,
> > ? ?????? const struct rpc_call_ops *call_ops,
> > - ?????? int how, int flags);
> > + ?????? int how, int flags,
> > + ?????? struct file *localio);
> > ?extern void nfs_init_commit(struct nfs_commit_data *data,
> > ? ??? struct list_head *head,
> > ? ??? struct pnfs_layout_segment *lseg,
> > diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> > new file mode 100644
> > index 000000000000..5c69eb0fe7b6
> > --- /dev/null
> > +++ b/fs/nfs/localio.c
> > @@ -0,0 +1,933 @@
> > +/*
> > + *? linux/fs/nfs/localio.c
> > + *
> > + *? Copyright (C) 2014? Weston Andros Adamson <[email protected]>
> > + *
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/errno.h>
> > +#include <linux/vfs.h>
> > +#include <linux/file.h>
> > +#include <linux/inet.h>
> > +#include <linux/sunrpc/addr.h>
> > +#include <linux/inetdevice.h>
> > +#include <net/addrconf.h>
> > +#include <linux/module.h>
> > +#include <linux/bvec.h>
> > +
> > +#include <linux/nfs.h>
> > +#include <linux/nfs_fs.h>
> > +#include <linux/nfs_xdr.h>
> > +
> > +#include <uapi/linux/if_arp.h>
> > +
> > +#include "internal.h"
> > +#include "pnfs.h"
> > +#include "nfstrace.h"
> > +
> > +#define NFSDBG_FACILITY NFSDBG_VFS
> > +
> > +extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> > + ????? const struct cred *cred,
> > + ????? const struct nfs_fh *nfs_fh, const fmode_t fmode,
> > + ????? struct file **pfilp);
> > +/*
> > + * The localio code needs to call into nfsd to do the filehandle -> struct path
> > + * mapping, but cannot be statically linked, because that will make the nfs
> > + * module depend on the nfsd module.
> > + *
> > + * Instead, do dynamic linking to the nfsd module. This way the nfs module
> > + * will only hold a reference on nfsd when it's actually in use. This also
> > + * allows some sanity checking, like giving up on localio if nfsd isn't loaded.
> > + */
> > +
> > +struct nfs_local_open_ctx {
> > + spinlock_t lock;
> > + nfs_to_nfsd_open_t open_f;
> > + atomic_t refcount;
> > +};
> > +
> > +struct nfs_local_kiocb {
> > + struct kiocb kiocb;
> > + struct bio_vec *bvec;
> > + struct nfs_pgio_header *hdr;
> > + struct work_struct work;
> > +};
> > +
> > +struct nfs_local_fsync_ctx {
> > + struct file *filp;
> > + struct nfs_commit_data *data;
> > + struct work_struct work;
> > + struct kref kref;
> > +};
> > +static void nfs_local_fsync_work(struct work_struct *work);
> > +
> > +/*
> > + * We need to translate between nfs status return values and
> > + * the local errno values which may not be the same.
> > + */
> > +static struct {
> > + __u32 stat;
> > + int errno;
> > +} nfs_errtbl[] = {
> > + { NFS4_OK, 0 },
> > + { NFS4ERR_PERM, -EPERM },
> > + { NFS4ERR_NOENT, -ENOENT },
> > + { NFS4ERR_IO, -EIO },
> > + { NFS4ERR_NXIO, -ENXIO },
> > + { NFS4ERR_FBIG, -E2BIG },
> > + { NFS4ERR_STALE, -EBADF },
> > + { NFS4ERR_ACCESS, -EACCES },
> > + { NFS4ERR_EXIST, -EEXIST },
> > + { NFS4ERR_XDEV, -EXDEV },
> > + { NFS4ERR_MLINK, -EMLINK },
> > + { NFS4ERR_NOTDIR, -ENOTDIR },
> > + { NFS4ERR_ISDIR, -EISDIR },
> > + { NFS4ERR_INVAL, -EINVAL },
> > + { NFS4ERR_FBIG, -EFBIG },
> > + { NFS4ERR_NOSPC, -ENOSPC },
> > + { NFS4ERR_ROFS, -EROFS },
> > + { NFS4ERR_NAMETOOLONG, -ENAMETOOLONG },
> > + { NFS4ERR_NOTEMPTY, -ENOTEMPTY },
> > + { NFS4ERR_DQUOT, -EDQUOT },
> > + { NFS4ERR_STALE, -ESTALE },
> > + { NFS4ERR_STALE, -EOPENSTALE },
> > + { NFS4ERR_DELAY, -ETIMEDOUT },
> > + { NFS4ERR_DELAY, -ERESTARTSYS },
> > + { NFS4ERR_DELAY, -EAGAIN },
> > + { NFS4ERR_DELAY, -ENOMEM },
> > + { NFS4ERR_IO, -ETXTBSY },
> > + { NFS4ERR_IO, -EBUSY },
> > + { NFS4ERR_BADHANDLE, -EBADHANDLE },
> > + { NFS4ERR_BAD_COOKIE, -EBADCOOKIE },
> > + { NFS4ERR_NOTSUPP, -EOPNOTSUPP },
> > + { NFS4ERR_TOOSMALL, -ETOOSMALL },
> > + { NFS4ERR_SERVERFAULT, -ESERVERFAULT },
> > + { NFS4ERR_SERVERFAULT, -ENFILE },
> > + { NFS4ERR_IO, -EREMOTEIO },
> > + { NFS4ERR_IO, -EUCLEAN },
> > + { NFS4ERR_PERM, -ENOKEY },
> > + { NFS4ERR_BADTYPE, -EBADTYPE },
> > + { NFS4ERR_SYMLINK, -ELOOP },
> > + { NFS4ERR_DEADLOCK, -EDEADLK },
> > +};
> > +
> > +/*
> > + * Convert an NFS error code to a local one.
> > + * This one is used jointly by NFSv2 and NFSv3.
> > + */
> > +static __u32
> > +nfs4errno(int errno)
> > +{
> > + unsigned int i;
> > + for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) {
> > + if (nfs_errtbl[i].errno == errno)
> > + return nfs_errtbl[i].stat;
> > + }
> > + /* If we cannot translate the error, the recovery routines should
> > + * handle it.
> > + * Note: remaining NFSv4 error codes have values > 10000, so should
> > + * not conflict with native Linux error codes.
> > + */
> > + return NFS4ERR_SERVERFAULT;
> > +}
> > +
> > +static struct nfs_local_open_ctx __local_open_ctx __read_mostly;
> > +
> > +static bool localio_enabled __read_mostly = true;
> > +module_param(localio_enabled, bool, 0644);
> > +
> > +bool nfs_server_is_local(const struct nfs_client *clp)
> > +{
> > + return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
> > + localio_enabled;
> > +}
> > +EXPORT_SYMBOL_GPL(nfs_server_is_local);
> > +
> > +void
> > +nfs_local_init(void)
> > +{
> > + struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> > +
> > + ctx->open_f = NULL;
> > + spin_lock_init(&ctx->lock);
> > + atomic_set(&ctx->refcount, 0);
> > +}
> > +
> > +static bool
> > +nfs_local_get_lookup_ctx(void)
> > +{
> > + struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> > + nfs_to_nfsd_open_t fn = NULL;
> > +
> > + spin_lock(&ctx->lock);
> > + if (ctx->open_f == NULL) {
> > + spin_unlock(&ctx->lock);
> > +
> > + fn = symbol_request(nfsd_open_local_fh);
> > + if (!fn)
> > + return false;
> > +
> > + spin_lock(&ctx->lock);
> > + /* catch race */
> > + if (ctx->open_f == NULL) {
> > + ctx->open_f = fn;
> > + fn = NULL;
> > + }
> > + }
> > + atomic_inc(&ctx->refcount);
> > + spin_unlock(&ctx->lock);
> > + if (fn)
> > + symbol_put(nfsd_open_local_fh);
> > + return true;
> > +}
> > +
> > +static void
> > +nfs_local_put_lookup_ctx(void)
> > +{
> > + struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> > + nfs_to_nfsd_open_t fn;
> > +
> > + if (atomic_dec_and_lock(&ctx->refcount, &ctx->lock)) {
> > + fn = ctx->open_f;
> > + ctx->open_f = NULL;
> > + spin_unlock(&ctx->lock);
> > + if (fn)
> > + symbol_put(nfsd_open_local_fh);
> > + dprintk("destroy lookup context\n");
> > + }
> > +}
> > +
> > +/*
> > + * nfs_local_enable - attempt to enable local i/o for an nfs_client
> > + */
> > +void
> > +nfs_local_enable(struct nfs_client *clp)
> > +{
> > + if (nfs_local_get_lookup_ctx()) {
> > + dprintk("enabled local i/o\n");
> > + set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> > + }
> > +}
> > +EXPORT_SYMBOL_GPL(nfs_local_enable);
> > +
> > +/*
> > + * nfs_local_disable - disable local i/o for an nfs_client
> > + */
> > +void
> > +nfs_local_disable(struct nfs_client *clp)
> > +{
> > + if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
> > + dprintk("disabled local i/o\n");
> > + nfs_local_put_lookup_ctx();
> > + }
> > +}
> > +
> > +/*
> > + * nfs_local_probe - probe local i/o support for an nfs_client
> > + */
> > +void
> > +nfs_local_probe(struct nfs_client *clp)
> > +{
> > + struct sockaddr_in *sin;
> > + struct sockaddr_in6 *sin6;
> > + struct nfs_local_addr *addr;
> > + struct sockaddr *sap;
> > + bool enable = false;
> > +
> > + switch (clp->cl_addr.ss_family) {
> > + case AF_INET:
> > + sin = (struct sockaddr_in *)&clp->cl_addr;
> > + if (ipv4_is_loopback(sin->sin_addr.s_addr)) {
> > + dprintk("%s: detected IPv4 loopback address\n",
> > + __func__);
> > + enable = true;
> > + }
> > + break;
> > + case AF_INET6:
> > + sin6 = (struct sockaddr_in6 *)&clp->cl_addr;
> > + if (memcmp(&sin6->sin6_addr, &in6addr_loopback,
> > + ??? sizeof(struct in6_addr)) == 0) {
> > + dprintk("%s: detected IPv6 loopback address\n",
> > + __func__);
> > + enable = true;
> > + }
> > + break;
> > + default:
> > + break;
> > + }
> > +
> > + if (enable)
> > + goto out;
> > +
> > + list_for_each_entry(addr, &clp->cl_local_addrs, cl_addrs) {
> > + sap = (struct sockaddr *)&addr->address;
> > + if (rpc_cmp_addr((struct sockaddr *)&clp->cl_addr, sap)) {
> > + dprintk("%s: detected local server.\n", __func__);
> > + enable = true;
> > + break;
> > + }
> > + }
> > +
> > +out:
> > + if (enable)
> > + nfs_local_enable(clp);
> > +}
> > +
> > +/*
> > + * nfs_local_open_fh - open a local filehandle
> > + *
> > + * Returns a pointer to a struct file or an ERR_PTR
> > + */
> > +struct file *
> > +nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> > + ? struct nfs_fh *fh, const fmode_t mode)
> > +{
> > + struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> > + struct file *filp;
> > + int status;
> > +
> > + if (mode & ~(FMODE_READ | FMODE_WRITE))
> > + return ERR_PTR(-EINVAL);
> > +
> > + status = ctx->open_f(clp->cl_rpcclient, cred, fh, mode, &filp);
> > + if (status < 0) {
> > + dprintk("%s: open local file failed error=%d\n",
> > + __func__, status);
> > + trace_nfs_local_open_fh(fh, mode, status);
> > + switch (status) {
> > + case -ENXIO:
> > + nfs_local_disable(clp);
> > + fallthrough;
> > + case -ETIMEDOUT:
> > + status = -EAGAIN;
> > + }
> > + filp = ERR_PTR(status);
> > + }
> > + return filp;
> > +}
> > +EXPORT_SYMBOL_GPL(nfs_local_open_fh);
> > +
> > +static struct bio_vec *
> > +nfs_bvec_alloc_and_import_pagevec(struct page **pagevec,
> > + unsigned int npages, gfp_t flags)
> > +{
> > + struct bio_vec *bvec, *p;
> > +
> > + bvec = kmalloc_array(npages, sizeof(*bvec), flags);
> > + if (bvec != NULL) {
> > + for (p = bvec; npages > 0; p++, pagevec++, npages--) {
> > + p->bv_page = *pagevec;
> > + p->bv_len = PAGE_SIZE;
> > + p->bv_offset = 0;
> > + }
> > + }
> > + return bvec;
> > +}
> > +
> > +static void
> > +nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
> > +{
> > + kfree(iocb->bvec);
> > + kfree(iocb);
> > +}
> > +
> > +static struct nfs_local_kiocb *
> > +nfs_local_iocb_alloc(struct nfs_pgio_header *hdr, struct file *filp,
> > + gfp_t flags)
> > +{
> > + struct nfs_local_kiocb *iocb;
> > +
> > + iocb = kmalloc(sizeof(*iocb), flags);
> > + if (iocb == NULL)
> > + return NULL;
> > + iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
> > + hdr->page_array.npages, flags);
> > + if (iocb->bvec == NULL) {
> > + kfree(iocb);
> > + return NULL;
> > + }
> > + init_sync_kiocb(&iocb->kiocb, filp);
> > + iocb->kiocb.ki_pos = hdr->args.offset;
> > + iocb->hdr = hdr;
> > + /* FIXME: NFS_IOHDR_ODIRECT isn't ever set */
> > + if (test_bit(NFS_IOHDR_ODIRECT, &hdr->flags))
> > + iocb->kiocb.ki_flags |= IOCB_DIRECT|IOCB_DSYNC;
> > + iocb->kiocb.ki_flags &= ~IOCB_APPEND;
> > + return iocb;
> > +}
> > +
> > +static void
> > +nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
> > +{
> > + struct nfs_pgio_header *hdr = iocb->hdr;
> > +
> > + if (hdr->args.pgbase != 0) {
> > + iov_iter_bvec(i, dir, iocb->bvec,
> > + hdr->page_array.npages,
> > + hdr->args.count + hdr->args.pgbase);
> > + iov_iter_advance(i, hdr->args.pgbase);
> > + } else
> > + iov_iter_bvec(i, dir, iocb->bvec,
> > + hdr->page_array.npages, hdr->args.count);
> > +}
> > +
> > +static void
> > +nfs_local_hdr_release(struct nfs_pgio_header *hdr,
> > + const struct rpc_call_ops *call_ops)
> > +{
> > + call_ops->rpc_call_done(&hdr->task, hdr);
> > + call_ops->rpc_release(hdr);
> > +}
> > +
> > +static void
> > +nfs_local_pgio_init(struct nfs_pgio_header *hdr,
> > + const struct rpc_call_ops *call_ops)
> > +{
> > + hdr->task.tk_ops = call_ops;
> > + if (!hdr->task.tk_start)
> > + hdr->task.tk_start = ktime_get();
> > +}
> > +
> > +static void
> > +nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status)
> > +{
> > + if (status >= 0) {
> > + hdr->res.count = status;
> > + hdr->res.op_status = NFS4_OK;
> > + hdr->task.tk_status = 0;
> > + } else {
> > + hdr->res.op_status = nfs4errno(status);
> > + hdr->task.tk_status = status;
> > + }
> > +}
> > +
> > +static void
> > +nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
> > +{
> > + struct nfs_pgio_header *hdr = iocb->hdr;
> > +
> > + fput(iocb->kiocb.ki_filp);
> > + nfs_local_iocb_free(iocb);
> > + nfs_local_hdr_release(hdr, hdr->task.tk_ops);
> > +}
> > +
> > +static void
> > +nfs_local_read_aio_complete_work(struct work_struct *work)
> > +{
> > + struct nfs_local_kiocb *iocb = container_of(work,
> > + struct nfs_local_kiocb, work);
> > +
> > + nfs_local_pgio_release(iocb);
> > +}
> > +
> > +/*
> > + * Complete the I/O from iocb->kiocb.ki_complete()
> > + *
> > + * Note that this function can be called from a bottom half context,
> > + * hence we need to queue the fput() etc to a workqueue
> > + */
> > +static void
> > +nfs_local_pgio_complete(struct nfs_local_kiocb *iocb)
> > +{
> > + queue_work(nfsiod_workqueue, &iocb->work);
> > +}
> > +
> > +static void
> > +nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
> > +{
> > + struct nfs_pgio_header *hdr = iocb->hdr;
> > + struct file *filp = iocb->kiocb.ki_filp;
> > +
> > + nfs_local_pgio_done(hdr, status);
> > +
> > + if (hdr->res.count != hdr->args.count ||
> > + ??? hdr->args.offset + hdr->res.count >= i_size_read(file_inode(filp)))
> > + hdr->res.eof = true;
> > +
> > + dprintk("%s: read %ld bytes eof %d.\n", __func__,
> > + status > 0 ? status : 0, hdr->res.eof);
> > +}
> > +
> > +static void
> > +nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
> > +{
> > + struct nfs_local_kiocb *iocb = container_of(kiocb,
> > + struct nfs_local_kiocb, kiocb);
> > +
> > + nfs_local_read_done(iocb, ret);
> > + nfs_local_pgio_complete(iocb);
> > +}
> > +
> > +static int
> > +nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
> > + const struct rpc_call_ops *call_ops)
> > +{
> > + struct nfs_local_kiocb *iocb;
> > + struct iov_iter iter;
> > + ssize_t status;
> > +
> > + dprintk("%s: vfs_read count=%u pos=%llu\n",
> > + __func__, hdr->args.count, hdr->args.offset);
> > +
> > + iocb = nfs_local_iocb_alloc(hdr, filp, GFP_KERNEL);
> > + if (iocb == NULL)
> > + return -ENOMEM;
> > + nfs_local_iter_init(&iter, iocb, READ);
> > +
> > + nfs_local_pgio_init(hdr, call_ops);
> > + hdr->res.eof = false;
> > +
> > + if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
> > + INIT_WORK(&iocb->work, nfs_local_read_aio_complete_work);
> > + iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
> > + }
> > +
> > + status = filp->f_op->read_iter(&iocb->kiocb, &iter);
> > + if (status != -EIOCBQUEUED) {
> > + nfs_local_read_done(iocb, status);
> > + nfs_local_pgio_release(iocb);
> > + }
> > + return 0;
> > +}
> > +
> > +static void
> > +nfs_copy_boot_verifier(struct nfs_write_verifier *verifier, struct inode *inode)
> > +{
> > + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
> > + u32 *verf = (u32 *)verifier->data;
> > + int seq = 0;
> > +
> > + do {
> > + read_seqbegin_or_lock(&clp->cl_boot_lock, &seq);
> > + verf[0] = (u32)clp->cl_nfssvc_boot.tv_sec;
> > + verf[1] = (u32)clp->cl_nfssvc_boot.tv_nsec;
> > + } while (need_seqretry(&clp->cl_boot_lock, seq));
> > + done_seqretry(&clp->cl_boot_lock, seq);
> > +}
> > +
> > +static void
> > +nfs_reset_boot_verifier(struct inode *inode)
> > +{
> > + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
> > +
> > + write_seqlock(&clp->cl_boot_lock);
> > + ktime_get_real_ts64(&clp->cl_nfssvc_boot);
> > + write_sequnlock(&clp->cl_boot_lock);
> > +}
> > +
> > +static void
> > +nfs_set_local_verifier(struct inode *inode,
> > + struct nfs_writeverf *verf,
> > + enum nfs3_stable_how how)
> > +{
> > +
> > + nfs_copy_boot_verifier(&verf->verifier, inode);
> > + verf->committed = how;
> > +}
> > +
> > +static void
> > +nfs_get_vfs_attr(struct file *filp, struct nfs_fattr *fattr)
> > +{
> > + struct kstat stat;
> > +
> > + if (fattr != NULL && vfs_getattr(&filp->f_path, &stat,
> > + STATX_INO |
> > + STATX_ATIME |
> > + STATX_MTIME |
> > + STATX_CTIME |
> > + STATX_SIZE |
> > + STATX_BLOCKS,
> > + AT_STATX_SYNC_AS_STAT) == 0) {
> > + fattr->valid = NFS_ATTR_FATTR_FILEID |
> > + NFS_ATTR_FATTR_CHANGE |
> > + NFS_ATTR_FATTR_SIZE |
> > + NFS_ATTR_FATTR_ATIME |
> > + NFS_ATTR_FATTR_MTIME |
> > + NFS_ATTR_FATTR_CTIME |
> > + NFS_ATTR_FATTR_SPACE_USED;
> > + fattr->fileid = stat.ino;
> > + fattr->size = stat.size;
> > + fattr->atime = stat.atime;
> > + fattr->mtime = stat.mtime;
> > + fattr->ctime = stat.ctime;
> > + fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
> > + fattr->du.nfs3.used = stat.blocks << 9;
> > + }
> > +}
> > +
> > +static void
> > +nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
> > +{
> > + struct nfs_pgio_header *hdr = iocb->hdr;
> > +
> > + dprintk("%s: wrote %ld bytes.\n", __func__, status > 0 ? status : 0);
> > +
> > + /* Handle short writes as if they are ENOSPC */
> > + if (status > 0 && status < hdr->args.count) {
> > + hdr->mds_offset += status;
> > + hdr->args.offset += status;
> > + hdr->args.pgbase += status;
> > + hdr->args.count -= status;
> > + nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset);
> > + status = -ENOSPC;
> > + }
> > + if (status < 0)
> > + nfs_reset_boot_verifier(hdr->inode);
> > + nfs_local_pgio_done(hdr, status);
> > +}
> > +
> > +static void
> > +nfs_local_write_aio_complete_work(struct work_struct *work)
> > +{
> > + struct nfs_local_kiocb *iocb = container_of(work,
> > + struct nfs_local_kiocb, work);
> > +
> > + nfs_get_vfs_attr(iocb->kiocb.ki_filp, iocb->hdr->res.fattr);
> > + nfs_local_pgio_release(iocb);
> > +}
> > +
> > +static void
> > +nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
> > +{
> > + struct nfs_local_kiocb *iocb = container_of(kiocb,
> > + struct nfs_local_kiocb, kiocb);
> > +
> > + nfs_local_write_done(iocb, ret);
> > + nfs_local_pgio_complete(iocb);
> > +}
> > +
> > +static int
> > +nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
> > + const struct rpc_call_ops *call_ops)
> > +{
> > + struct nfs_local_kiocb *iocb;
> > + struct iov_iter iter;
> > + ssize_t status;
> > +
> > + dprintk("%s: vfs_write count=%u pos=%llu %s\n",
> > + __func__, hdr->args.count, hdr->args.offset,
> > + (hdr->args.stable == NFS_UNSTABLE) ?? "unstable" : "stable");
> > +
> > + iocb = nfs_local_iocb_alloc(hdr, filp, GFP_NOIO);
> > + if (iocb == NULL)
> > + return -ENOMEM;
> > + nfs_local_iter_init(&iter, iocb, WRITE);
> > +
> > + switch (hdr->args.stable) {
> > + default:
> > + break;
> > + case NFS_DATA_SYNC:
> > + iocb->kiocb.ki_flags |= IOCB_DSYNC;
> > + break;
> > + case NFS_FILE_SYNC:
> > + iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
> > + }
> > + nfs_local_pgio_init(hdr, call_ops);
> > +
> > + if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
> > + INIT_WORK(&iocb->work, nfs_local_write_aio_complete_work);
> > + iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
> > + }
> > +
> > + nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
> > +
> > + file_start_write(filp);
> > + status = filp->f_op->write_iter(&iocb->kiocb, &iter);
> > + file_end_write(filp);
> > + if (status != -EIOCBQUEUED) {
> > + nfs_local_write_done(iocb, status);
> > + nfs_get_vfs_attr(filp, hdr->res.fattr);
> > + nfs_local_pgio_release(iocb);
> > + }
> > + return 0;
> > +}
> > +
> > +static struct file *
> > +nfs_local_file_open_cached(struct nfs_client *clp, const struct cred *cred,
> > + ?? struct nfs_fh *fh, struct nfs_open_context *ctx)
> > +{
> > + struct file *filp = ctx->local_filp;
> > +
> > + if (!filp) {
> > + struct file *new = nfs_local_open_fh(clp, cred, fh, ctx->mode);
> > + if (IS_ERR_OR_NULL(new))
> > + return NULL;
> > + /* try to put this one in the slot */
> > + filp = cmpxchg(&ctx->local_filp, NULL, new);
> > + if (filp != NULL)
> > + fput(new);
> > + else
> > + filp = new;
> > + }
> > + return get_file(filp);
> > +}
> > +
> > +struct file *
> > +nfs_local_file_open(struct nfs_client *clp, const struct cred *cred,
> > + ??? struct nfs_fh *fh, struct nfs_open_context *ctx)
> > +{
> > + if (!nfs_server_is_local(clp))
> > + return NULL;
> > + return nfs_local_file_open_cached(clp, cred, fh, ctx);
> > +}
> > +
> > +int
> > +nfs_local_doio(struct nfs_client *clp, struct file *filp,
> > + ?????? struct nfs_pgio_header *hdr,
> > + ?????? const struct rpc_call_ops *call_ops)
> > +{
> > + int status = 0;
> > +
> > + if (!hdr->args.count)
> > + goto out_fput;
> > + /* Don't support filesystems without read_iter/write_iter */
> > + if (!filp->f_op->read_iter || !filp->f_op->write_iter) {
> > + nfs_local_disable(clp);
> > + status = -EAGAIN;
> > + goto out_fput;
> > + }
> > +
> > + switch (hdr->rw_mode) {
> > + case FMODE_READ:
> > + status = nfs_do_local_read(hdr, filp, call_ops);
> > + break;
> > + case FMODE_WRITE:
> > + status = nfs_do_local_write(hdr, filp, call_ops);
> > + break;
> > + default:
> > + dprintk("%s: invalid mode: %d\n", __func__,
> > + hdr->rw_mode);
> > + status = -EINVAL;
> > + }
> > +out_fput:
> > + if (status != 0) {
> > + fput(filp);
> > + hdr->task.tk_status = status;
> > + nfs_local_hdr_release(hdr, call_ops);
> > + }
> > + return status;
> > +}
> > +
> > +static void
> > +nfs_local_init_commit(struct nfs_commit_data *data,
> > + const struct rpc_call_ops *call_ops)
> > +{
> > + data->task.tk_ops = call_ops;
> > +}
> > +
> > +static int
> > +nfs_local_run_commit(struct file *filp, struct nfs_commit_data *data)
> > +{
> > + loff_t start = data->args.offset;
> > + loff_t end = LLONG_MAX;
> > +
> > + if (data->args.count > 0) {
> > + end = start + data->args.count - 1;
> > + if (end < start)
> > + end = LLONG_MAX;
> > + }
> > +
> > + dprintk("%s: commit %llu - %llu\n", __func__, start, end);
> > + return vfs_fsync_range(filp, start, end, 0);
> > +}
> > +
> > +static void
> > +nfs_local_commit_done(struct nfs_commit_data *data, int status)
> > +{
> > + if (status >= 0) {
> > + nfs_set_local_verifier(data->inode,
> > + data->res.verf,
> > + NFS_FILE_SYNC);
> > + data->res.op_status = NFS4_OK;
> > + data->task.tk_status = 0;
> > + } else {
> > + nfs_reset_boot_verifier(data->inode);
> > + data->res.op_status = nfs4errno(status);
> > + data->task.tk_status = status;
> > + }
> > +}
> > +
> > +static void
> > +nfs_local_release_commit_data(struct file *filp,
> > + struct nfs_commit_data *data,
> > + const struct rpc_call_ops *call_ops)
> > +{
> > + fput(filp);
> > + call_ops->rpc_call_done(&data->task, data);
> > + call_ops->rpc_release(data);
> > +}
> > +
> > +static struct nfs_local_fsync_ctx *
> > +nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data, struct file *filp,
> > + gfp_t flags)
> > +{
> > + struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
> > +
> > + if (ctx != NULL) {
> > + ctx->filp = filp;
> > + ctx->data = data;
> > + INIT_WORK(&ctx->work, nfs_local_fsync_work);
> > + kref_init(&ctx->kref);
> > + }
> > + return ctx;
> > +}
> > +
> > +static void
> > +nfs_local_fsync_ctx_kref_free(struct kref *kref)
> > +{
> > + kfree(container_of(kref, struct nfs_local_fsync_ctx, kref));
> > +}
> > +
> > +static void
> > +nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx)
> > +{
> > + kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free);
> > +}
> > +
> > +static void
> > +nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
> > +{
> > + nfs_local_release_commit_data(ctx->filp, ctx->data,
> > + ctx->data->task.tk_ops);
> > + nfs_local_fsync_ctx_put(ctx);
> > +}
> > +
> > +static void
> > +nfs_local_fsync_work(struct work_struct *work)
> > +{
> > + struct nfs_local_fsync_ctx *ctx;
> > + int status;
> > +
> > + ctx = container_of(work, struct nfs_local_fsync_ctx, work);
> > +
> > + status = nfs_local_run_commit(ctx->filp, ctx->data);
> > + nfs_local_commit_done(ctx->data, status);
> > + nfs_local_fsync_ctx_free(ctx);
> > +}
> > +
> > +int
> > +nfs_local_commit(struct nfs_client *clp, struct file *filp,
> > + struct nfs_commit_data *data,
> > + const struct rpc_call_ops *call_ops, int how)
> > +{
> > + struct nfs_local_fsync_ctx *ctx;
> > +
> > + ctx = nfs_local_fsync_ctx_alloc(data, filp, GFP_KERNEL);
> > + if (!ctx) {
> > + nfs_local_commit_done(data, -ENOMEM);
> > + nfs_local_release_commit_data(filp, data, call_ops);
> > + return -ENOMEM;
> > + }
> > +
> > + nfs_local_init_commit(data, call_ops);
> > + kref_get(&ctx->kref);
> > + queue_work(nfsiod_workqueue, &ctx->work);
> > + if (how & FLUSH_SYNC)
> > + flush_work(&ctx->work);
> > + nfs_local_fsync_ctx_put(ctx);
> > + return 0;
> > +}
> > +
> > +static int
> > +nfs_client_add_addr(struct nfs_client *clnt, char *buf, gfp_t flags)
> > +{
> > + struct nfs_local_addr *addr;
> > + struct sockaddr *sap;
> > +
> > + dprintk("%s: adding new local IP %s\n", __func__, buf);
> > + addr = kmalloc(sizeof(*addr), flags);
> > + if (!addr) {
> > + printk(KERN_WARNING "NFS: cannot alloc new addr\n");
> > + return -ENOMEM;
> > + }
> > + sap = (struct sockaddr *)&addr->address;
> > + addr->addrlen = rpc_pton(clnt->cl_net, buf, strlen(buf),
> > + sap, sizeof(addr->address));
> > + if (!addr->addrlen) {
> > + printk(KERN_WARNING "NFS: cannot parse new addr %s\n",
> > + buf);
> > + kfree(addr);
> > + return -EINVAL;
> > + }
> > + list_add(&addr->cl_addrs, &clnt->cl_local_addrs);
> > +
> > + return 0;
> > +}
> > +
> > +static int
> > +nfs_client_add_v4_addr(struct nfs_client *clnt, struct in_device *indev,
> > + ?????? char *buf, size_t buflen)
> > +{
> > + struct in_ifaddr *ifa;
> > + int ret;
> > +
> > + in_dev_for_each_ifa_rtnl(ifa, indev) {
> > + snprintf(buf, buflen, "%pI4", &ifa->ifa_local);
> > + ret = nfs_client_add_addr(clnt, buf, GFP_KERNEL);
> > + if (ret < 0)
> > + return ret;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +static int
> > +nfs_client_add_v6_addr(struct nfs_client *clnt, struct inet6_dev *in6dev,
> > + ?????? char *buf, size_t buflen)
> > +{
> > + struct inet6_ifaddr *ifp;
> > + int ret = 0;
> > +
> > + read_lock_bh(&in6dev->lock);
> > + list_for_each_entry(ifp, &in6dev->addr_list, if_list) {
> > + rpc_ntop6_addr_noscopeid(&ifp->addr, buf, buflen);
> > + ret = nfs_client_add_addr(clnt, buf, GFP_ATOMIC);
> > + if (ret < 0)
> > + goto out;
> > + }
> > +out:
> > + read_unlock_bh(&in6dev->lock);
> > + return ret;
> > +}
> > +#else /* CONFIG_IPV6 */
> > +static int
> > +nfs_client_add_v6_addr(struct nfs_client *clnt, struct inet6_dev *in6dev,
> > + ?????? char *buf, size_t buflen)
> > +{
> > + return 0;
> > +}
> > +#endif
> > +
> > +/* Find out all local IP addresses. Ignore errors
> > + * because local IO can be optional.
> > + */
> > +void
> > +nfs_probe_local_addr(struct nfs_client *clnt)
> > +{
> > + struct net_device *dev;
> > + struct in_device *indev;
> > + struct inet6_dev *in6dev;
> > + char buf[INET6_ADDRSTRLEN + IPV6_SCOPE_ID_LEN];
> > + size_t buflen = sizeof(buf);
> > +
> > + rtnl_lock();
> > +
> > + for_each_netdev(clnt->cl_net, dev) {
> > + if (dev->type == ARPHRD_LOOPBACK ||
> > + ??? !(dev->flags & IFF_UP))
> > + continue;
> > + indev = __in_dev_get_rtnl(dev);
> > + if (indev &&
> > + ??? nfs_client_add_v4_addr(clnt, indev, buf, buflen) < 0)
> > + break;
> > + in6dev = __in6_dev_get(dev);
> > + if (in6dev &&
> > + ??? nfs_client_add_v6_addr(clnt, in6dev, buf, buflen) < 0)
> > + break;
> > + }
> > +
> > + rtnl_unlock();
> > +}
> > diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
> > index 1e710654af11..45d4086cdeb1 100644
> > --- a/fs/nfs/nfstrace.h
> > +++ b/fs/nfs/nfstrace.h
> > @@ -1681,6 +1681,35 @@ TRACE_EVENT(nfs_mount_path,
> > ? TP_printk("path='%s'", __get_str(path))
> > ?);
> > ?
> > +TRACE_EVENT(nfs_local_open_fh,
> > + TP_PROTO(
> > + const struct nfs_fh *fh,
> > + fmode_t fmode,
> > + int error
> > + ),
> > +
> > + TP_ARGS(fh, fmode, error),
> > +
> > + TP_STRUCT__entry(
> > + __field(int, error)
> > + __field(u32, fhandle)
> > + __field(unsigned int, fmode)
> > + ),
> > +
> > + TP_fast_assign(
> > + __entry->error = error;
> > + __entry->fhandle = nfs_fhandle_hash(fh);
> > + __entry->fmode = (__force unsigned int)fmode;
> > + ),
> > +
> > + TP_printk(
> > + "error=%d fhandle=0x%08x mode=%s",
> > + __entry->error,
> > + __entry->fhandle,
> > + show_fs_fmode_flags(__entry->fmode)
> > + )
> > +);
> > +
> > ?DECLARE_EVENT_CLASS(nfs_xdr_event,
> > ? TP_PROTO(
> > ? const struct xdr_stream *xdr,
> > diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> > index 3786d767e2ff..9210a1821ec9 100644
> > --- a/fs/nfs/pagelist.c
> > +++ b/fs/nfs/pagelist.c
> > @@ -848,7 +848,8 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
> > ? ????? struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
> > ? ????? struct nfs_pgio_header *hdr, const struct cred *cred,
> > ? ????? const struct nfs_rpc_ops *rpc_ops,
> > - ????? const struct rpc_call_ops *call_ops, int how, int flags)
> > + ????? const struct rpc_call_ops *call_ops, int how, int flags,
> > + ????? struct file *localio)
> > ?{
> > ? struct rpc_task *task;
> > ? struct rpc_message msg = {
> > @@ -878,10 +879,16 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
> > ? hdr->args.count,
> > ? (unsigned long long)hdr->args.offset);
> > ?
> > + if (localio) {
> > + nfs_local_doio(clp, localio, hdr, call_ops);
> > + goto out;
> > + }
> > +
> > ? task = rpc_run_task(&task_setup_data);
> > ? if (IS_ERR(task))
> > ? return PTR_ERR(task);
> > ? rpc_put_task(task);
> > +out:
> > ? return 0;
> > ?}
> > ?EXPORT_SYMBOL_GPL(nfs_initiate_pgio);
> > @@ -1080,7 +1087,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
> > ? NFS_PROTO(hdr->inode),
> > ? desc->pg_rpc_callops,
> > ? desc->pg_ioflags,
> > - RPC_TASK_CRED_NOREF | task_flags);
> > + RPC_TASK_CRED_NOREF | task_flags,
> > + NULL);
> > ? }
> > ? return ret;
> > ?}
> > diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
> > index b29b50c2c933..ac3c5e6d4c5e 100644
> > --- a/fs/nfs/pnfs_nfs.c
> > +++ b/fs/nfs/pnfs_nfs.c
> > @@ -538,7 +538,7 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
> > ? ??? NFS_CLIENT(inode), data,
> > ? ??? NFS_PROTO(data->inode),
> > ? ??? data->mds_ops, how,
> > - ??? RPC_TASK_CRED_NOREF);
> > + ??? RPC_TASK_CRED_NOREF, NULL);
> > ? } else {
> > ? nfs_init_commit(data, NULL, data->lseg, cinfo);
> > ? initiate_commit(data, how);
> > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > index c9cfa1308264..ba0b36b15bc1 100644
> > --- a/fs/nfs/write.c
> > +++ b/fs/nfs/write.c
> > @@ -1672,7 +1672,8 @@ int nfs_initiate_commit(struct nfs_client *clp,
> > ? struct nfs_commit_data *data,
> > ? const struct nfs_rpc_ops *nfs_ops,
> > ? const struct rpc_call_ops *call_ops,
> > - int how, int flags)
> > + int how, int flags,
> > + struct file *localio)
> > ?{
> > ? struct rpc_task *task;
> > ? int priority = flush_task_priority(how);
> > @@ -1691,6 +1692,7 @@ int nfs_initiate_commit(struct nfs_client *clp,
> > ? .flags = RPC_TASK_ASYNC | flags,
> > ? .priority = priority,
> > ? };
> > + int status = 0;
> > ?
> > ? if (nfs_server_capable(data->inode, NFS_CAP_MOVEABLE))
> > ? task_setup_data.flags |= RPC_TASK_MOVEABLE;
> > @@ -1701,13 +1703,19 @@ int nfs_initiate_commit(struct nfs_client *clp,
> > ?
> > ? dprintk("NFS: initiated commit call\n");
> > ?
> > + if (localio) {
> > + nfs_local_commit(clp, localio, data, call_ops, how);
> > + goto out;
> > + }
> > +
> > ? task = rpc_run_task(&task_setup_data);
> > ? if (IS_ERR(task))
> > ? return PTR_ERR(task);
> > ? if (how & FLUSH_SYNC)
> > ? rpc_wait_for_completion_task(task);
> > ? rpc_put_task(task);
> > - return 0;
> > +out:
> > + return status;
> > ?}
> > ?EXPORT_SYMBOL_GPL(nfs_initiate_commit);
> > ?
> > @@ -1819,7 +1827,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
> > ? return nfs_initiate_commit(NFS_SERVER(inode)->nfs_client,
> > ? ?? NFS_CLIENT(inode), data, NFS_PROTO(inode),
> > ? ?? data->mds_ops, how,
> > - ?? RPC_TASK_CRED_NOREF | task_flags);
> > + ?? RPC_TASK_CRED_NOREF | task_flags, NULL);
> > ?}
> > ?
> > ?/*
> > diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> > index b8736a82e57c..702f277394f1 100644
> > --- a/fs/nfsd/Makefile
> > +++ b/fs/nfsd/Makefile
> > @@ -13,7 +13,7 @@ nfsd-y += trace.o
> > ?nfsd-y? += nfssvc.o nfsctl.o nfsfh.o vfs.o \
> > ? ?? export.o auth.o lockd.o nfscache.o \
> > ? ?? stats.o filecache.o nfs3proc.o nfs3xdr.o \
> > - ?? netlink.o
> > + ?? netlink.o localio.o
>
> Isn't there a Kconfig option this should be behind?

I backfill Kconfig knobs in patch 21 and later. I left this patch as
was developed/delivered back in 2014.

> > ?nfsd-$(CONFIG_NFSD_V2) += nfsproc.o nfsxdr.o
> > ?nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
> > ?nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index ad9083ca144b..99631fa56662 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -52,7 +52,7 @@
> > ?#define NFSD_FILE_CACHE_UP ???? (0)
> > ?
> > ?/* We only care about NFSD_MAY_READ/WRITE for this cache */
> > -#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
> > +#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
> > ?
> > ?static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
> > ?static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > new file mode 100644
> > index 000000000000..ff68454a4017
> > --- /dev/null
> > +++ b/fs/nfsd/localio.c
> > @@ -0,0 +1,179 @@
> > +/*
> > + * NFS server support for local clients to bypass network stack
> > + *
> > + * Copyright (C) 2014 Weston Andros Adamson <[email protected]>
> > + */
> > +
> > +#include <linux/exportfs.h>
> > +#include <linux/sunrpc/svcauth_gss.h>
> > +#include <linux/sunrpc/clnt.h>
> > +#include <linux/nfs.h>
> > +#include <linux/string.h>
> > +
> > +#include "nfsd.h"
> > +#include "vfs.h"
> > +#include "netns.h"
> > +#include "filecache.h"
> > +
> > +#define NFSDDBG_FACILITY NFSDDBG_FH
> > +
> > +static void
> > +nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> > +{
> > + if (rqstp->rq_client)
> > + auth_domain_put(rqstp->rq_client);
> > + if (rqstp->rq_cred.cr_group_info)
> > + put_group_info(rqstp->rq_cred.cr_group_info);
> > + kfree(rqstp->rq_cred.cr_principal);
> > + kfree(rqstp->rq_xprt);
> > + kfree(rqstp);
> > +}
> > +
> > +static struct svc_rqst *
> > +nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
> > +{
> > + struct svc_rqst *rqstp;
> > + struct net *net = rpc_net_ns(rpc_clnt);
> > + struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > + int status;
> > +
> > + if (!nn->nfsd_serv) {
> > + dprintk("%s: localio denied. Server not running\n", __func__);
> > + return ERR_PTR(-ENXIO);
> > + }
> > +
>
> Note that the above check is racy. The nfsd_serv can go away at any
> time since you're not holding the (global) nfsd_mutex (I assume?).

Yes, worst case we should fallback to going over the network.

> > + rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
> > + if (!rqstp)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
> > + if (!rqstp->rq_xprt) {
> > + status = -ENOMEM;
> > + goto out_err;
> > + }
> > +
> > + rqstp->rq_xprt->xpt_net = net;
> > + __set_bit(RQ_SECURE, &rqstp->rq_flags);
> > + rqstp->rq_proc = 1;
> > + rqstp->rq_vers = 3;
> > + rqstp->rq_prot = IPPROTO_TCP;
> > + rqstp->rq_server = nn->nfsd_serv;
> > +
>
> I suspect you need to carry a reference of some sort so that the
> nfsd_serv doesn't go away out from under you while this is running,
> since this is not operating in nfsd thread context.
>
> Typically, every nfsd thread holds a reference to the serv (in serv-
> >sv_nrthreads), so that when you shut down all of the threads, it goes
> away. The catch is that that refcount is currently under the protection
> of the global nfsd_mutex and I doubt you want to take that in this
> codepath.

OK, I can look closer at the inplications.

> > diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> > index 82a6f66fe1d0..6b603b0247f1 100644
> > --- a/include/linux/nfs_fs_sb.h
> > +++ b/include/linux/nfs_fs_sb.h
> > @@ -49,12 +49,14 @@ struct nfs_client {
> > ?#define NFS_CS_DS 7 /* - Server is a DS */
> > ?#define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
> > ?#define NFS_CS_PNFS 9 /* - Server used for pnfs */
> > +#define NFS_CS_LOCAL_IO 10 /* - client is local */
> > ? struct sockaddr_storage cl_addr; /* server identifier */
> > ? size_t cl_addrlen;
> > ? char * cl_hostname; /* hostname of server */
> > ? char * cl_acceptor; /* GSSAPI acceptor name */
> > ? struct list_head cl_share_link; /* link in global client list */
> > ? struct list_head cl_superblocks; /* List of nfs_server structs */
> > + struct list_head cl_local_addrs; /* List of local addresses */
> > ?
>
> Is the above needed? I thought you weren't tracking addresses now and
> were using the new RPC protocol to determine locality?
>
> OIC, this goes away in patch #20...

Right. Like I said above, just left the baseline has untouched as
possible. I can go a different way and forcibly rip out the network
interface (sockaddr) matching so as to limit churn if you prefer?

Thanks,
Mike

2024-06-10 16:56:35

by Mike Snitzer

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 00/29] nfs/nfsd: add support for localio bypass

On Mon, Jun 10, 2024 at 08:47:47AM -0400, Jeff Layton wrote:
> On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> > Hi,
> >
> > This patch series rebases "localio" changes that Hammerspace (and
> > Primary Data before it) has been carrying since 2014. The reason they
> > weren't proposed for upstream inclusion until now was the handshake
> > for whether or not a client and server are local was brittle. Please
> > see the commit header of "nfs/localio: discontinue network address
> > based localio setup" (patch 20) for more context.
> >
> > Aside from rebasing the original changes (patches 1 - 18) from a
> > 5.15.-130-stable kernel, my contribution to this series was to make
> > the localio handshake more robust. To do so a new LOCALIO protocol
> > extension has been added to both NFS v3 and v4. It follows the
> > well-worn pattern established by the ACL protocol extension.
> >
> > These changes have proven stable against various test scenarios:
> > 1) client and server both on localhost (for both v3 and v4.2)
> > 2) various permutations of client and server support enablement for
> > ?? both local and remote client and server.
> > 3) client on host, server within a container (for both v3 and v4.2)
> >
> > I've preserved all established author and Signed-off-by attribution
> > despite Andy, Peng and Jeff no longer working for Primary Data (or
> > Hammerspace). I've confirmed with Trond that its best to keep it all
> > despite those email addresses no longer being active. My Signed-off-
> > by
> > and that of reviewers and maintainer(s) to follow will build on the
> > established development provenance.
> >
> > I also made sure to preserve the original work done by others (rather
> > than fold changes that I add to this work, to avoid tainting the long
> > established development and sequence of changes).
> >
>
> Honestly, I don't give a fig about the historical changes here. I'd
> _much_ rather see a more logical folded patchset that avoids a lot of
> the "churn". Given the long timescale of this series, the history is
> just not terribly useful.

Fair, will do (and this answers the question I just asked in response
to a different patch).

> For instance, you're adding in the old network address tracking in the
> earlier patches and then remove that in patch #20, which just means I
> have to review a bunch of stuff that is ultimately going away. I'll
> still review the set you've posted, but I think folding down the
> changes would be best.

Yeah, I just wanted to not be excessive with folding patches -- purely
to preserve the evolution of these changes (given the different
authors, etc). But I agree with you, and will sort it out for v2.

Mike

2024-06-10 16:56:38

by Mike Snitzer

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 30/29] nfs/nfsd: ensure localio server always uses its network namespace

On Sun, Jun 09, 2024 at 11:44:27AM -0400, Chuck Lever wrote:
>
> For some reason, I received only patch 30/29.
>
> --
> Chuck Lever

Odd, I did send 30/29 after as a follow-up (in reply to 00/29). But
linux-nfs definitely got the 29 other patches.

I'll be sure to cc you for v2!

2024-06-10 22:38:07

by Mike Snitzer

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 30/29] nfs/nfsd: ensure localio server always uses its network namespace

On Mon, Jun 10, 2024 at 12:50:15PM -0400, Mike Snitzer wrote:
> On Sun, Jun 09, 2024 at 11:44:27AM -0400, Chuck Lever wrote:
> >
> > For some reason, I received only patch 30/29.
> >
> > --
> > Chuck Lever
>
> Odd, I did send 30/29 after as a follow-up (in reply to 00/29). But
> linux-nfs definitely got the 29 other patches.
>
> I'll be sure to cc you for v2!

Looking closer, I don't think linux-nfs allowed the initial patchset.

So I just fixed my mail config so that I'm properly using smtp.kernel.org

It'll go smoother when I post v2 ;)

Mike

2024-06-11 01:13:17

by NeilBrown

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 04/29] sunrpc: handle NULL req->defer in cache_defer_req

On Mon, 10 Jun 2024, Jeff Layton wrote:
> On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> > From: Weston Andros Adamson <[email protected]>
> >
> > Dont crash with a NULL pointer dereference when req->defer isn't
> > set. This is needed for the localio path.
> >
> > Signed-off-by: Weston Andros Adamson <[email protected]>
> > Signed-off-by: Lance Shelton <[email protected]>
> > Signed-off-by: Trond Myklebust <[email protected]>
> > Signed-off-by: Mike Snitzer <[email protected]>
> > ---
> >  net/sunrpc/cache.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> > index 95ff74706104..b757b891382c 100644
> > --- a/net/sunrpc/cache.c
> > +++ b/net/sunrpc/cache.c
> > @@ -714,6 +714,8 @@ static bool cache_defer_req(struct cache_req
> > *req, struct cache_head *item)
> >   return false;
> >   }
> >  
> > + if (!req->defer)
> > + return false;
> >   dreq = req->defer(req);
> >   if (dreq == NULL)
> >   return false;
>
> I've gone over it many times, but I still don't quite "get" the
> deferral handling code. I think the above is probably safe, but please
> do Cc Neil Brown on later postings of this series since he has a better
> grasp of that code.
> --
> Jeff Layton <[email protected]>
>

The patch is bound to be "safe" in a technical sense, but I wonder why
it is necessary. And if we add code that isn't necessary we could make
the result look confusing, which isn't "safe" in a social sense...

->defer is always set non-NULL before svc_process() is called, and I
don't think cache_defer_req() can be reached without svc_process() being
called. So I cannot see how ->defer could possibly be NULL.

Can you remove this patch and see if you can trigger a crash. If you
can I'd love to see the kernel stack.

NeilBrown

2024-06-11 02:57:46

by Mike Snitzer

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 04/29] sunrpc: handle NULL req->defer in cache_defer_req

Hi Neil,

On Tue, Jun 11, 2024 at 11:03:16AM +1000, NeilBrown wrote:
> On Mon, 10 Jun 2024, Jeff Layton wrote:
> > On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> > > From: Weston Andros Adamson <[email protected]>
> > >
> > > Dont crash with a NULL pointer dereference when req->defer isn't
> > > set. This is needed for the localio path.
> > >
> > > Signed-off-by: Weston Andros Adamson <[email protected]>
> > > Signed-off-by: Lance Shelton <[email protected]>
> > > Signed-off-by: Trond Myklebust <[email protected]>
> > > Signed-off-by: Mike Snitzer <[email protected]>
> > > ---
> > > ?net/sunrpc/cache.c | 2 ++
> > > ?1 file changed, 2 insertions(+)
> > >
> > > diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> > > index 95ff74706104..b757b891382c 100644
> > > --- a/net/sunrpc/cache.c
> > > +++ b/net/sunrpc/cache.c
> > > @@ -714,6 +714,8 @@ static bool cache_defer_req(struct cache_req
> > > *req, struct cache_head *item)
> > > ? return false;
> > > ? }
> > > ?
> > > + if (!req->defer)
> > > + return false;
> > > ? dreq = req->defer(req);
> > > ? if (dreq == NULL)
> > > ? return false;
> >
> > I've gone over it many times, but I still don't quite "get" the
> > deferral handling code. I think the above is probably safe, but please
> > do Cc Neil Brown on later postings of this series since he has a better
> > grasp of that code.
> > --
> > Jeff Layton <[email protected]>
> >
>
> The patch is bound to be "safe" in a technical sense, but I wonder why
> it is necessary. And if we add code that isn't necessary we could make
> the result look confusing, which isn't "safe" in a social sense...
>
> ->defer is always set non-NULL before svc_process() is called, and I
> don't think cache_defer_req() can be reached without svc_process() being
> called. So I cannot see how ->defer could possibly be NULL.
>
> Can you remove this patch and see if you can trigger a crash. If you
> can I'd love to see the kernel stack.

I removed the patch (and also the patch that exported svc_defer) and
I haven't seen any issues. So I'll drop those 2 patches.

Thanks,
Mike

2024-06-12 02:26:03

by Mike Snitzer

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 10/29] nfs/nfsd: add "local io" support

On Mon, Jun 10, 2024 at 12:42:54PM -0400, Mike Snitzer wrote:
> On Mon, Jun 10, 2024 at 08:43:34AM -0400, Jeff Layton wrote:
> > On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> > > From: Weston Andros Adamson <[email protected]>
> > >
> > > Add client support for bypassing NFS for localhost reads, writes, and commits.
> > >
> > > This is only useful when the client and the server are running on the same
> > > host and in the same container.
> > >
> > > This has dynamic binding with the nfsd module. Local i/o will only work if
> > > nfsd is already loaded.
> > >
> > > [snitm: rebase accounted for commit d8b26071e65e8 ("NFSD: simplify struct nfsfh")
> > > ?and commit 7c98f7cb8fda ("remove call_{read,write}_iter() functions")]
> > >
> > > Signed-off-by: Weston Andros Adamson <[email protected]>
> > > Signed-off-by: Jeff Layton <[email protected]>
> > > Signed-off-by: Peng Tao <[email protected]>
> > > Signed-off-by: Lance Shelton <[email protected]>
> > > Signed-off-by: Trond Myklebust <[email protected]>
> > > Signed-off-by: Mike Snitzer <[email protected]>
...
> > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > new file mode 100644
> > > index 000000000000..ff68454a4017
> > > --- /dev/null
> > > +++ b/fs/nfsd/localio.c
> > > @@ -0,0 +1,179 @@
> > > +/*
> > > + * NFS server support for local clients to bypass network stack
> > > + *
> > > + * Copyright (C) 2014 Weston Andros Adamson <[email protected]>
> > > + */
> > > +
> > > +#include <linux/exportfs.h>
> > > +#include <linux/sunrpc/svcauth_gss.h>
> > > +#include <linux/sunrpc/clnt.h>
> > > +#include <linux/nfs.h>
> > > +#include <linux/string.h>
> > > +
> > > +#include "nfsd.h"
> > > +#include "vfs.h"
> > > +#include "netns.h"
> > > +#include "filecache.h"
> > > +
> > > +#define NFSDDBG_FACILITY NFSDDBG_FH
> > > +
> > > +static void
> > > +nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> > > +{
> > > + if (rqstp->rq_client)
> > > + auth_domain_put(rqstp->rq_client);
> > > + if (rqstp->rq_cred.cr_group_info)
> > > + put_group_info(rqstp->rq_cred.cr_group_info);
> > > + kfree(rqstp->rq_cred.cr_principal);
> > > + kfree(rqstp->rq_xprt);
> > > + kfree(rqstp);
> > > +}
> > > +
> > > +static struct svc_rqst *
> > > +nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
> > > +{
> > > + struct svc_rqst *rqstp;
> > > + struct net *net = rpc_net_ns(rpc_clnt);
> > > + struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > + int status;
> > > +
> > > + if (!nn->nfsd_serv) {
> > > + dprintk("%s: localio denied. Server not running\n", __func__);
> > > + return ERR_PTR(-ENXIO);
> > > + }
> > > +
> >
> > Note that the above check is racy. The nfsd_serv can go away at any
> > time since you're not holding the (global) nfsd_mutex (I assume?).
>
> Yes, worst case we should fallback to going over the network.

Actual worst case is we could crash... ;)

> > > + rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
> > > + if (!rqstp)
> > > + return ERR_PTR(-ENOMEM);
> > > +
> > > + rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
> > > + if (!rqstp->rq_xprt) {
> > > + status = -ENOMEM;
> > > + goto out_err;
> > > + }
> > > +
> > > + rqstp->rq_xprt->xpt_net = net;
> > > + __set_bit(RQ_SECURE, &rqstp->rq_flags);
> > > + rqstp->rq_proc = 1;
> > > + rqstp->rq_vers = 3;
> > > + rqstp->rq_prot = IPPROTO_TCP;
> > > + rqstp->rq_server = nn->nfsd_serv;
> > > +
> >
> > I suspect you need to carry a reference of some sort so that the
> > nfsd_serv doesn't go away out from under you while this is running,
> > since this is not operating in nfsd thread context.
> >
> > Typically, every nfsd thread holds a reference to the serv (in serv-
> > >sv_nrthreads), so that when you shut down all of the threads, it goes
> > away. The catch is that that refcount is currently under the protection
> > of the global nfsd_mutex and I doubt you want to take that in this
> > codepath.
>
> OK, I can look closer at the implications.

SO I looked, and I'm saddened to see Neil's 6.8 commit 1e3577a4521e
("SUNRPC: discard sv_refcnt, and svc_get/svc_put").

[the lack of useful refcounting with the current code kind of blew me
away.. but nice to see it existed not too long ago.]

Rather than immediately invest the effort to revert commit
1e3577a4521e for my apparent needs... I'll send out v2 to allow for
further review and discussion.

But it really does feel like I _need_ svc_{get,put} and nfsd_{get,put}

Mike

2024-06-12 03:17:19

by NeilBrown

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 10/29] nfs/nfsd: add "local io" support

On Wed, 12 Jun 2024, Mike Snitzer wrote:
> On Mon, Jun 10, 2024 at 12:42:54PM -0400, Mike Snitzer wrote:
> > On Mon, Jun 10, 2024 at 08:43:34AM -0400, Jeff Layton wrote:
> > > On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> > > > From: Weston Andros Adamson <[email protected]>
> > > >
> > > > Add client support for bypassing NFS for localhost reads, writes, and commits.
> > > >
> > > > This is only useful when the client and the server are running on the same
> > > > host and in the same container.
> > > >
> > > > This has dynamic binding with the nfsd module. Local i/o will only work if
> > > > nfsd is already loaded.
> > > >
> > > > [snitm: rebase accounted for commit d8b26071e65e8 ("NFSD: simplify struct nfsfh")
> > > >  and commit 7c98f7cb8fda ("remove call_{read,write}_iter() functions")]
> > > >
> > > > Signed-off-by: Weston Andros Adamson <[email protected]>
> > > > Signed-off-by: Jeff Layton <[email protected]>
> > > > Signed-off-by: Peng Tao <[email protected]>
> > > > Signed-off-by: Lance Shelton <[email protected]>
> > > > Signed-off-by: Trond Myklebust <[email protected]>
> > > > Signed-off-by: Mike Snitzer <[email protected]>
> ...
> > > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > > new file mode 100644
> > > > index 000000000000..ff68454a4017
> > > > --- /dev/null
> > > > +++ b/fs/nfsd/localio.c
> > > > @@ -0,0 +1,179 @@
> > > > +/*
> > > > + * NFS server support for local clients to bypass network stack
> > > > + *
> > > > + * Copyright (C) 2014 Weston Andros Adamson <[email protected]>
> > > > + */
> > > > +
> > > > +#include <linux/exportfs.h>
> > > > +#include <linux/sunrpc/svcauth_gss.h>
> > > > +#include <linux/sunrpc/clnt.h>
> > > > +#include <linux/nfs.h>
> > > > +#include <linux/string.h>
> > > > +
> > > > +#include "nfsd.h"
> > > > +#include "vfs.h"
> > > > +#include "netns.h"
> > > > +#include "filecache.h"
> > > > +
> > > > +#define NFSDDBG_FACILITY NFSDDBG_FH
> > > > +
> > > > +static void
> > > > +nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> > > > +{
> > > > + if (rqstp->rq_client)
> > > > + auth_domain_put(rqstp->rq_client);
> > > > + if (rqstp->rq_cred.cr_group_info)
> > > > + put_group_info(rqstp->rq_cred.cr_group_info);
> > > > + kfree(rqstp->rq_cred.cr_principal);
> > > > + kfree(rqstp->rq_xprt);
> > > > + kfree(rqstp);
> > > > +}
> > > > +
> > > > +static struct svc_rqst *
> > > > +nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
> > > > +{
> > > > + struct svc_rqst *rqstp;
> > > > + struct net *net = rpc_net_ns(rpc_clnt);
> > > > + struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > > + int status;
> > > > +
> > > > + if (!nn->nfsd_serv) {
> > > > + dprintk("%s: localio denied. Server not running\n", __func__);
> > > > + return ERR_PTR(-ENXIO);
> > > > + }
> > > > +
> > >
> > > Note that the above check is racy. The nfsd_serv can go away at any
> > > time since you're not holding the (global) nfsd_mutex (I assume?).
> >
> > Yes, worst case we should fallback to going over the network.
>
> Actual worst case is we could crash... ;)
>
> > > > + rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
> > > > + if (!rqstp)
> > > > + return ERR_PTR(-ENOMEM);
> > > > +
> > > > + rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
> > > > + if (!rqstp->rq_xprt) {
> > > > + status = -ENOMEM;
> > > > + goto out_err;
> > > > + }
> > > > +
> > > > + rqstp->rq_xprt->xpt_net = net;
> > > > + __set_bit(RQ_SECURE, &rqstp->rq_flags);
> > > > + rqstp->rq_proc = 1;
> > > > + rqstp->rq_vers = 3;
> > > > + rqstp->rq_prot = IPPROTO_TCP;
> > > > + rqstp->rq_server = nn->nfsd_serv;
> > > > +
> > >
> > > I suspect you need to carry a reference of some sort so that the
> > > nfsd_serv doesn't go away out from under you while this is running,
> > > since this is not operating in nfsd thread context.
> > >
> > > Typically, every nfsd thread holds a reference to the serv (in serv-
> > > >sv_nrthreads), so that when you shut down all of the threads, it goes
> > > away. The catch is that that refcount is currently under the protection
> > > of the global nfsd_mutex and I doubt you want to take that in this
> > > codepath.
> >
> > OK, I can look closer at the implications.
>
> SO I looked, and I'm saddened to see Neil's 6.8 commit 1e3577a4521e
> ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
>
> [the lack of useful refcounting with the current code kind of blew me
> away.. but nice to see it existed not too long ago.]
>
> Rather than immediately invest the effort to revert commit
> 1e3577a4521e for my apparent needs... I'll send out v2 to allow for
> further review and discussion.
>
> But it really does feel like I _need_ svc_{get,put} and nfsd_{get,put}

You are taking a reference, and at the right time. But it is to the
wrong thing.
You call symbol_request(nfsd_open_local_fh) and so get a reference to
the nfsd module. But you really want a reference to the nfsd service.

I would suggest that you use symbol_request() to get a function which
you then call and immediately symbol_put().... unless you need to use it
to discard the reference to the service later.
The function would take nfsd_mutex, check there is an nfsd_serv, sets a
flag or whatever to indicate the serv is being used for local_io, and
maybe returns the nfsd_serv. As long as that flag is set the serv
cannot be destroy.

Do you need there to be available threads for LOCAL_IO to work? If so
the flag would cause setting the num threads to zero to fail.
If not .... that is weird. It would mean that setting the number of
threads to zero would not destroy the service and I don't think we want
to do that.

So I think that when LOCAL_IO is in use, setting number of threads to
zero must return EBUSY or similar, even if you don't need the threads.

NeilBrown

2024-06-12 03:42:51

by Mike Snitzer

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 10/29] nfs/nfsd: add "local io" support

On Wed, Jun 12, 2024 at 01:17:05PM +1000, NeilBrown wrote:
> On Wed, 12 Jun 2024, Mike Snitzer wrote:
> >
> > SO I looked, and I'm saddened to see Neil's 6.8 commit 1e3577a4521e
> > ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
> >
> > [the lack of useful refcounting with the current code kind of blew me
> > away.. but nice to see it existed not too long ago.]
> >
> > Rather than immediately invest the effort to revert commit
> > 1e3577a4521e for my apparent needs... I'll send out v2 to allow for
> > further review and discussion.
> >
> > But it really does feel like I _need_ svc_{get,put} and nfsd_{get,put}
>
> You are taking a reference, and at the right time. But it is to the
> wrong thing.

Well, that reference is to ensure nfsd (and nfsd_open_local_fh) is
available for the duration of a local client connected to it.

Really wasn't trying to keep nn->nfsd_serv around with this ;)

> You call symbol_request(nfsd_open_local_fh) and so get a reference to
> the nfsd module. But you really want a reference to the nfsd service.
>
> I would suggest that you use symbol_request() to get a function which
> you then call and immediately symbol_put().... unless you need to use it
> to discard the reference to the service later.

Getting the nfsd_open_local_fh symbol once when client handshakes with
server is meant to avoid needing to do so for every IO the client
issues to the local server.

> The function would take nfsd_mutex, check there is an nfsd_serv, sets a
> flag or whatever to indicate the serv is being used for local_io, and
> maybe returns the nfsd_serv. As long as that flag is set the serv
> cannot be destroy.
>
> Do you need there to be available threads for LOCAL_IO to work? If so
> the flag would cause setting the num threads to zero to fail.
> If not .... that is weird. It would mean that setting the number of
> threads to zero would not destroy the service and I don't think we want
> to do that.
>
> So I think that when LOCAL_IO is in use, setting number of threads to
> zero must return EBUSY or similar, even if you don't need the threads.

Yes, but I really dislike needing to play games with a tangential
characteristic of nfsd_serv (that threads are what hold reference),
rather than have the ability to keep the nfsd_serv around in a cleaner
way.

This localio code doesn't run in nfsd context so it isn't using nfsd's
threads. Forcing threads to be held in reserve because localio doesn't
want nfsd_serv to go away isn't ideal.

Does it maybe make sense to introduce a more narrow svc_get/svc_put
for this auxillary usecase?

Thanks,
Mike

2024-06-12 04:09:43

by NeilBrown

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 10/29] nfs/nfsd: add "local io" support

On Wed, 12 Jun 2024, Mike Snitzer wrote:
> On Wed, Jun 12, 2024 at 01:17:05PM +1000, NeilBrown wrote:
> > On Wed, 12 Jun 2024, Mike Snitzer wrote:
> > >
> > > SO I looked, and I'm saddened to see Neil's 6.8 commit 1e3577a4521e
> > > ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
> > >
> > > [the lack of useful refcounting with the current code kind of blew me
> > > away.. but nice to see it existed not too long ago.]
> > >
> > > Rather than immediately invest the effort to revert commit
> > > 1e3577a4521e for my apparent needs... I'll send out v2 to allow for
> > > further review and discussion.
> > >
> > > But it really does feel like I _need_ svc_{get,put} and nfsd_{get,put}
> >
> > You are taking a reference, and at the right time. But it is to the
> > wrong thing.
>
> Well, that reference is to ensure nfsd (and nfsd_open_local_fh) is
> available for the duration of a local client connected to it.
>
> Really wasn't trying to keep nn->nfsd_serv around with this ;)
>
> > You call symbol_request(nfsd_open_local_fh) and so get a reference to
> > the nfsd module. But you really want a reference to the nfsd service.
> >
> > I would suggest that you use symbol_request() to get a function which
> > you then call and immediately symbol_put().... unless you need to use it
> > to discard the reference to the service later.
>
> Getting the nfsd_open_local_fh symbol once when client handshakes with
> server is meant to avoid needing to do so for every IO the client
> issues to the local server.
>
> > The function would take nfsd_mutex, check there is an nfsd_serv, sets a
> > flag or whatever to indicate the serv is being used for local_io, and
> > maybe returns the nfsd_serv. As long as that flag is set the serv
> > cannot be destroy.
> >
> > Do you need there to be available threads for LOCAL_IO to work? If so
> > the flag would cause setting the num threads to zero to fail.
> > If not .... that is weird. It would mean that setting the number of
> > threads to zero would not destroy the service and I don't think we want
> > to do that.
> >
> > So I think that when LOCAL_IO is in use, setting number of threads to
> > zero must return EBUSY or similar, even if you don't need the threads.
>
> Yes, but I really dislike needing to play games with a tangential
> characteristic of nfsd_serv (that threads are what hold reference),
> rather than have the ability to keep the nfsd_serv around in a cleaner
> way.
>
> This localio code doesn't run in nfsd context so it isn't using nfsd's
> threads. Forcing threads to be held in reserve because localio doesn't
> want nfsd_serv to go away isn't ideal.

I started reading the rest of the patches and it seems that localio is
only used for READ, WRTE, COMMIT. Is that correct? Is there
documentation so that I don't have to ask?
Obviously there are lots of other NFS requests so you wouldn't be able
to use localio without nfsd threads running....

But a normal remote client doesn't pin the nfsd threads or the
nfsd_serv. If the threads go away, the client blocks until the service
comes back. Would that be appropriate semantics for localio?? i.e. on
each nfsd_open_local_fh() call you mutex_trylock and hold that long
enough to get the 'struct file *'. If it fails because there is no
serv, you simply fall-back to the same path you use for other requests.

Could that work?

>
> Does it maybe make sense to introduce a more narrow svc_get/svc_put
> for this auxillary usecase?

I don't think so. nfsd is a self-contained transactional service. It
doesn't promise to persist beyond each transaction.
Current transactions return status and/or data. Adding a new transaction
that returns a 'struct file *' fits that model reasonable well. Taking
an external reference to the nfs service is quite a big conceptual
change.

Thanks,
NeilBrown


>
> Thanks,
> Mike
>


2024-06-12 04:48:35

by Mike Snitzer

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 10/29] nfs/nfsd: add "local io" support

On Wed, Jun 12, 2024 at 02:09:21PM +1000, NeilBrown wrote:
> On Wed, 12 Jun 2024, Mike Snitzer wrote:
> > On Wed, Jun 12, 2024 at 01:17:05PM +1000, NeilBrown wrote:
> > > On Wed, 12 Jun 2024, Mike Snitzer wrote:
> > > >
> > > > SO I looked, and I'm saddened to see Neil's 6.8 commit 1e3577a4521e
> > > > ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
> > > >
> > > > [the lack of useful refcounting with the current code kind of blew me
> > > > away.. but nice to see it existed not too long ago.]
> > > >
> > > > Rather than immediately invest the effort to revert commit
> > > > 1e3577a4521e for my apparent needs... I'll send out v2 to allow for
> > > > further review and discussion.
> > > >
> > > > But it really does feel like I _need_ svc_{get,put} and nfsd_{get,put}
> > >
> > > You are taking a reference, and at the right time. But it is to the
> > > wrong thing.
> >
> > Well, that reference is to ensure nfsd (and nfsd_open_local_fh) is
> > available for the duration of a local client connected to it.
> >
> > Really wasn't trying to keep nn->nfsd_serv around with this ;)
> >
> > > You call symbol_request(nfsd_open_local_fh) and so get a reference to
> > > the nfsd module. But you really want a reference to the nfsd service.
> > >
> > > I would suggest that you use symbol_request() to get a function which
> > > you then call and immediately symbol_put().... unless you need to use it
> > > to discard the reference to the service later.
> >
> > Getting the nfsd_open_local_fh symbol once when client handshakes with
> > server is meant to avoid needing to do so for every IO the client
> > issues to the local server.
> >
> > > The function would take nfsd_mutex, check there is an nfsd_serv, sets a
> > > flag or whatever to indicate the serv is being used for local_io, and
> > > maybe returns the nfsd_serv. As long as that flag is set the serv
> > > cannot be destroy.
> > >
> > > Do you need there to be available threads for LOCAL_IO to work? If so
> > > the flag would cause setting the num threads to zero to fail.
> > > If not .... that is weird. It would mean that setting the number of
> > > threads to zero would not destroy the service and I don't think we want
> > > to do that.
> > >
> > > So I think that when LOCAL_IO is in use, setting number of threads to
> > > zero must return EBUSY or similar, even if you don't need the threads.
> >
> > Yes, but I really dislike needing to play games with a tangential
> > characteristic of nfsd_serv (that threads are what hold reference),
> > rather than have the ability to keep the nfsd_serv around in a cleaner
> > way.
> >
> > This localio code doesn't run in nfsd context so it isn't using nfsd's
> > threads. Forcing threads to be held in reserve because localio doesn't
> > want nfsd_serv to go away isn't ideal.
>
> I started reading the rest of the patches and it seems that localio is
> only used for READ, WRTE, COMMIT. Is that correct? Is there
> documentation so that I don't have to ask?

The header for v2's patch 7 (nfs/nfsd: add "localio" support) starts with:
Add client support for bypassing NFS for localhost reads, writes, and
commits.

But I should've made it clearer by saying the same in the 0th header.

> Obviously there are lots of other NFS requests so you wouldn't be able
> to use localio without nfsd threads running....

That's very true.

> But a normal remote client doesn't pin the nfsd threads or the
> nfsd_serv. If the threads go away, the client blocks until the service
> comes back. Would that be appropriate semantics for localio?? i.e. on
> each nfsd_open_local_fh() call you mutex_trylock and hold that long
> enough to get the 'struct file *'. If it fails because there is no
> serv, you simply fall-back to the same path you use for other requests.
>
> Could that work?

I can try it, but feels like it'd elevate nfsd_mutex to "contended",
as such it feels heavy.

> > Does it maybe make sense to introduce a more narrow svc_get/svc_put
> > for this auxillary usecase?
>
> I don't think so. nfsd is a self-contained transactional service. It
> doesn't promise to persist beyond each transaction.
> Current transactions return status and/or data. Adding a new transaction
> that returns a 'struct file *' fits that model reasonable well.

Sure. But to be clear, I am adding global state to nfs_common that
tracks nfsd_uuids. Those change every time a new nfsd_net is created
for a given server (client will then lookup the uuid to see if local).

But even if we went to the extreme where nfsd instances are bouncing
like crazy, the 'nfsd_uuids' list in nfs_common should work fine.

Just not seeing what is gained by nfsd being so ephemeral. Maybe your
point is, it should work in that model too?.. I think it would, just
less efficiently due to make-work to re-get resources it needs.

> Taking an external reference to the nfs service is quite a big
> conceptual change.

Getting the nfsd_open_local_fh() symbol in a coarse-grained manner
isn't about anything other than efficiency. Ensures localio client
calls to nfsd_open_local_fh will work for as long as it exists on that
local server -- nfs.ko's indirect reference to nfsd.ko (via
nfs_localio.ko getting symbol for nfsd_open_local_fh) is dropped when
client is destroyed.

Mike

2024-06-12 06:31:21

by NeilBrown

[permalink] [raw]
Subject: Re: [for-6.11 PATCH 10/29] nfs/nfsd: add "local io" support

On Wed, 12 Jun 2024, Mike Snitzer wrote:
> On Wed, Jun 12, 2024 at 02:09:21PM +1000, NeilBrown wrote:
> > On Wed, 12 Jun 2024, Mike Snitzer wrote:
> > > On Wed, Jun 12, 2024 at 01:17:05PM +1000, NeilBrown wrote:
> > > > On Wed, 12 Jun 2024, Mike Snitzer wrote:
> > > > >
> > > > > SO I looked, and I'm saddened to see Neil's 6.8 commit 1e3577a4521e
> > > > > ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
> > > > >
> > > > > [the lack of useful refcounting with the current code kind of blew me
> > > > > away.. but nice to see it existed not too long ago.]
> > > > >
> > > > > Rather than immediately invest the effort to revert commit
> > > > > 1e3577a4521e for my apparent needs... I'll send out v2 to allow for
> > > > > further review and discussion.
> > > > >
> > > > > But it really does feel like I _need_ svc_{get,put} and nfsd_{get,put}
> > > >
> > > > You are taking a reference, and at the right time. But it is to the
> > > > wrong thing.
> > >
> > > Well, that reference is to ensure nfsd (and nfsd_open_local_fh) is
> > > available for the duration of a local client connected to it.
> > >
> > > Really wasn't trying to keep nn->nfsd_serv around with this ;)
> > >
> > > > You call symbol_request(nfsd_open_local_fh) and so get a reference to
> > > > the nfsd module. But you really want a reference to the nfsd service.
> > > >
> > > > I would suggest that you use symbol_request() to get a function which
> > > > you then call and immediately symbol_put().... unless you need to use it
> > > > to discard the reference to the service later.
> > >
> > > Getting the nfsd_open_local_fh symbol once when client handshakes with
> > > server is meant to avoid needing to do so for every IO the client
> > > issues to the local server.
> > >
> > > > The function would take nfsd_mutex, check there is an nfsd_serv, sets a
> > > > flag or whatever to indicate the serv is being used for local_io, and
> > > > maybe returns the nfsd_serv. As long as that flag is set the serv
> > > > cannot be destroy.
> > > >
> > > > Do you need there to be available threads for LOCAL_IO to work? If so
> > > > the flag would cause setting the num threads to zero to fail.
> > > > If not .... that is weird. It would mean that setting the number of
> > > > threads to zero would not destroy the service and I don't think we want
> > > > to do that.
> > > >
> > > > So I think that when LOCAL_IO is in use, setting number of threads to
> > > > zero must return EBUSY or similar, even if you don't need the threads.
> > >
> > > Yes, but I really dislike needing to play games with a tangential
> > > characteristic of nfsd_serv (that threads are what hold reference),
> > > rather than have the ability to keep the nfsd_serv around in a cleaner
> > > way.
> > >
> > > This localio code doesn't run in nfsd context so it isn't using nfsd's
> > > threads. Forcing threads to be held in reserve because localio doesn't
> > > want nfsd_serv to go away isn't ideal.
> >
> > I started reading the rest of the patches and it seems that localio is
> > only used for READ, WRTE, COMMIT. Is that correct? Is there
> > documentation so that I don't have to ask?
>
> The header for v2's patch 7 (nfs/nfsd: add "localio" support) starts with:
> Add client support for bypassing NFS for localhost reads, writes, and
> commits.
>
> But I should've made it clearer by saying the same in the 0th header.

Or maybe even a something to Documentation/ which describes your new
side-protocol including how the UUIDs are used and what happens when a
match is found..

>
> > Obviously there are lots of other NFS requests so you wouldn't be able
> > to use localio without nfsd threads running....
>
> That's very true.
>
> > But a normal remote client doesn't pin the nfsd threads or the
> > nfsd_serv. If the threads go away, the client blocks until the service
> > comes back. Would that be appropriate semantics for localio?? i.e. on
> > each nfsd_open_local_fh() call you mutex_trylock and hold that long
> > enough to get the 'struct file *'. If it fails because there is no
> > serv, you simply fall-back to the same path you use for other requests.
> >
> > Could that work?
>
> I can try it, but feels like it'd elevate nfsd_mutex to "contended",
> as such it feels heavy.
>
> > > Does it maybe make sense to introduce a more narrow svc_get/svc_put
> > > for this auxillary usecase?
> >
> > I don't think so. nfsd is a self-contained transactional service. It
> > doesn't promise to persist beyond each transaction.
> > Current transactions return status and/or data. Adding a new transaction
> > that returns a 'struct file *' fits that model reasonable well.
>
> Sure. But to be clear, I am adding global state to nfs_common that
> tracks nfsd_uuids. Those change every time a new nfsd_net is created
> for a given server (client will then lookup the uuid to see if local).
>

I missed the full importance of this on my read-through. It would
certainly make sense for the NFS client to get a counted-reference to
something managed by nfs_common and created/destroyed by nfsd.
It could then easily check if the handle is still valid and repeat the
lookup only if the handle has been marked as invalid.

We still need a way for the filehande-to-struct-file lookup to proceed
without taking nfsd_mutex. Possibly we could use srcu and put a
synchronise_srcu() call at the top of nfsd_destroy_serv()...

>
> But even if we went to the extreme where nfsd instances are bouncing
> like crazy, the 'nfsd_uuids' list in nfs_common should work fine.
>
> Just not seeing what is gained by nfsd being so ephemeral. Maybe your
> point is, it should work in that model too?.. I think it would, just
> less efficiently due to make-work to re-get resources it needs.

Exactly. localio shouldn't prevent the nfsd server from being stopped
and restarted, but it needn't work efficiently when that is happening.

Thanks,
NeilBrown