From: Trond Myklebust <[email protected]>
We'd like to avoid GFP_NOWAIT whenever possible, because it has no fall-
back reclaim strategy for dealing with a failure of the initial
allocation.
At the same time, memory pools appear to be suboptimal as an allocation
strategy when used with anything other than GFP_NOWAIT, since they cause
threads to hang upon failure.
This patch series therefore aims to demote the mempools to being a
strategy of last resort, with the primary allocation strategy being to
use the underlying slabs.
While we're at it, we want to ensure that rpciod, xprtiod and nfsiod all
use the same allocation strategy, so that the latter two don't thwart
our ability to complete writeback RPC calls by getting blocked in the mm
layer.
Trond Myklebust (9):
SUNRPC: Fix memory allocation in rpc_malloc()
SUNRPC: Fix memory allocation in rpc_alloc_task()
SUNRPC: Fix unx_lookup_cred() allocation
SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent
NFS: nfsiod should not block forever in mempool_alloc()
NFS: Avoid writeback threads getting stuck in mempool_alloc()
NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod
pNFS/flexfiles: Ensure pNFS allocation modes are consistent with
nfsiod
pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod
fs/nfs/filelayout/filelayout.c | 2 +-
fs/nfs/flexfilelayout/flexfilelayout.c | 50 +++++++++++---------------
fs/nfs/internal.h | 7 ++++
fs/nfs/nfs42proc.c | 2 +-
fs/nfs/pagelist.c | 10 +++---
fs/nfs/pnfs.c | 39 +++++++++-----------
fs/nfs/pnfs_nfs.c | 8 +++--
fs/nfs/write.c | 34 +++++++++---------
include/linux/nfs_fs.h | 2 +-
include/linux/sunrpc/sched.h | 1 +
net/sunrpc/auth_unix.c | 18 +++++-----
net/sunrpc/backchannel_rqst.c | 8 ++---
net/sunrpc/rpcb_clnt.c | 4 +--
net/sunrpc/sched.c | 31 ++++++++++------
net/sunrpc/socklib.c | 3 +-
net/sunrpc/xprt.c | 5 +--
net/sunrpc/xprtsock.c | 11 +++---
17 files changed, 123 insertions(+), 112 deletions(-)
--
2.35.1
From: Trond Myklebust <[email protected]>
When in a low memory situation, we do want rpciod to kick off direct
reclaim in the case where that helps, however we don't want it looping
forever in mempool_alloc().
So first try allocating from the slab using GFP_KERNEL | __GFP_NORETRY,
and then fall back to a GFP_NOWAIT allocation from the mempool.
Ditto for rpc_alloc_task()
Signed-off-by: Trond Myklebust <[email protected]>
---
include/linux/sunrpc/sched.h | 1 +
net/sunrpc/sched.c | 21 ++++++++++++++-------
2 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 56710f8056d3..1d7a3e51b795 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -262,6 +262,7 @@ void rpc_destroy_mempool(void);
extern struct workqueue_struct *rpciod_workqueue;
extern struct workqueue_struct *xprtiod_workqueue;
void rpc_prepare_task(struct rpc_task *task);
+gfp_t rpc_task_gfp_mask(void);
static inline int rpc_wait_for_completion_task(struct rpc_task *task)
{
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 7c8f87ebdbc0..d59a033820be 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -57,6 +57,13 @@ struct workqueue_struct *rpciod_workqueue __read_mostly;
struct workqueue_struct *xprtiod_workqueue __read_mostly;
EXPORT_SYMBOL_GPL(xprtiod_workqueue);
+gfp_t rpc_task_gfp_mask(void)
+{
+ if (current->flags & PF_WQ_WORKER)
+ return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
+ return GFP_KERNEL;
+}
+
unsigned long
rpc_task_timeout(const struct rpc_task *task)
{
@@ -1030,15 +1037,15 @@ int rpc_malloc(struct rpc_task *task)
struct rpc_rqst *rqst = task->tk_rqstp;
size_t size = rqst->rq_callsize + rqst->rq_rcvsize;
struct rpc_buffer *buf;
- gfp_t gfp = GFP_KERNEL;
-
- if (RPC_IS_ASYNC(task))
- gfp = GFP_NOWAIT | __GFP_NOWARN;
+ gfp_t gfp = rpc_task_gfp_mask();
size += sizeof(struct rpc_buffer);
- if (size <= RPC_BUFFER_MAXSIZE)
- buf = mempool_alloc(rpc_buffer_mempool, gfp);
- else
+ if (size <= RPC_BUFFER_MAXSIZE) {
+ buf = kmem_cache_alloc(rpc_buffer_slabp, gfp);
+ /* Reach for the mempool if dynamic allocation fails */
+ if (!buf && RPC_IS_ASYNC(task))
+ buf = mempool_alloc(rpc_buffer_mempool, GFP_NOWAIT);
+ } else
buf = kmalloc(size, gfp);
if (!buf)
--
2.35.1
From: Trond Myklebust <[email protected]>
As for rpc_malloc(), we first try allocating from the slab, then fall
back to a non-waiting allocation from the mempool.
Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/sched.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index d59a033820be..b258b87a3ec2 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -1108,10 +1108,14 @@ static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta
rpc_init_task_statistics(task);
}
-static struct rpc_task *
-rpc_alloc_task(void)
+static struct rpc_task *rpc_alloc_task(void)
{
- return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_KERNEL);
+ struct rpc_task *task;
+
+ task = kmem_cache_alloc(rpc_task_slabp, rpc_task_gfp_mask());
+ if (task)
+ return task;
+ return mempool_alloc(rpc_task_mempool, GFP_NOWAIT);
}
/*
--
2.35.1
On Tue, 22 Mar 2022, [email protected] wrote:
> From: Trond Myklebust <[email protected]>
>
> We'd like to avoid GFP_NOWAIT whenever possible, because it has no fall-
> back reclaim strategy for dealing with a failure of the initial
> allocation.
I'm not sure I entirely agree with that. GFP_NOWAIT will ensure kswapd
runs on failure, so waiting briefly and retrying (which sunrpc does on
-ENOMEM, at least in call_refreshresult) is a valid fallback.
However, I do like the new rpc_task_gfp_mask() and the fact that you
have used it quite widely.
So: looks good to me. I haven't carefully reviewed each patch enough to
say Reviewed-by, but I didn't see any obvious problems.
Thanks,
NeilBrown