2024-04-02 19:38:47

by Chuck Lever

Subject: [PATCH RFC] SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP

From: Chuck Lever <[email protected]>

Jan Schunk reports that his small NFS servers suffer from memory
exhaustion after just a few days. A bisect shows that commit
e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single
sock_sendmsg() call") is the first bad commit.

That commit assumed that sock_sendmsg() releases all the pages in
the underlying bio_vec array, but the reality is that it doesn't.
svc_xprt_release() releases the rqst's response pages, but the
record marker page fragment isn't one of those, so it was never
released.

This is a narrow fix that can be applied to stable kernels. A
more extensive fix is in the works.

Reported-by: Jan Schunk <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218671
Fixes: e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call")
Cc: Alexander Duyck <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/svcsock.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 545017a3daa4..be6c6ee85c8f 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -130,6 +130,8 @@ static void svc_reclassify_socket(struct socket *sock)
  */
 static void svc_tcp_release_ctxt(struct svc_xprt *xprt, void *ctxt)
 {
+	if (ctxt)
+		page_frag_free(ctxt);
 }

/**
@@ -1237,6 +1239,7 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
 		return -ENOMEM;
 	memcpy(buf, &marker, sizeof(marker));
 	bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker));
+	rqstp->rq_xprt_ctxt = buf;
 
 	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1,
 				ARRAY_SIZE(rqstp->rq_bvec) - 1, &rqstp->rq_res);
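
For readers following the ownership argument, here is a minimal userspace sketch of the pattern the patch relies on. It is not kernel code: every name below (model_rqst, model_sendmsg, model_release_ctxt, model_run, live_frags) is hypothetical, and malloc()/free() merely stand in for the page fragment allocator. It assumes only what the description above states: the send path allocates a small record marker buffer per reply, sending does not free it, and a release hook runs once per request.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
struct model_rqst {
	void *xprt_ctxt;	/* plays the role of rqstp->rq_xprt_ctxt */
};

static int live_frags;		/* marker buffers allocated but not yet freed */

/* Models the send path: allocate a 4-byte record marker for each reply. */
static int model_sendmsg(struct model_rqst *rqst, uint32_t marker, int remember)
{
	void *buf = malloc(sizeof(marker));

	if (!buf)
		return -1;
	live_frags++;
	memcpy(buf, &marker, sizeof(marker));
	/* The "send" itself never frees buf; that is the crux of the leak. */
	if (remember)
		rqst->xprt_ctxt = buf;	/* the fix: keep a reference for later */
	return 0;
}

/* Models the release hook: free whatever the send path remembered. */
static void model_release_ctxt(struct model_rqst *rqst)
{
	if (rqst->xprt_ctxt) {
		free(rqst->xprt_ctxt);
		live_frags--;
		rqst->xprt_ctxt = NULL;
	}
}

/* Runs n send/release cycles and returns how many buffers were leaked. */
static int model_run(int n, int remember)
{
	struct model_rqst rqst = { .xprt_ctxt = NULL };
	int i;

	live_frags = 0;
	for (i = 0; i < n; i++) {
		if (model_sendmsg(&rqst, 0x80000000u | 64, remember))
			break;
		model_release_ctxt(&rqst);
	}
	return live_frags;
}
```

In this model, model_run(n, 0) corresponds to the behavior before the patch (one small buffer lost per reply, since the release hook has nothing to free), and model_run(n, 1) corresponds to the behavior after it (the outstanding count returns to zero). The model deliberately leaks memory in the first case, so keep n small when experimenting.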




2024-04-03 05:30:19

by Cedric Blancher

Subject: Re: [PATCH RFC] SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP

On Tue, 2 Apr 2024 at 21:38, Chuck Lever <[email protected]> wrote:
>
> From: Chuck Lever <[email protected]>
>
> Jan Schunk reports that his small NFS servers suffer from memory
> exhaustion after just a few days. A bisect shows that commit
> e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single
> sock_sendmsg() call") is the first bad commit.
>
> That commit assumed that sock_sendmsg() releases all the pages in
> the underlying bio_vec array, but the reality is that it doesn't.
> svc_xprt_release() releases the rqst's response pages, but the
> record marker page fragment isn't one of those, so it was never
> released.
>
> This is a narrow fix that can be applied to stable kernels. A
> more extensive fix is in the works.
>
> Reported-by: Jan Schunk <[email protected]>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218671
> Fixes: e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call")

Is this bug present in 6.6 LTS?

Ced
--
Cedric Blancher <[email protected]>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur

2024-04-03 13:28:27

by Chuck Lever

Subject: Re: [PATCH RFC] SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP

On Wed, Apr 03, 2024 at 07:29:00AM +0200, Cedric Blancher wrote:
> On Tue, 2 Apr 2024 at 21:38, Chuck Lever <[email protected]> wrote:
> >
> > From: Chuck Lever <[email protected]>
> >
> > Jan Schunk reports that his small NFS servers suffer from memory
> > exhaustion after just a few days. A bisect shows that commit
> > e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single
> > sock_sendmsg() call") is the first bad commit.
> >
> > That commit assumed that sock_sendmsg() releases all the pages in
> > the underlying bio_vec array, but the reality is that it doesn't.
> > svc_xprt_release() releases the rqst's response pages, but the
> > record marker page fragment isn't one of those, so it was never
> > released.
> >
> > This is a narrow fix that can be applied to stable kernels. A
> > more extensive fix is in the works.
> >
> > Reported-by: Jan Schunk <[email protected]>
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218671
> > Fixes: e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call")
>
> Is this bug present in 6.6 LTS?

It was introduced in v6.6, so yes.

~/linux $ git describe --contains e18e157bb5c8
v6.6-rc1~108^2~27
~/linux $

Once this fix is merged upstream, the LTS maintainers will notice
the Fixes: tag and backport it to v6.6 automatically.

--
Chuck Lever