2018-10-30 13:58:50

by zhong jiang

Subject: Re: [Qestion] Lots of memory leaks when mounting and unmounting nfs client to server continuously.

On 2018/10/30 21:06, Benjamin Coddington wrote:
> Hi zhong jiang,
>
> Try asking in linux-nfs.. but I'll also note that 3.10-stable may be missing a number of fixes to leaks in the NFS GSS code.
>
> I can see more than a few fixes to memory leaks with:
> git log --grep=leak --oneline net/sunrpc/auth_gss/
>
Thanks for your reply. I have tested some of the upstream fixes you mentioned, but they do not solve the issue completely.
Hence I am asking the relevant experts whether they have run into this issue or can offer any suggestions.

Thanks,
zhong jiang
> Ben
>
> On 30 Oct 2018, at 8:45, zhong jiang wrote:
>
>> Hi, Herbert
>>
>> Recently I hit a memory leak when repeatedly mounting and unmounting NFS with krb5 security.
>> The issue occurs on linux-3.10-stable.
>>
>> I find that the slab-1024 and slab-512 caches take up most of the memory and cannot be freed.
>> As a result, the rpcsec_gss_krb5 module cannot be unloaded either.
>>
>> nfs-sve1:/home # cat /proc/modules | grep krb5
>> rpcsec_gss_krb5 31477 239730 - Live 0xffffffffa0334000
>> auth_rpcgss 59314 3 rpcsec_gss_krb5,nfsd, Live 0xffffffffa0123000
>> sunrpc 300546 25 rpcsec_gss_krb5,nfsd,auth_rpcgss,nfs_acl,lockd, Live 0xffffffffa013b000
>>
>> I enabled tracing for slab-1024 through /sys/kernel/slab/:t-0001024/trace and got the following:
>>
>> [123420.989831] Call Trace:
>> [123420.989834] [<ffffffff81642d2a>] dump_stack+0x19/0x1b
>> [123420.989837] [<ffffffff8163f25e>] alloc_debug_processing+0xc5/0x118
>> [123420.989839] [<ffffffff8163fd4d>] __slab_alloc+0x400/0x48f
>> [123420.989841] [<ffffffff812b1795>] ? __crypto_alloc_tfm+0x45/0x170
>> [123420.989845] [<ffffffff812b2307>] ? setkey+0x57/0x110
>> [123420.989847] [<ffffffff8118b5fd>] ? kzfree+0x2d/0x30
>> [123420.989850] [<ffffffff811c6e88>] __kmalloc+0x1c8/0x230
>> [123420.989852] [<ffffffff812b1795>] __crypto_alloc_tfm+0x45/0x170
>> [123420.989854] [<ffffffff812b2e45>] crypto_spawn_tfm+0x45/0x80
>> [123420.989857] [<ffffffff811c6eb3>] ? __kmalloc+0x1f3/0x230
>> [123420.989859] [<ffffffff812c15c7>] crypto_cbc_init_tfm+0x27/0x40
>> [123420.989864] [<ffffffff812b1851>] __crypto_alloc_tfm+0x101/0x170
>> [123420.989866] [<ffffffff812b1ffc>] crypto_alloc_base+0x4c/0xb0
>> [123420.989869] [<ffffffffa033411b>] context_v2_alloc_cipher.isra.2+0x2b/0xc0 [rpcsec_gss_krb5]
>> [123420.989871] [<ffffffffa0334da8>] gss_import_sec_context_kerberos+0xbf8/0xf00 [rpcsec_gss_krb5]
>> [123420.989875] [<ffffffffa0126d5d>] gss_import_sec_context+0x7d/0xb0 [auth_rpcgss]
>> [123420.989878] [<ffffffffa012b35e>] gss_proxy_save_rsc+0x137/0x1b0 [auth_rpcgss]
>> [123420.989884] [<ffffffffa012b51e>] svcauth_gss_proxy_init+0x147/0x1e4 [auth_rpcgss]
>> [123420.989886] [<ffffffff810c2ad6>] ? dequeue_entity+0x106/0x520
>> [123420.989890] [<ffffffffa0128e2a>] svcauth_gss_accept+0x3da/0xb70 [auth_rpcgss]
>> [123420.989892] [<ffffffff810b6c25>] ? check_preempt_curr+0x85/0xa0
>> [123420.989894] [<ffffffff810b6c59>] ? ttwu_do_wakeup+0x19/0xd0
>> [123420.989897] [<ffffffff810b6ded>] ? ttwu_do_activate.constprop.86+0x5d/0x70
>> [123420.989900] [<ffffffff810b9422>] ? try_to_wake_up+0x162/0x330
>> [123420.989908] [<ffffffffa014f490>] svc_authenticate+0xc0/0xe0 [sunrpc]
>> [123420.989914] [<ffffffffa014c04a>] svc_process_common+0x21a/0x6f0 [sunrpc]
>> [123420.989921] [<ffffffffa014c623>] svc_process+0x103/0x170 [sunrpc]
>> [123420.989928] [<ffffffffa01baaaf>] nfsd+0xdf/0x150 [nfsd]
>> [123420.989932] [<ffffffffa01ba9d0>] ? nfsd_destroy+0x80/0x80 [nfsd]
>> [123420.989934] [<ffffffff810a648f>] kthread+0xcf/0xe0
>> [123420.989936] [<ffffffff810a63c0>] ? kthread_create_on_node+0x140/0x140
>> [123420.989939] [<ffffffff81653318>] ret_from_fork+0x58/0x90
>> [123420.989943] [<ffffffff810a63c0>] ? kthread_create_on_node+0x140/0x140
>>
>> I am unfamiliar with the crypto code. I would appreciate any suggestions.
>>
>> Thanks,
>> zhong jiang
>
> .
>

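The third column of /proc/modules is the module use count, which is 239730 for rpcsec_gss_krb5 in the output quoted above. Below is a minimal sketch for reading that value while reproducing the problem. It assumes the usual column order (name, size, use count, ...), and the helper name module_refcount is made up for illustration; it is not part of any existing tool.

```shell
# Print the use count (third column) of one module.
# $1: module name
# $2: file in /proc/modules format (optional, defaults to /proc/modules)
# Returns non-zero if the module is not listed.
module_refcount() {
    awk -v mod="$1" '
        $1 == mod { print $3; found = 1 }
        END       { exit !found }
    ' "${2:-/proc/modules}"
}

# Example: sample the count between mount/umount cycles.
#   module_refcount rpcsec_gss_krb5
```

If the number keeps rising across mount and unmount cycles and never drops, something is still holding references to the module.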

2018-10-30 14:03:40

by Benjamin Coddington

Subject: Re: [Qestion] Lots of memory leaks when mounting and unmounting nfs client to server continuously.

On 30 Oct 2018, at 9:58, zhong jiang wrote:

> On 2018/10/30 21:06, Benjamin Coddington wrote:
>> Hi zhong jiang,
>>
>> Try asking in linux-nfs.. but I'll also note that 3.10-stable may be
>> missing a number of fixes to leaks in the NFS GSS code.
>>
>> I can see more than a few fixes to memory leaks with: git log
>> --grep=leak --oneline net/sunrpc/auth_gss/
>>
> Thanks for your reply. I have tested some of the upstream fixes you
> mentioned, but they do not solve the issue completely.

What have you tested? It is hard to help without specifics.

2018-10-30 14:29:46

by zhong jiang

Subject: Re: [Qestion] Lots of memory leaks when mounting and unmounting nfs client to server continuously.

On 2018/10/30 22:03, Benjamin Coddington wrote:
> On 30 Oct 2018, at 9:58, zhong jiang wrote:
>
>> On 2018/10/30 21:06, Benjamin Coddington wrote:
>>> Hi zhong jiang,
>>>
>>> Try asking in linux-nfs.. but I'll also note that 3.10-stable may be
>>> missing a number of fixes to leaks in the NFS GSS code.
>>>
>>> I can see more than a few fixes to memory leaks with: git log
>>> --grep=leak --oneline net/sunrpc/auth_gss/
>>>
>> Thanks for your reply. I have tested some of the upstream fixes you
>> mentioned, but they do not solve the issue completely.
> What have you tested? It is hard to help without specifics.
In the latest mainline, filtering the log of net/sunrpc/auth_gss/ by the keyword "leak"
gives the following commits (my status for each is in parentheses):

0070ed3 Fix 16-byte memory leak in gssp_accept_sec_context_upcall (tested, does not fix the issue)
78794d1 svcrpc: don't leak contexts on PROC_DESTROY (tested, does not fix the issue)
a1d1e9b svcrpc: fix memory leak in gssp_accept_sec_context_upcall (not yet tested)
e9776d0 SUNRPC: Fix a pipe_version reference leak (not yet tested)
cdead7c SUNRPC: Fix a potential memory leak in auth_gss (not yet tested)
980e5a4 nfsd: fix rsi_cache reference count leak (not yet tested)
07a2bf1 SUNRPC: Fix a memory leak in gss_create() (not yet tested)
3ab9bb7 SUNRPC: Fix a memory leak in the auth credcache code (already present)
54f9247 knfsd: fix resource leak resulting in module refcount leak for rpcsec_gss_krb5.ko (already present)
b797b5b [PATCH] knfsd: svcrpc: fix gss krb5i memory leak (already present)
d4a30e7 RPCSEC_GSS: fix leak in krb5 code caused by superfluous kmalloc (not yet tested)

I suspect that commit d4a30e7 ("RPCSEC_GSS: fix leak in krb5 code caused by superfluous kmalloc") will solve
the issue, although I am not sure. :-[ Next I will backport that patch to 3.10 and see what happens.
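Keeping track of which of these upstream commits a stable branch already carries can be scripted. The sketch below relies on the common stable convention that a backport's message names its upstream commit, either as "commit <sha> upstream." or as "[ Upstream commit <sha> ]". It only scans text passed on stdin, so a backport that omits that line will be reported as missing. The helper name check_backports is hypothetical.

```shell
# Read "git log" output on stdin and, for each abbreviated upstream
# commit ID given as an argument, report whether a backport that
# references it appears in the log.
check_backports() {
    log_text=$(cat)
    for id in "$@"; do
        if printf '%s\n' "$log_text" |
            grep -Eq -e "commit ${id}[0-9a-f]* upstream" \
                     -e "Upstream commit ${id}"; then
            echo "$id backported"
        else
            echo "$id missing"
        fi
    done
}

# Example, run inside a checkout of the stable branch:
#   git log -- net/sunrpc/auth_gss/ | check_backports 0070ed3 78794d1 a1d1e9b
```

A result of "missing" only means that no referencing message was found; the change may still be present under a different form, so it is worth confirming against the source.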

Thanks,
zhong jiang.

2018-11-01 14:18:28

by zhong jiang

Subject: Re: [Qestion] Lots of memory leaks when mounting and unmounting nfs client to server continuously.

On 2018/10/30 22:03, Benjamin Coddington wrote:
> On 30 Oct 2018, at 9:58, zhong jiang wrote:
>
>> On 2018/10/30 21:06, Benjamin Coddington wrote:
>>> Hi zhong jiang,
>>>
>>> Try asking in linux-nfs.. but I'll also note that 3.10-stable may be
>>> missing a number of fixes to leaks in the NFS GSS code.
>>>
>>> I can see more than a few fixes to memory leaks with: git log
>>> --grep=leak --oneline net/sunrpc/auth_gss/
>>>
>> Thanks for your reply. I have tested some of the upstream fixes you
>> mentioned, but they do not solve the issue completely.
> What have you tested? It is hard to help without specifics.
Hi, Benjamin

I have now tested all of the patches in the latest mainline that are listed by:

git log --grep=leak --oneline net/sunrpc/auth_gss/

Unfortunately, none of the patches works.

Could you give me some clues?

Thanks,
zhong jiang


2018-11-07 19:50:50

by David Wysochanski

Subject: Re: [Qestion] Lots of memory leaks when mounting and unmounting nfs client to server continuously.

On Tue, 2018-10-30 at 21:58 +0800, zhong jiang wrote:
> On 2018/10/30 21:06, Benjamin Coddington wrote:
> > Hi zhong jiang,
> >
> > Try asking in linux-nfs.. but I'll also note that 3.10-stable may
> > be missing a number of fixes to leaks in the NFS GSS code.
> >
> > I can see more than a few fixes to memory leaks with:
> > git log --grep=leak --oneline net/sunrpc/auth_gss/
> >
>
> Thanks for your reply. I have tested some of the upstream fixes you
> mentioned, but they do not solve the issue completely.
> Hence I am asking the relevant experts whether they have run into
> this issue or can offer any suggestions.
>
> Thanks,
> zhong jiang
> > Ben
> >
> > On 30 Oct 2018, at 8:45, zhong jiang wrote:
> >
> > > Hi, Herbert
> > >
> > > Recently I hit a memory leak when repeatedly mounting and
> > > unmounting NFS with krb5 security.
> > > The issue occurs on linux-3.10-stable.
> > >
> > > I find that the slab-1024 and slab-512 caches take up most of the
> > > memory and cannot be freed.
> > > As a result, the rpcsec_gss_krb5 module cannot be unloaded
> > > either.
> > >
> > >

Are you running the latest 3.10-stable?

This sounds very similar to something I encountered a while ago, which
turned out to be a sunrpc cache-related problem. The patch that fixed it
for me is in 3.10.106, though.

Can you check if this cache is growing indefinitely?
/proc/net/rpc/auth.rpcsec.context

If it is large, try to flush explicitly with:
date +%s  > /proc/net/rpc/auth.rpcsec.context/flush
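One way to judge whether the flush helps is to compare slab usage before and after it. This is a rough sketch that assumes the caches appear in /proc/slabinfo under names such as kmalloc-1024 and kmalloc-512 (names can differ by kernel and allocator) and that the leading columns are name, active_objs, num_objs, objsize. The helper name slab_active_bytes is hypothetical.

```shell
# Print the bytes held by active objects in one slab cache
# (active_objs * objsize).
# $1: cache name, e.g. kmalloc-1024
# $2: file in /proc/slabinfo format (optional, defaults to /proc/slabinfo)
# Returns non-zero if the cache is not listed.
slab_active_bytes() {
    fields=$(awk -v cache="$1" '$1 == cache { print $2, $4; exit }' \
        "${2:-/proc/slabinfo}")
    [ -n "$fields" ] || return 1
    set -- $fields          # $1 = active_objs, $2 = objsize
    echo $(( $1 * $2 ))
}

# Example (as root, since /proc/slabinfo is usually root-only):
#   before=$(slab_active_bytes kmalloc-1024)
#   date +%s > /proc/net/rpc/auth.rpcsec.context/flush
#   after=$(slab_active_bytes kmalloc-1024)
#   echo "kmalloc-1024: $before -> $after bytes"
```

If the figure drops sharply after the flush, the memory was held by cached contexts rather than lost outright.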

If all that checks out, you may need the below upstream fix, but it
went into v3.10.106 as
6a4a5fd svcrpc: don't leak contexts on PROC_DESTROY

commit 6a4a5fd4c7bc6a06ca26ad7327d046d8d3c0932a
Author: J. Bruce Fields <[email protected]>
Date: Mon Jan 9 17:15:18 2017 -0500

svcrpc: don't leak contexts on PROC_DESTROY

commit 78794d1890708cf94e3961261e52dcec2cc34722 upstream.

Context expiry times are in units of seconds since boot, not unix time.

The use of get_seconds() here therefore sets the expiry time decades in
the future. This prevents timely freeing of contexts destroyed by
client RPC_GSS_PROC_DESTROY requests. We'd still free them eventually
(when the module is unloaded or the container shut down), but a lot of
contexts could pile up before then.

Fixes: c5b29f885afe "sunrpc: use seconds since boot in expiry cache"
Reported-by: Andy Adamson <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>

diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 62663a0..e625efe 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -1518,7 +1518,7 @@ static void destroy_use_gss_proxy_proc_entry(struct net *net) {}
 	case RPC_GSS_PROC_DESTROY:
 		if (gss_write_verf(rqstp, rsci->mechctx, gc->gc_seq))
 			goto auth_err;
-		rsci->h.expiry_time = get_seconds();
+		rsci->h.expiry_time = seconds_since_boot();
 		set_bit(CACHE_NEGATIVE, &rsci->h.flags);
 		if (resv->iov_len + 4 > PAGE_SIZE)
 			goto drop;
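
To see why the bug keeps destroyed contexts around for so long, compare the two clocks. The cache treats expiry_time as seconds since boot, so storing a unix timestamp there postpones expiry by roughly (unix time - uptime) seconds. A small worked sketch follows; the helper name and the sample numbers are illustrative only.

```shell
# Approximate delay, in whole years, before a context would expire when
# a unix timestamp is stored where seconds-since-boot is expected.
# $1: wall-clock time in seconds since 1970
# $2: uptime in seconds
expiry_delay_years() {
    echo $(( ($1 - $2) / (365 * 24 * 3600) ))
}

# Example with a timestamp from January 2017 and about 34 hours of uptime:
#   expiry_delay_years 1484000000 123420
# On a live system:
#   expiry_delay_years "$(date +%s)" "$(cut -d. -f1 /proc/uptime)"
```

With those sample numbers the delay works out to 47 years, which matches the commit message's "decades in the future".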

2018-11-13 06:40:18

by zhong jiang

Subject: Re: [Qestion] Lots of memory leaks when mounting and unmounting nfs client to server continuously.

On 2018/11/8 3:49, Dave Wysochanski wrote:
> On Tue, 2018-10-30 at 21:58 +0800, zhong jiang wrote:
>> On 2018/10/30 21:06, Benjamin Coddington wrote:
>>> Hi zhong jiang,
>>>
>>> Try asking in linux-nfs.. but I'll also note that 3.10-stable may
>>> be missing a number of fixes to leaks in the NFS GSS code.
>>>
>>> I can see more than a few fixes to memory leaks with:
>>> git log --grep=leak --oneline net/sunrpc/auth_gss/
>>>
>> Thanks for your reply. I have tested some of the upstream fixes you
>> mentioned, but they do not solve the issue completely.
>> Hence I am asking the relevant experts whether they have run into
>> this issue or can offer any suggestions.
>>
>> Thanks,
>> zhong jiang
>>> Ben
>>>
>>> On 30 Oct 2018, at 8:45, zhong jiang wrote:
>>>
>>>> Hi, Herbert
>>>>
>>>> Recently I hit a memory leak when repeatedly mounting and
>>>> unmounting NFS with krb5 security.
>>>> The issue occurs on linux-3.10-stable.
>>>>
>>>> I find that the slab-1024 and slab-512 caches take up most of the
>>>> memory and cannot be freed.
>>>> As a result, the rpcsec_gss_krb5 module cannot be unloaded
>>>> either.
>>>>
>>>>
> Are you running the latest 3.10-stable?
>
> This sounds very similar to something I encountered a while ago, which
> turned out to be a sunrpc cache-related problem. The patch that fixed it
> for me is in 3.10.106, though.
>
> Can you check if this cache is growing indefinitely?
> /proc/net/rpc/auth.rpcsec.context
>
> If it is large, try to flush explicitly with:
> date +%s > /proc/net/rpc/auth.rpcsec.context/flush
>
> If all that checks out, you may need the below upstream fix, but it
> went into v3.10.106 as
> 6a4a5fd svcrpc: don't leak contexts on PROC_DESTROY
>
> commit 6a4a5fd4c7bc6a06ca26ad7327d046d8d3c0932a
> Author: J. Bruce Fields <[email protected]>
> Date: Mon Jan 9 17:15:18 2017 -0500
>
> svcrpc: don't leak contexts on PROC_DESTROY
>
> commit 78794d1890708cf94e3961261e52dcec2cc34722 upstream.
>
> Context expiry times are in units of seconds since boot, not unix time.
>
> The use of get_seconds() here therefore sets the expiry time decades in
> the future. This prevents timely freeing of contexts destroyed by
> client RPC_GSS_PROC_DESTROY requests. We'd still free them eventually
> (when the module is unloaded or the container shut down), but a lot of
> contexts could pile up before then.
>
> Fixes: c5b29f885afe "sunrpc: use seconds since boot in expiry cache"
> Reported-by: Andy Adamson <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
> Signed-off-by: Willy Tarreau <[email protected]>
>
> diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
> index 62663a0..e625efe 100644
> --- a/net/sunrpc/auth_gss/svcauth_gss.c
> +++ b/net/sunrpc/auth_gss/svcauth_gss.c
> @@ -1518,7 +1518,7 @@ static void destroy_use_gss_proxy_proc_entry(struct net *net) {}
>  	case RPC_GSS_PROC_DESTROY:
>  		if (gss_write_verf(rqstp, rsci->mechctx, gc->gc_seq))
>  			goto auth_err;
> -		rsci->h.expiry_time = get_seconds();
> +		rsci->h.expiry_time = seconds_since_boot();
>  		set_bit(CACHE_NEGATIVE, &rsci->h.flags);
>  		if (resv->iov_len + 4 > PAGE_SIZE)
>  			goto drop;
>
> .
>
Hi, Dave

Thank you for your kind help, and sorry for the late reply; I have only just finished testing the patch.

On its own it does not fix the issue completely.

However, when I combine the following three upstream patches, the issue no longer occurs:

0070ed3 Fix 16-byte memory leak in gssp_accept_sec_context_upcall
78794d1 svcrpc: don't leak contexts on PROC_DESTROY
a1d1e9b svcrpc: fix memory leak in gssp_accept_sec_context_upcall

I think we should backport the relevant patches to stable-3.10.

Thanks,
zhong jiang