2015-09-11 10:45:22

by Stefan Hajnoczi

[permalink] [raw]
Subject: nfsd leaks 3 sunrpc module references

I'm having a little trouble debugging this sunrpc module refcnt leak in
nfsd. Perhaps a fresh pair of eyes can help.

nfsd leaks 3 references to the sunrpc module here:

# echo -n "asdf 1234" >/proc/fs/nfsd/portlist
bash: echo: write error: Protocol not supported

Now stop nfsd and try unloading the kernel modules:

# systemctl stop nfs-server
# systemctl stop nfs
# systemctl stop proc-fs-nfsd.mount
# systemctl stop var-lib-nfs-rpc_pipefs.mount
# rmmod nfsd
# rmmod nfs_acl
# rmmod lockd
# rmmod auth_rpcgss
# rmmod sunrpc
rmmod: ERROR: Module sunrpc is in use
# lsmod | grep rpc
sunrpc 315392 3

I have reproduced this from source with a tree based on Linux 4.2-rc3
and also with Linux 4.1.6-200.fc22.x86_64.

The following unmatched module_get() calls are recorded by perf record
-e module:module_get -e module:module_put. Perhaps the backtrace is a
little mangled because I don't see xdr_reserve_space() calls in
svc_create_pooled() (even indirectly):

nfsd 4291 [002] 87.914752: module:module_get: sunrpc call_site=0xffffffffa031c86as refcnt=7
318d93 try_module_get (vmlinux)
98da xs_tcp_setup_socket ([sunrpc])
5ca5 xprt_create_transport ([sunrpc])
34ad rpc_create ([sunrpc])
1892a rpcb_getport_async ([sunrpc])
196ea xdr_reserve_space ([sunrpc])
128dc svc_create_pooled ([sunrpc])
686 nfsd_create_serv ([nfsd])
1be9 write_ports ([nfsd])
182c nfsctl_transaction_write ([nfsd])
410d87 __vfs_write (vmlinux)
411706 vfs_write (vmlinux)
4123f5 sys_write (vmlinux)
95ed95 tracesys_phase2 (vmlinux)
f2ff0 __GI___libc_write (/usr/lib64/libc-2.21.so)

nfsd 4291 [002] 87.914775: module:module_get: sunrpc call_site=0xffffffffa03228a5s refcnt=8
318d93 try_module_get (vmlinux)
f915 rpcauth_cache_do_shrink ([sunrpc])
2bbd rpc_new_client ([sunrpc])
337a rpc_create_xprt ([sunrpc])
34d6 rpc_create ([sunrpc])
1892a rpcb_getport_async ([sunrpc])
196ea xdr_reserve_space ([sunrpc])
128dc svc_create_pooled ([sunrpc])
686 nfsd_create_serv ([nfsd])
1be9 write_ports ([nfsd])
182c nfsctl_transaction_write ([nfsd])
410d87 __vfs_write (vmlinux)
411706 vfs_write (vmlinux)
4123f5 sys_write (vmlinux)
95ed95 tracesys_phase2 (vmlinux)
f2ff0 __GI___libc_write (/usr/lib64/libc-2.21.so)

nfsd 4291 [002] 87.914936: module:module_get: sunrpc call_site=0xffffffffa03228a5s refcnt=9
318d93 try_module_get (vmlinux)
f915 rpcauth_cache_do_shrink ([sunrpc])
2bbd rpc_new_client ([sunrpc])
2ead __rpc_clone_client ([sunrpc])
35e7 rpc_bind_new_program ([sunrpc])
18949 rpcb_getport_async ([sunrpc])
196ea xdr_reserve_space ([sunrpc])
128dc svc_create_pooled ([sunrpc])
686 nfsd_create_serv ([nfsd])
1be9 write_ports ([nfsd])
182c nfsctl_transaction_write ([nfsd])
410d87 __vfs_write (vmlinux)
411706 vfs_write (vmlinux)
4123f5 sys_write (vmlinux)
95ed95 tracesys_phase2 (vmlinux)
f2ff0 __GI___libc_write (/usr/lib64/libc-2.21.so)

In this trace I modified nfs-utils nfsd to write to portlist but you get
the same result with echo from a shell (as shown above).

Any ideas?

Stefan


2016-01-04 03:15:35

by Kinglong Mee

[permalink] [raw]
Subject: [PATCH] nfsd: Fix nfsd leaks sunrpc module references

Stefan Hajnoczi reports,
nfsd leaks 3 references to the sunrpc module here:

# echo -n "asdf 1234" >/proc/fs/nfsd/portlist
bash: echo: write error: Protocol not supported

Now stop nfsd and try unloading the kernel modules:

# systemctl stop nfs-server
# systemctl stop nfs
# systemctl stop proc-fs-nfsd.mount
# systemctl stop var-lib-nfs-rpc_pipefs.mount
# rmmod nfsd
# rmmod nfs_acl
# rmmod lockd
# rmmod auth_rpcgss
# rmmod sunrpc
rmmod: ERROR: Module sunrpc is in use
# lsmod | grep rpc
sunrpc 315392 3

It is caused by nfsd don't cleanup rpcb program for nfsd
when destroying svc service after creating xprt fail.

Reported-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Kinglong Mee <[email protected]>
---
fs/nfsd/nfssvc.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index ad4e237..543de5f 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -314,14 +314,13 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net)
* write_ports can create the server without actually starting
* any threads--if we get shut down before any threads are
* started, then nfsd_last_thread will be run before any of this
- * other initialization has been done.
+ * other initialization has been done except the rpcb information.
*/
+ svc_rpcb_cleanup(serv, net);
if (!nn->nfsd_net_up)
return;
- nfsd_shutdown_net(net);
-
- svc_rpcb_cleanup(serv, net);

+ nfsd_shutdown_net(net);
printk(KERN_WARNING "nfsd: last server has exited, flushing export "
"cache\n");
nfsd_export_flush(net);
--
2.5.0


2016-01-04 09:03:24

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Fix nfsd leaks sunrpc module references

On Mon, Jan 04, 2016 at 11:15:21AM +0800, Kinglong Mee wrote:
> Stefan Hajnoczi reports,
> nfsd leaks 3 references to the sunrpc module here:
>
> # echo -n "asdf 1234" >/proc/fs/nfsd/portlist
> bash: echo: write error: Protocol not supported
>
> Now stop nfsd and try unloading the kernel modules:
>
> # systemctl stop nfs-server
> # systemctl stop nfs
> # systemctl stop proc-fs-nfsd.mount
> # systemctl stop var-lib-nfs-rpc_pipefs.mount
> # rmmod nfsd
> # rmmod nfs_acl
> # rmmod lockd
> # rmmod auth_rpcgss
> # rmmod sunrpc
> rmmod: ERROR: Module sunrpc is in use
> # lsmod | grep rpc
> sunrpc 315392 3
>
> It is caused by nfsd don't cleanup rpcb program for nfsd
> when destroying svc service after creating xprt fail.
>
> Reported-by: Stefan Hajnoczi <[email protected]>
> Signed-off-by: Kinglong Mee <[email protected]>

Thanks for looking into this!

Tested-by: Stefan Hajnoczi <[email protected]>


Attachments:
(No filename) (962.00 B)
signature.asc (473.00 B)
Download all attachments

2016-01-04 22:39:32

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Fix nfsd leaks sunrpc module references

On Mon, Jan 04, 2016 at 11:15:21AM +0800, Kinglong Mee wrote:
> Stefan Hajnoczi reports,
> nfsd leaks 3 references to the sunrpc module here:
>
> # echo -n "asdf 1234" >/proc/fs/nfsd/portlist
> bash: echo: write error: Protocol not supported
>
> Now stop nfsd and try unloading the kernel modules:
>
> # systemctl stop nfs-server
> # systemctl stop nfs
> # systemctl stop proc-fs-nfsd.mount
> # systemctl stop var-lib-nfs-rpc_pipefs.mount
> # rmmod nfsd
> # rmmod nfs_acl
> # rmmod lockd
> # rmmod auth_rpcgss
> # rmmod sunrpc
> rmmod: ERROR: Module sunrpc is in use
> # lsmod | grep rpc
> sunrpc 315392 3
>
> It is caused by nfsd don't cleanup rpcb program for nfsd
> when destroying svc service after creating xprt fail.
>
> Reported-by: Stefan Hajnoczi <[email protected]>
> Signed-off-by: Kinglong Mee <[email protected]>
> ---
> fs/nfsd/nfssvc.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index ad4e237..543de5f 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -314,14 +314,13 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net)
> * write_ports can create the server without actually starting
> * any threads--if we get shut down before any threads are
> * started, then nfsd_last_thread will be run before any of this
> - * other initialization has been done.
> + * other initialization has been done except the rpcb information.
> */
> + svc_rpcb_cleanup(serv, net);

Are you sure this is right?

For example, nfsd_create_serv calls svc_destroy on svc_bind failure, is
it OK to call svc_rpcb_cleanup in that case?

--b.

> if (!nn->nfsd_net_up)
> return;
> - nfsd_shutdown_net(net);
> -
> - svc_rpcb_cleanup(serv, net);
>
> + nfsd_shutdown_net(net);
> printk(KERN_WARNING "nfsd: last server has exited, flushing export "
> "cache\n");
> nfsd_export_flush(net);
> --
> 2.5.0

2016-01-05 00:49:20

by Kinglong Mee

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Fix nfsd leaks sunrpc module references

On 1/5/2016 06:39, J. Bruce Fields wrote:
> On Mon, Jan 04, 2016 at 11:15:21AM +0800, Kinglong Mee wrote:
>> Stefan Hajnoczi reports,
>> nfsd leaks 3 references to the sunrpc module here:
>>
>> # echo -n "asdf 1234" >/proc/fs/nfsd/portlist
>> bash: echo: write error: Protocol not supported
>>
>> Now stop nfsd and try unloading the kernel modules:
>>
>> # systemctl stop nfs-server
>> # systemctl stop nfs
>> # systemctl stop proc-fs-nfsd.mount
>> # systemctl stop var-lib-nfs-rpc_pipefs.mount
>> # rmmod nfsd
>> # rmmod nfs_acl
>> # rmmod lockd
>> # rmmod auth_rpcgss
>> # rmmod sunrpc
>> rmmod: ERROR: Module sunrpc is in use
>> # lsmod | grep rpc
>> sunrpc 315392 3
>>
>> It is caused by nfsd don't cleanup rpcb program for nfsd
>> when destroying svc service after creating xprt fail.
>>
>> Reported-by: Stefan Hajnoczi <[email protected]>
>> Signed-off-by: Kinglong Mee <[email protected]>
>> ---
>> fs/nfsd/nfssvc.c | 7 +++----
>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
>> index ad4e237..543de5f 100644
>> --- a/fs/nfsd/nfssvc.c
>> +++ b/fs/nfsd/nfssvc.c
>> @@ -314,14 +314,13 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net)
>> * write_ports can create the server without actually starting
>> * any threads--if we get shut down before any threads are
>> * started, then nfsd_last_thread will be run before any of this
>> - * other initialization has been done.
>> + * other initialization has been done except the rpcb information.
>> */
>> + svc_rpcb_cleanup(serv, net);
>
> Are you sure this is right?
>
> For example, nfsd_create_serv calls svc_destroy on svc_bind failure, is
> it OK to call svc_rpcb_cleanup in that case?

The svc_rpcb_cleanup in nfsd_last_thread is called in svc_shutdown_net,
svc_destroy never call it.

svc_shutdown_net is called in nfsd_destroy only nfsd_serv->sv_nrthreads == 1.

So that, it's ok calling svc_rpcb_cleanup in nfsd_last_thread always.

Is there anything I missing?

thanks,
Kinglong Mee

>
> --b.
>
>> if (!nn->nfsd_net_up)
>> return;
>> - nfsd_shutdown_net(net);
>> -
>> - svc_rpcb_cleanup(serv, net);
>>
>> + nfsd_shutdown_net(net);
>> printk(KERN_WARNING "nfsd: last server has exited, flushing export "
>> "cache\n");
>> nfsd_export_flush(net);
>> --
>> 2.5.0
>

2016-01-05 22:44:41

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Fix nfsd leaks sunrpc module references

On Tue, Jan 05, 2016 at 08:49:06AM +0800, Kinglong Mee wrote:
> On 1/5/2016 06:39, J. Bruce Fields wrote:
> > On Mon, Jan 04, 2016 at 11:15:21AM +0800, Kinglong Mee wrote:
> >> Stefan Hajnoczi reports,
> >> nfsd leaks 3 references to the sunrpc module here:
> >>
> >> # echo -n "asdf 1234" >/proc/fs/nfsd/portlist
> >> bash: echo: write error: Protocol not supported
> >>
> >> Now stop nfsd and try unloading the kernel modules:
> >>
> >> # systemctl stop nfs-server
> >> # systemctl stop nfs
> >> # systemctl stop proc-fs-nfsd.mount
> >> # systemctl stop var-lib-nfs-rpc_pipefs.mount
> >> # rmmod nfsd
> >> # rmmod nfs_acl
> >> # rmmod lockd
> >> # rmmod auth_rpcgss
> >> # rmmod sunrpc
> >> rmmod: ERROR: Module sunrpc is in use
> >> # lsmod | grep rpc
> >> sunrpc 315392 3
> >>
> >> It is caused by nfsd don't cleanup rpcb program for nfsd
> >> when destroying svc service after creating xprt fail.
> >>
> >> Reported-by: Stefan Hajnoczi <[email protected]>
> >> Signed-off-by: Kinglong Mee <[email protected]>
> >> ---
> >> fs/nfsd/nfssvc.c | 7 +++----
> >> 1 file changed, 3 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> >> index ad4e237..543de5f 100644
> >> --- a/fs/nfsd/nfssvc.c
> >> +++ b/fs/nfsd/nfssvc.c
> >> @@ -314,14 +314,13 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net)
> >> * write_ports can create the server without actually starting
> >> * any threads--if we get shut down before any threads are
> >> * started, then nfsd_last_thread will be run before any of this
> >> - * other initialization has been done.
> >> + * other initialization has been done except the rpcb information.
> >> */
> >> + svc_rpcb_cleanup(serv, net);
> >
> > Are you sure this is right?
> >
> > For example, nfsd_create_serv calls svc_destroy on svc_bind failure, is
> > it OK to call svc_rpcb_cleanup in that case?
>
> The svc_rpcb_cleanup in nfsd_last_thread is called in svc_shutdown_net,
> svc_destroy never call it.
>
> svc_shutdown_net is called in nfsd_destroy only nfsd_serv->sv_nrthreads == 1.
>
> So that, it's ok calling svc_rpcb_cleanup in nfsd_last_thread always.
>
> Is there anything I missing?

Oops, somehow I got nfsd_destroy and svc_destroy confused.

Applying. Thanks for your patience!

--b.

>
> thanks,
> Kinglong Mee
>
> >
> > --b.
> >
> >> if (!nn->nfsd_net_up)
> >> return;
> >> - nfsd_shutdown_net(net);
> >> -
> >> - svc_rpcb_cleanup(serv, net);
> >>
> >> + nfsd_shutdown_net(net);
> >> printk(KERN_WARNING "nfsd: last server has exited, flushing export "
> >> "cache\n");
> >> nfsd_export_flush(net);
> >> --
> >> 2.5.0
> >