When use-gss-proxy is set to 1, write_gssp() creates a rpc client in
gssp_rpc_create(), this increases netns refcount by 2 [1], these
refcounts are supposed to be released in rpcsec_gss_exit_net(), but
it will never happen because rpcsec_gss_exit_net() is triggered only
when netns refcount gets to 0, specifically:
refcount=0 -> cleanup_net() -> ops_exit_list -> rpcsec_gss_exit_net
It is a deadlock situation here, refcount will never get to 0 unless
rpcsec_gss_exit_net() is called.
[1]
SyS_write
vfs_write
proc_reg_write
write_gssp
set_gssp_clnt
gssp_rpc_create
rpc_create
xprt_create_transport
xs_setup_local
xs_setup_xprt
xprt_alloc // get net refcount
xs_local_setup_socket
unix_create
kernel_connect // get net refcount
In this case, the net refcount shouldn't be increased when creating rpc
client, otherwise it will lead to deadlock.
This patchset removes the increased netns reference count.
Wang Hai (2):
net: Modify unix_stream_connect to not reference count the netns of
kernel sockets
auth_gss: Fix deadlock that blocks rpcsec_gss_exit_net when
use-gss-proxy==1
include/linux/sunrpc/clnt.h | 1 +
include/linux/sunrpc/xprt.h | 6 ++++--
net/sunrpc/auth_gss/gss_rpc_upcall.c | 3 ++-
net/sunrpc/clnt.c | 2 ++
net/sunrpc/xprt.c | 13 +++++++++----
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 2 +-
net/sunrpc/xprtrdma/transport.c | 2 +-
net/sunrpc/xprtsock.c | 4 +++-
net/unix/af_unix.c | 3 ++-
9 files changed, 25 insertions(+), 11 deletions(-)
--
2.17.1
When use-gss-proxy is set to 1, write_gssp() creates a rpc client in
gssp_rpc_create(), this increases netns refcount by 2, these refcounts
are supposed to be released in rpcsec_gss_exit_net(), but it will never
happen because rpcsec_gss_exit_net() is triggered only when netns
refcount gets to 0, specifically:
refcount=0 -> cleanup_net() -> ops_exit_list -> rpcsec_gss_exit_net
It is a deadlock situation here, refcount will never get to 0 unless
rpcsec_gss_exit_net() is called. So, in this case, the netns refcount
should not be increased.
In this case, kernel_connect()->unix_stream_connect() will take a netns
refcount. According to commit 26abe14379f8 ("net: Modify sk_alloc to not
reference count the netns of kernel sockets."), kernel sockets should not
take the netns refcount, so unix_stream_connect() should not take
the netns refcount when the sock is a kernel socket either.
Signed-off-by: Wang Hai <[email protected]>
---
net/unix/af_unix.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 92345c9bb60c..af6ba67779c8 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1317,7 +1317,8 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
err = -ENOMEM;
/* create new sock for complete connection */
- newsk = unix_create1(sock_net(sk), NULL, 0, sock->type);
+ newsk = unix_create1(sock_net(sk), NULL,
+ !sk->sk_net_refcnt, sock->type);
if (newsk == NULL)
goto out;
--
2.17.1
From: Wang Hai <[email protected]>
Date: Tue, 28 Sep 2021 11:14:39 +0800
> When use-gss-proxy is set to 1, write_gssp() creates a rpc client in
> gssp_rpc_create(), this increases netns refcount by 2, these refcounts
> are supposed to be released in rpcsec_gss_exit_net(), but it will never
> happen because rpcsec_gss_exit_net() is triggered only when netns
> refcount gets to 0, specifically:
> refcount=0 -> cleanup_net() -> ops_exit_list -> rpcsec_gss_exit_net
> It is a deadlock situation here, refcount will never get to 0 unless
> rpcsec_gss_exit_net() is called. So, in this case, the netns refcount
> should not be increased.
>
> In this case, kernel_connect()->unix_stream_connect() will take a netns
> refcount. According to commit 26abe14379f8 ("net: Modify sk_alloc to not
> reference count the netns of kernel sockets."), kernel sockets should not
> take the netns refcount, so unix_stream_connect() should not take
> the netns refcount when the sock is a kernel socket either.
>
> Signed-off-by: Wang Hai <[email protected]>
> ---
> net/unix/af_unix.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 92345c9bb60c..af6ba67779c8 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1317,7 +1317,8 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> err = -ENOMEM;
>
> /* create new sock for complete connection */
> - newsk = unix_create1(sock_net(sk), NULL, 0, sock->type);
> + newsk = unix_create1(sock_net(sk), NULL,
> + !sk->sk_net_refcnt, sock->type);
This patch conflicts with the commit f4bd73b5a950 for now.
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=f4bd73b5a950
Could you please rebase and resend the patch set?
> if (newsk == NULL)
> goto out;
>
> --
> 2.17.1