Return-Path: Received: from foss.arm.com ([217.140.101.70]:54610 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752059AbbIPJsN (ORCPT ); Wed, 16 Sep 2015 05:48:13 -0400 Message-ID: <55F93ADA.7050303@arm.com> Date: Wed, 16 Sep 2015 10:48:10 +0100 From: Marc Zyngier MIME-Version: 1.0 To: "Suzuki K. Poulose" , Trond Myklebust CC: Anna Schumaker , "J. Bruce Fields" , Jeff Layton , "David S. Miller" , linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCHv2] SUNRPC: Fix a race in xs_reset_transport References: <1442394280-24280-1-git-send-email-suzuki.poulose@arm.com> <1442396149-24775-1-git-send-email-suzuki.poulose@arm.com> In-Reply-To: <1442396149-24775-1-git-send-email-suzuki.poulose@arm.com> Content-Type: text/plain; charset=windows-1252 Sender: linux-nfs-owner@vger.kernel.org List-ID: On 16/09/15 10:35, Suzuki K. Poulose wrote: > From: "Suzuki K. Poulose" > > Ignore the previous patch, which was really v1. > > --- > > Encountered the following BUG() with 4.3-rc1 on a fast model > for arm64 with NFS root filesystem. > > ------------[ cut here ]------------ > kernel BUG at fs/inode.c:1493! > > Internal error: Oops - BUG: 0 [#1] PREEMPT SMP > Modules linked in: > CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc1+ #855 > Hardware name: FVP Base (DT) > task: ffffc000760b0000 ti: ffffc00076070000 task.ti: ffffc00076070000 > PC is at iput+0x144/0x170 > LR is at sock_release+0xbc/0xdc > pc : [] lr : [] pstate: 40000045 > sp : ffffc00076073790 > x29: ffffc00076073790 x28: ffffc00076073b40 > x27: 00000000000003e8 x26: ffffc00076955000 > x25: 000000000000000c x24: ffffc00076637200 > x23: ffffc00076073930 x22: ffffc000769b8180 > x21: ffffc000740500a8 x20: ffffc00074050158 > x19: ffffc00074050030 x18: 000000009fcef6bf > x17: 00000000593e3df5 x16: 00000000b597f71d > x15: 00000000e2f9d3f6 x14: 0ffffffffffffffe > x13: 0000000000000020 x12: 0101010101010101 > x11: 00000000000001c9 x10: 0000000000000750 > x9 : ffffc00076073670 x8 : ffffc000760b07b0 > x7 : 0000000000000001 x6 : 0000000000000001 > x5 : 0000000000000000 x4 : 00000000ffffffff > x3 : 0000000000000000 x2 : ffffffffffffffff > x1 : ffffc00076070000 x0 : 0000000000000060 > > Process swapper/0 (pid: 1, stack limit = 0xffffc00076070020) > Stack: (0xffffc00076073790 to 0xffffc00076074000) > > [ stack contents stripped ] > > Call trace: > [] iput+0x144/0x170 > [] sock_release+0xb8/0xdc > [] xs_reset_transport+0x8c/0xac > [] xs_close+0x50/0x6c > [] xs_destroy+0x20/0x5c > [] xprt_destroy+0x68/0x8c > [] xprt_put+0x24/0x30 > [] rpc_free_client+0x78/0xd8 > [] rpc_release_client+0x94/0xec > [] rpc_shutdown_client+0x58/0x118 > [] nfs_mount+0x100/0x234 > [] nfs_request_mount+0xa8/0x12c > [] nfs_try_mount+0x54/0x2b4 > [] nfs_fs_mount+0x5cc/0xac0 > [] mount_fs+0x38/0x158 > [] vfs_kern_mount+0x48/0x11c > [] do_mount+0x208/0xc04 > [] SyS_mount+0x78/0xd0 > [] mount_root+0x80/0x148 > [] prepare_namespace+0x138/0x184 > [] kernel_init_freeable+0x1cc/0x1f4 > [] kernel_init+0xc/0xd8 > Code: b5fffc00 17ffffed d4210000 17ffffd7 (d4210000) > ---[ end trace 02951451f1831f54 ]--- > > With rpc_debug enabled here is the log : > > RPC: shutting down mount client for your.nfs.server > RPC: rpc_release_client(ffffc00076637800) > RPC: destroying UNIX authenticator ffffc000008f48c8 > RPC: destroying mount client for your.nfs.server > RPC: destroying transport ffffc00076226000 > RPC: xs_destroy xprt ffffc00076226000 > RPC: xs_close xprt ffffc00076226000 > RPC: xs_tcp_state_change client ffffc00076226000... > RPC: state 4 conn 1 dead 0 zapped 1 sk_shutdown 3 > RPC: xs_tcp_state_change client ffffc00076226000... > RPC: state 5 conn 0 dead 0 zapped 1 sk_shutdown 3 > RPC: xs_tcp_state_change client ffffc00076226000... > RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3 > RPC: disconnected transport ffffc00076226000 > RPC: xs_tcp_state_change client ffffc00076226000... > RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3 > RPC: disconnected transport ffffc00076226000 > RPC: xs_tcp_data_ready... > RPC: xs_tcp_state_change client ffffc00076226000... > RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3 > RPC: disconnected transport ffffc00076226000 > RPC: wake_up_first(ffffc00076226170 "xprt_sending") > > So it looks like just before we lock the callbacks in xs_reset_transport, > a few of the callbacks got through and issued the reset before we could > lock it. And we end up repeating the cleanups, ending up in the above > BUG() due to multiple sock_release(). > > This patch fixes the race by confirming that somebody else hasn't performed > the reset while we were waiting for the lock. Also, the transport->sock is > only accessed within the lock to prevent using a stale sock for kernel_sock_shutdown(). > > Signed-off-by: Suzuki K. Poulose > --- > Changes since V1: > - Prevent race in accessing sock. > --- > net/sunrpc/xprtsock.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index 7be90bc..b95f5d0 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -812,7 +812,7 @@ static void xs_error_report(struct sock *sk) > > static void xs_reset_transport(struct sock_xprt *transport) > { > - struct socket *sock = transport->sock; > + struct socket *sock; > struct sock *sk = transport->inet; > struct rpc_xprt *xprt = &transport->xprt; > > @@ -822,9 +822,14 @@ static void xs_reset_transport(struct sock_xprt *transport) > if (atomic_read(&transport->xprt.swapper)) > sk_clear_memalloc(sk); > > - kernel_sock_shutdown(sock, SHUT_RDWR); > - > write_lock_bh(&sk->sk_callback_lock); > + /* Check someone has already done the job, while we were waiting */ > + if (!transport->inet) { > + write_unlock_bh(&sk->sk_callback_lock); > + return; > + } > + sock = transport->sock; > + > transport->inet = NULL; > transport->sock = NULL; > > @@ -833,6 +838,10 @@ static void xs_reset_transport(struct sock_xprt *transport) > xs_restore_old_callbacks(transport, sk); > xprt_clear_connected(xprt); > write_unlock_bh(&sk->sk_callback_lock); > + > + if (sock) > + kernel_sock_shutdown(sock, SHUT_RDWR); > + > xs_sock_reset_connection_flags(xprt); > > trace_rpc_socket_close(xprt, sock); > This patch resurrects NFS on my arm64 model, so thanks for that! Tested-by: Marc Zyngier M. -- Jazz is not dead. It just smells funny...