2009-02-27 23:52:40

by Ben Myers

[permalink] [raw]
Subject: [PATCH] sunrpc: xprt is not connected until after sock is set

Hi Aaron,

I'm just getting back to this.

On Wed, Feb 25, 2009 at 04:17:45PM -0800, Aaron Straus wrote:
> Quick question, do we need a barrier between setting the transport->sock
> and the xprt_set_connected(xprt)?

Yeah, looks that way. That was some interesting reading. Here is the
patch as it stands now. Hopefully I got that right. This has
apparently fixed the problem on ia64 even with out the barriers. rpciod
racing with a write.

> Also, out of curiosity, do you know what changed to introduce the BUG?

I haven't looked into that.

Bruce, can you take this patch? I understand that you're the
maintainer?

Thanks!
Ben

xs_sendpages() returned -ENOTCONN to xs_udp_send_request() and we tried to
clear a bit in transport->sock->flags when sock hadn't been set. With this
change we will instead return earlier in xprt_prepare_transmit() and the rpc
will be retried.
---
include/linux/sunrpc/xprt.h | 9 ++++++++-
net/sunrpc/xprtsock.c | 3 +--
2 files changed, 9 insertions(+), 3 deletions(-)

Index: linux-2.6.27.15-2/include/linux/sunrpc/xprt.h
===================================================================
--- linux-2.6.27.15-2.orig/include/linux/sunrpc/xprt.h
+++ linux-2.6.27.15-2/include/linux/sunrpc/xprt.h
@@ -266,17 +266,24 @@ int xs_swapper(struct rpc_xprt *xprt,

static inline void xprt_set_connected(struct rpc_xprt *xprt)
{
+ smp_wmb();
set_bit(XPRT_CONNECTED, &xprt->state);
}

static inline void xprt_clear_connected(struct rpc_xprt *xprt)
{
clear_bit(XPRT_CONNECTED, &xprt->state);
+ smp_wmb();
}

static inline int xprt_connected(struct rpc_xprt *xprt)
{
- return test_bit(XPRT_CONNECTED, &xprt->state);
+ int connected;
+
+ connected = test_bit(XPRT_CONNECTED, &xprt->state);
+ smp_rmb();
+
+ return connected;
}

static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt)
Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c
===================================================================
--- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c
+++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c
@@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
sk->sk_no_check = UDP_CSUM_NORCV;
sk->sk_allocation = GFP_ATOMIC;

- xprt_set_connected(xprt);
-
/* Reset to new socket */
transport->sock = sock;
transport->inet = sk;

xs_set_memalloc(xprt);

+ xprt_set_connected(xprt);
write_unlock_bh(&sk->sk_callback_lock);
}
xs_udp_do_set_buffer_size(xprt);




2009-02-28 00:38:44

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: xprt is not connected until after sock is set

On Fri, 2009-02-27 at 17:54 -0600, Ben Myers wrote:
> Hi Aaron,
>
> I'm just getting back to this.
>
> On Wed, Feb 25, 2009 at 04:17:45PM -0800, Aaron Straus wrote:
> > Quick question, do we need a barrier between setting the transport->sock
> > and the xprt_set_connected(xprt)?
>
> Yeah, looks that way. That was some interesting reading. Here is the
> patch as it stands now. Hopefully I got that right. This has
> apparently fixed the problem on ia64 even with out the barriers. rpciod
> racing with a write.
>
> > Also, out of curiosity, do you know what changed to introduce the BUG?
>
> I haven't looked into that.
>
> Bruce, can you take this patch? I understand that you're the
> maintainer?
>
> Thanks!
> Ben
>
> xs_sendpages() returned -ENOTCONN to xs_udp_send_request() and we tried to
> clear a bit in transport->sock->flags when sock hadn't been set. With this
> change we will instead return earlier in xprt_prepare_transmit() and the rpc
> will be retried.
> ---
> include/linux/sunrpc/xprt.h | 9 ++++++++-
> net/sunrpc/xprtsock.c | 3 +--
> 2 files changed, 9 insertions(+), 3 deletions(-)
>
> Index: linux-2.6.27.15-2/include/linux/sunrpc/xprt.h
> ===================================================================
> --- linux-2.6.27.15-2.orig/include/linux/sunrpc/xprt.h
> +++ linux-2.6.27.15-2/include/linux/sunrpc/xprt.h
> @@ -266,17 +266,24 @@ int xs_swapper(struct rpc_xprt *xprt,
>
> static inline void xprt_set_connected(struct rpc_xprt *xprt)
> {
> + smp_wmb();
> set_bit(XPRT_CONNECTED, &xprt->state);
> }
>
> static inline void xprt_clear_connected(struct rpc_xprt *xprt)
> {
> clear_bit(XPRT_CONNECTED, &xprt->state);
> + smp_wmb();
> }
>
> static inline int xprt_connected(struct rpc_xprt *xprt)
> {
> - return test_bit(XPRT_CONNECTED, &xprt->state);
> + int connected;
> +
> + connected = test_bit(XPRT_CONNECTED, &xprt->state);
> + smp_rmb();
> +
> + return connected;
> }

NACK. If you need a memory barrier, put it in the UDP case only (and
justify why). The TCP case sets and clears the above in the
connect/disconnect softirqs and so does not require them.

I certainly see no reason whatsoever for adding memory barriers to
xprt_connected() or xprt_clear_connected().

> static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt)
> Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> ===================================================================
> --- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c
> +++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> sk->sk_no_check = UDP_CSUM_NORCV;
> sk->sk_allocation = GFP_ATOMIC;
>
> - xprt_set_connected(xprt);
> -
> /* Reset to new socket */
> transport->sock = sock;
> transport->inet = sk;
>
> xs_set_memalloc(xprt);
>
> + xprt_set_connected(xprt);
> write_unlock_bh(&sk->sk_callback_lock);
> }
> xs_udp_do_set_buffer_size(xprt);

Please move that call to xprt_set_connected() outside the if statement.
We want to set the connected flag unconditionally here.

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-02-28 01:34:59

by Aaron Straus

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: xprt is not connected until after sock is set

Hi Trond,

On Feb 27 07:37 PM, Trond Myklebust wrote:
> NACK. If you need a memory barrier, put it in the UDP case only (and
> justify why). The TCP case sets and clears the above in the
> connect/disconnect softirqs and so does not require them.
>
> I certainly see no reason whatsoever for adding memory barriers to
> xprt_connected() or xprt_clear_connected().
>
> > static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt)
> > Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > ===================================================================
> > --- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c
> > +++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> > sk->sk_no_check = UDP_CSUM_NORCV;
> > sk->sk_allocation = GFP_ATOMIC;
> >
> > - xprt_set_connected(xprt);
> > -
> > /* Reset to new socket */
> > transport->sock = sock;
> > transport->inet = sk;
> >
> > xs_set_memalloc(xprt);
> >
> > + xprt_set_connected(xprt);
> > write_unlock_bh(&sk->sk_callback_lock);
> > }
> > xs_udp_do_set_buffer_size(xprt);
>
> Please move that call to xprt_set_connected() outside the if statement.
> We want to set the connected flag unconditionally here.

FYI, I've been digging a bit. This commit adds the clearing of the
bits:

commit b6ddf64ffe9d59577a9176856bb6fe69a539f573
Author: Trond Myklebust <[email protected]>
Date: Thu Apr 17 18:52:19 2008 -0400

SUNRPC: Fix up xprt_write_space()

<snip>

@@ -588,19 +603,20 @@ static int xs_udp_send_request(struct rpc_task *task)
}

switch (status) {
+ case -EAGAIN:
+ xs_nospace(task);
+ break;
case -ENETUNREACH:
case -EPIPE:
case -ECONNREFUSED:
/* When the server has died, an ICMP port unreachable message
* prompts ECONNREFUSED. */
- break;
- case -EAGAIN:
- xs_nospace(task);
+ clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
break;
default:
+ clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
dprintk("RPC: sendmsg returned unrecognized error %d\n",
-status);
- break;
}

return status;


This went in to 2.6.26 which was the first time we saw the bug
(2.6.26.3) on kerneloops.

I don't know if *this* is a bad commit or some other locking changed in
2.6.26 which tickles the bug.

Hope it helps. Have a good weekend all!

=a=

--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/[email protected]

2009-02-28 01:41:29

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: xprt is not connected until after sock is set

On Fri, 2009-02-27 at 17:34 -0800, Aaron Straus wrote:
> Hi Trond,
>
> On Feb 27 07:37 PM, Trond Myklebust wrote:
> > NACK. If you need a memory barrier, put it in the UDP case only (and
> > justify why). The TCP case sets and clears the above in the
> > connect/disconnect softirqs and so does not require them.
> >
> > I certainly see no reason whatsoever for adding memory barriers to
> > xprt_connected() or xprt_clear_connected().
> >
> > > static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt)
> > > Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > > ===================================================================
> > > --- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c
> > > +++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > > @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> > > sk->sk_no_check = UDP_CSUM_NORCV;
> > > sk->sk_allocation = GFP_ATOMIC;
> > >
> > > - xprt_set_connected(xprt);
> > > -
> > > /* Reset to new socket */
> > > transport->sock = sock;
> > > transport->inet = sk;
> > >
> > > xs_set_memalloc(xprt);
> > >
> > > + xprt_set_connected(xprt);
> > > write_unlock_bh(&sk->sk_callback_lock);
> > > }
> > > xs_udp_do_set_buffer_size(xprt);
> >
> > Please move that call to xprt_set_connected() outside the if statement.
> > We want to set the connected flag unconditionally here.
>
> FYI, I've been digging a bit. This commit adds the clearing of the
> bits:
>
> commit b6ddf64ffe9d59577a9176856bb6fe69a539f573
> Author: Trond Myklebust <[email protected]>
> Date: Thu Apr 17 18:52:19 2008 -0400
>
> SUNRPC: Fix up xprt_write_space()
>
> <snip>
>
> @@ -588,19 +603,20 @@ static int xs_udp_send_request(struct rpc_task *task)
> }
>
> switch (status) {
> + case -EAGAIN:
> + xs_nospace(task);
> + break;
> case -ENETUNREACH:
> case -EPIPE:
> case -ECONNREFUSED:
> /* When the server has died, an ICMP port unreachable message
> * prompts ECONNREFUSED. */
> - break;
> - case -EAGAIN:
> - xs_nospace(task);
> + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> break;
> default:
> + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> dprintk("RPC: sendmsg returned unrecognized error %d\n",
> -status);
> - break;
> }
>
> return status;
>
>
> This went in to 2.6.26 which was the first time we saw the bug
> (2.6.26.3) on kerneloops.
>
> I don't know if *this* is a bad commit or some other locking changed in
> 2.6.26 which tickles the bug.

It should be unrelated. If the client managed to get past the call to
xs_sendpages(), then the UDP socket must obviously exist already.

Can you show me the oops?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-02-28 04:57:39

by Aaron Straus

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: xprt is not connected until after sock is set

Hi,

On Feb 27 08:40 PM, Trond Myklebust wrote:
> > This went in to 2.6.26 which was the first time we saw the bug
> > (2.6.26.3) on kerneloops.
> >
> > I don't know if *this* is a bad commit or some other locking changed in
> > 2.6.26 which tickles the bug.
>
> It should be unrelated. If the client managed to get past the call to
> xs_sendpages(), then the UDP socket must obviously exist already.
>
> Can you show me the oops?

Sure, please see the original e-mail below with the oops at the bottom.
Also if you follow this link:

http://www.kerneloops.org/search.php?search=xs_udp_send_request&btnG=Function+Search

you can see other oops (with other stack traces) on kerneloops.

Thanks for your time!!

=a=




----

From: Aaron Straus <aaron-bYFJunmd+ZV8UrSeD/[email protected]>
Subject: BUG NULL pointer dereference in SUNRPC xs_udp_send_request
Message-ID: <20090223201108.GB3308-bYFJunmd+ZV8UrSeD/[email protected]>

Hi,

We received the trace below on one of our machines this weekend. The
machine is running vanilla 2.6.27.14.

If I'm reading the trace correctly, it looks like this line of
xs_udp_send_request:

clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);

The machine is x86 (32-bit).

Please let me know if you need anything else e.g .config or full
dmesg.

Thanks for your time.

=a=


--
BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<f89fbf5e>] :sunrpc:xs_udp_send_request+0xa3/0xca
*pdpt = 0000000035b6d001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: nfs lockd autofs4 sunrpc ohci_hcd usbcore tg3 libphy [last unloaded: x_tables]

Pid: 1870, comm: mount Not tainted (2.6.27.14-asb #104)
EIP: 0060:[<f89fbf5e>] EFLAGS: 00010202 CPU: 2
EIP is at xs_udp_send_request+0xa3/0xca [sunrpc]
EAX: 00000000 EBX: ffffff95 ECX: 00000010 EDX: 00000000
ESI: f58ba000 EDI: f58f9000 EBP: f78df800 ESP: c66d1a80
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process mount (pid: 1870, ti=c66d0000 task=f78d4a20 task.ti=c66d0000)
Stack: f58ba004 f58ba000 f58ba054 f58ba0b4 f58f9000 f89f981a f78df800 f58f9290
f58ba074 c66d1b64 f58ba000 f78df800 f78df848 f89f83a3 f78df848 f78df800
00000000 f89fcbe2 f78df800 c66d1ae8 c66d1b04 c66d1bc0 f89f78a9 f63bf800
Call Trace:
[<f89f981a>] xprt_transmit+0xd3/0x1ad [sunrpc]
[<f89f83a3>] call_transmit+0x1b0/0x1e6 [sunrpc]
[<f89fcbe2>] __rpc_execute+0x63/0x1e1 [sunrpc]
[<f89f78a9>] rpc_run_task+0x3e/0x45 [sunrpc]
[<f89f7939>] rpc_call_sync+0x38/0x52 [sunrpc]
[<f8a02408>] rpcb_register_call+0x85/0xc7 [sunrpc]
[<f8a024f7>] rpcb_register+0xad/0xb5 [sunrpc]
[<f89fe43e>] svc_register+0xb0/0x12a [sunrpc]
[<f89ff7b2>] svc_setup_socket+0x6b/0x242 [sunrpc]
[<f89ffbde>] svc_create_socket+0x255/0x2bd [sunrpc]
[<f89f7b5d>] rpc_shutdown_client+0xa2/0xaa [sunrpc]
[<f8a02411>] rpcb_register_call+0x8e/0xc7 [sunrpc]
[<f89fc7d4>] rpc_wait_bit_killable+0x0/0x2a [sunrpc]
[<f8a024f7>] rpcb_register+0xad/0xb5 [sunrpc]
[<f89ffc57>] svc_tcp_create+0x11/0x14 [sunrpc]
[<f8a07055>] svc_create_xprt+0xd8/0x18c [sunrpc]
[<f8891f45>] make_socks+0x85/0xc2 [lockd]
[<f889216f>] lockd_up+0x80/0x145 [lockd]
[<f8890384>] nlmclnt_init+0x1c/0x4e [lockd]
[<f8a1d80b>] nfs_start_lockd+0x5f/0x7d [nfs]
[<f8a1e1dc>] nfs_create_server+0x6d9/0x9bd [nfs]
[<c02a9cd7>] sock_common_recvmsg+0x2f/0x45
[<c02a8784>] sock_recvmsg+0xc8/0xe3
[<c02a8857>] sock_sendmsg+0xb8/0xd1
[<c0114407>] kmap_atomic+0x11/0x14
[<c0143fa3>] get_page_from_freelist+0x352/0x3ca
[<c01bf5b6>] idr_get_empty_slot+0x150/0x225
[<f8a25518>] nfs_get_sb+0x651/0x87d [nfs]
[<c0148e3a>] kstrdup+0x27/0x48
[<c015f9cb>] vfs_kern_mount+0x39/0x72
[<c015fa42>] do_kern_mount+0x2f/0xb4
[<c017166f>] do_new_mount+0x55/0x89
[<c017181f>] do_mount+0x17c/0x19b
[<c030e3da>] error_code+0x72/0x78
[<c016fa36>] copy_mount_options+0x78/0x10d
[<c01718ab>] sys_mount+0x6d/0xaa
[<c0102cb9>] sysenter_do_call+0x12/0x25
=======================
Code: 83 fb e0 74 13 83 fb f5 75 1b 89 e8 e8 6c e7 ff ff bb f5 ff ff ff eb 32 8b 87 f4 02 00 00 f0 80 60 08 fe eb 25 8b 87 f4 02 00 00 <f0> 80 60 08 fe 80 3d 68 b8 a1 f8 00 79 11 89 d8 f7 d8 50 68 fa
EIP: [<f89fbf5e>] xs_udp_send_request+0xa3/0xca [sunrpc] SS:ESP 0068:c66d1a80
---[ end trace a09733af92e06ec6 ]---

--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/[email protected]


2009-02-28 05:07:09

by Aaron Straus

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: xprt is not connected until after sock is set

On Feb 27 08:40 PM, Trond Myklebust wrote:
> > > > static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt)
> > > > Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > > > ===================================================================
> > > > --- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c
> > > > +++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > > > @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> > > > sk->sk_no_check = UDP_CSUM_NORCV;
> > > > sk->sk_allocation = GFP_ATOMIC;
> > > >
> > > > - xprt_set_connected(xprt);
> > > > -
> > > > /* Reset to new socket */
> > > > transport->sock = sock;
> > > > transport->inet = sk;
> > > >
> > > > xs_set_memalloc(xprt);
> > > >
> > > > + xprt_set_connected(xprt);
> > > > write_unlock_bh(&sk->sk_callback_lock);
> > > > }
> > > > xs_udp_do_set_buffer_size(xprt);
> > >
> >
> > @@ -588,19 +603,20 @@ static int xs_udp_send_request(struct rpc_task *task)
> > }
> >
> > switch (status) {
> > + case -EAGAIN:
> > + xs_nospace(task);
> > + break;
> > case -ENETUNREACH:
> > case -EPIPE:
> > case -ECONNREFUSED:
> > /* When the server has died, an ICMP port unreachable message
> > * prompts ECONNREFUSED. */
> > - break;
> > - case -EAGAIN:
> > - xs_nospace(task);
> > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> > break;
> > default:
> > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> > dprintk("RPC: sendmsg returned unrecognized error %d\n",
> > -status);
> > - break;
> > }
> >
> > return status;
> >
> > This went in to 2.6.26 which was the first time we saw the bug
> > (2.6.26.3) on kerneloops.
> >
> > I don't know if *this* is a bad commit or some other locking changed in
> > 2.6.26 which tickles the bug.
>
> It should be unrelated. If the client managed to get past the call to
> xs_sendpages(), then the UDP socket must obviously exist already.

PS just to repeat the previous discussion.

It seems like xs_sendpages does check if sock is NULL and returns
-ENOTCONN in that case.

The problem is that now in xs_udp_send_request falls into the default:
section of that switch statement and tries to do the:

> > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);

but sock is NULL, so I think that's the oops.


Anyway, that's why Ben thinks we need to move xprt_set_connected() down
(past the setting of transport->sock) in xs_udp_finish_connecting().

I was worried without a barrier xprt_set_connected() could just move
back up before the setting of transport->sock again either by compiler
or CPU reordering.

Anyway that's where we are....

thanks!

=a=




--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/[email protected]

2009-02-28 18:09:57

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: xprt is not connected until after sock is set

On Fri, 2009-02-27 at 21:07 -0800, Aaron Straus wrote:
> On Feb 27 08:40 PM, Trond Myklebust wrote:
> > > > > static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt)
> > > > > Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > > > > ===================================================================
> > > > > --- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c
> > > > > +++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c
> > > > > @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> > > > > sk->sk_no_check = UDP_CSUM_NORCV;
> > > > > sk->sk_allocation = GFP_ATOMIC;
> > > > >
> > > > > - xprt_set_connected(xprt);
> > > > > -
> > > > > /* Reset to new socket */
> > > > > transport->sock = sock;
> > > > > transport->inet = sk;
> > > > >
> > > > > xs_set_memalloc(xprt);
> > > > >
> > > > > + xprt_set_connected(xprt);
> > > > > write_unlock_bh(&sk->sk_callback_lock);
> > > > > }
> > > > > xs_udp_do_set_buffer_size(xprt);
> > > >
> > >
> > > @@ -588,19 +603,20 @@ static int xs_udp_send_request(struct rpc_task *task)
> > > }
> > >
> > > switch (status) {
> > > + case -EAGAIN:
> > > + xs_nospace(task);
> > > + break;
> > > case -ENETUNREACH:
> > > case -EPIPE:
> > > case -ECONNREFUSED:
> > > /* When the server has died, an ICMP port unreachable message
> > > * prompts ECONNREFUSED. */
> > > - break;
> > > - case -EAGAIN:
> > > - xs_nospace(task);
> > > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> > > break;
> > > default:
> > > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> > > dprintk("RPC: sendmsg returned unrecognized error %d\n",
> > > -status);
> > > - break;
> > > }
> > >
> > > return status;
> > >
> > > This went in to 2.6.26 which was the first time we saw the bug
> > > (2.6.26.3) on kerneloops.
> > >
> > > I don't know if *this* is a bad commit or some other locking changed in
> > > 2.6.26 which tickles the bug.
> >
> > It should be unrelated. If the client managed to get past the call to
> > xs_sendpages(), then the UDP socket must obviously exist already.
>
> PS just to repeat the previous discussion.
>
> It seems like xs_sendpages does check if sock is NULL and returns
> -ENOTCONN in that case.
>
> The problem is that now in xs_udp_send_request falls into the default:
> section of that switch statement and tries to do the:
>
> > > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
>
> but sock is NULL, so I think that's the oops.

OK, that makes a lot of sense. We should definitely fix up both
xs_tcp_send_request() and xs_udp_send_request() to deal correctly with
that error.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-03-02 16:34:04

by Ben Myers

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: xprt is not connected until after sock is set

Hi Trond,

On Sat, Feb 28, 2009 at 01:09:28PM -0500, Trond Myklebust wrote:
> On Fri, 2009-02-27 at 21:07 -0800, Aaron Straus wrote:
> > It seems like xs_sendpages does check if sock is NULL and returns
> > -ENOTCONN in that case.

Here's a trace that shows the above:

Starting kernel based NFS server: idmapd mountd statd nfsdrpc.nfsd[4502]: NaT co
nsumption 17179869216 [1]
Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ib_ipoib ib_cm
ib_sa ipv6 ib_uverbs ib_umad iw_cxgb3 cxgb3 mlx4_en mlx4_ib mlx4_core binfmt_mi
sc fuse dm_crypt crypto_blkcipher nls_iso8859_1 loop dm_mod sr_mod cdrom ib_mthc
a tg3 ib_mad shpchp sg ib_core button libphy pci_hotplug qla2xxx mptctl usbhid h
id ff_memless ohci_hcd ehci_hcd usbcore sd_mod crc_t10dif ide_generic mptfc scsi
_transport_fc scsi_tgt mptspi scsi_transport_spi qla1280 sata_vsc sgiioc4 ioc4 x
fs fan ide_pci_generic siimage ide_core mptsas mptscsih mptbase scsi_transport_s
as thermal processor thermal_sys hwmon pata_sil680 libata scsi_mod dock
Supported: Yes

Pid: 4502, CPU 2, comm: rpc.nfsd
psr : 0000101008526030 ifs : 800000000000040d ip : [<a0000002044acf30>] Not
tainted (2.6.27.15-2-default)
ip is at xs_udp_send_request+0x270/0x340 [sunrpc]
unat: 0000000000000000 pfs : 000000000000040d rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : aa99aaa6aa5aa999
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000002044acd30 b6 : a0000002044accc0 b7 : a0000002044c8fe0
f6 : 1003e000000000000222e f7 : 1003e00000000000004e2
f8 : 1003e00000000000009c4 f9 : 1003e0000001f3eb1b6ce
f10 : 1003e93b691609417c0ec f11 : 1003e0000000000000013
r1 : a0000002044d9760 r2 : a000000204511ba0 r3 : a000000204511ba0
r8 : ffffffffffffff95 r9 : e0000060431825a8 r10 : a000000204511ce0
r11 : 0000000000000000 r12 : e0000060434cfc30 r13 : e0000060434c0000
r14 : 0000000000000000 r15 : e00000607bd79198 r16 : a0000002044e9fa8
r17 : e000006078fc6040 r18 : 0000000000000054 r19 : e000006043182668
r20 : ffffffffffffff95 r21 : 00000001000023ca r22 : e000006078fc6148
r23 : a000000100efa780 r24 : a000000204511ce0 r25 : 0000000000000000
r26 : e000006078fc613c r27 : e00000607bd795a8 r28 : 0000000000000000
r29 : e00000607bd796e0 r30 : 0000000000000000 r31 : e00000607b3dc000

Call Trace:
[<a000000100016a30>] show_stack+0x50/0xa0
sp=e0000060434cf790 bsp=e0000060434c1720
[<a0000001000172a0>] show_regs+0x820/0x860
sp=e0000060434cf960 bsp=e0000060434c16c8
[<a0000001000266a0>] die+0x1a0/0x2e0
sp=e0000060434cf960 bsp=e0000060434c1688
[<a000000100026830>] die_if_kernel+0x50/0x80
sp=e0000060434cf960 bsp=e0000060434c1658
[<a0000001006e79a0>] ia64_fault+0xac0/0xb60
sp=e0000060434cf960 bsp=e0000060434c1610
[<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
sp=e0000060434cfa60 bsp=e0000060434c1610
[<a0000002044acf30>] xs_udp_send_request+0x270/0x340 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c15a0
[<a0000002044a9a20>] xprt_transmit+0x3a0/0x620 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c1560
[<a0000002044a28a0>] call_transmit+0x600/0x720 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c1520
[<a0000002044b5690>] __rpc_execute+0x1d0/0x7a0 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c14a0
[<a0000002044b5ce0>] rpc_execute+0x80/0xa0 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c1480
[<a0000002044a43d0>] rpc_run_task+0xf0/0x120 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c1460
[<a0000002044a46e0>] rpc_call_sync+0xe0/0x160 [sunrpc]
sp=e0000060434cfc30 bsp=e0000060434c1430
[<a0000002044ca500>] rpcb_register_call+0x120/0x220 [sunrpc]
sp=e0000060434cfc70 bsp=e0000060434c13e8
[<a0000002044ca7b0>] rpcb_register+0x1b0/0x1e0 [sunrpc]
sp=e0000060434cfcc0 bsp=e0000060434c1398
[<a0000002044bd200>] svc_register+0x1e0/0x360 [sunrpc]
sp=e0000060434cfd20 bsp=e0000060434c1310
[<a0000002044c0c50>] svc_setup_socket+0x150/0x9a0 [sunrpc]
sp=e0000060434cfd30 bsp=e0000060434c12c0
[<a0000002044c3e10>] svc_addsock+0x1d0/0x480 [sunrpc]
sp=e0000060434cfd40 bsp=e0000060434c1270
[<a000000204b52550>] write_ports+0x1f0/0x6c0 [nfsd]
sp=e0000060434cfdd0 bsp=e0000060434c1228
[<a000000204b54270>] nfsctl_transaction_write+0xf0/0x1c0 [nfsd]
sp=e0000060434cfe20 bsp=e0000060434c11e8
[<a0000001001b6a90>] vfs_write+0x1b0/0x360
sp=e0000060434cfe20 bsp=e0000060434c1198
[<a0000001001b6de0>] sys_write+0x80/0x100
sp=e0000060434cfe20 bsp=e0000060434c1120
[<a00000010000c600>] ia64_ret_from_syscall+0x0/0x20
sp=e0000060434cfe30 bsp=e0000060434c1120
[<a000000000010720>] __kernel_syscall_via_break+0x0/0x20
sp=e0000060434d0000 bsp=e0000060434c1120

If I read it correctly the xs_sendpages return value is in r8, -ENOTCONN. We
had xs_udp_finish_connecting in rpciod racing with the above write.

> > The problem is that now in xs_udp_send_request falls into the default:
> > section of that switch statement and tries to do the:
> >
> > > > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> >
> > but sock is NULL, so I think that's the oops.
>
> OK, that makes a lot of sense. We should definitely fix up both
> xs_tcp_send_request() and xs_udp_send_request() to deal correctly with
> that error.

With the patch as it stands I think we will not get into
xs_udp_send_request because xprt_prepare_transmit will have returned
-ENOTCONN and the rpc retried in __rpc_execute:

rpc_execute
__rpc_execute
call_transmit
| xprt_prepare_transmit
| if (!xprt_connected) return -ENOTCONN *here
| xprt_transmit
| xs_udp_send_request

It looks like we just need to protect sock_xprt->sock with respect to
the value of rpc_xprt->state, and Aaron suggested a barrier to do that.
Alternatively we could add a lock to rpc_xprt.

What is your recommendation?

Thanks!
Ben

2009-03-02 16:39:33

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] sunrpc: xprt is not connected until after sock is set

On Mar 2, 2009, at Mar 2, 2009, 11:36 AM, Ben Myers wrote:
> Hi Trond,
>
> On Sat, Feb 28, 2009 at 01:09:28PM -0500, Trond Myklebust wrote:
>> On Fri, 2009-02-27 at 21:07 -0800, Aaron Straus wrote:
>>> It seems like xs_sendpages does check if sock is NULL and returns
>>> -ENOTCONN in that case.
>
> Here's a trace that shows the above:
>
> Starting kernel based NFS server: idmapd mountd statd
> nfsdrpc.nfsd[4502]: NaT co
> nsumption 17179869216 [1]
> Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs
> ib_ipoib ib_cm
> ib_sa ipv6 ib_uverbs ib_umad iw_cxgb3 cxgb3 mlx4_en mlx4_ib
> mlx4_core binfmt_mi
> sc fuse dm_crypt crypto_blkcipher nls_iso8859_1 loop dm_mod sr_mod
> cdrom ib_mthc
> a tg3 ib_mad shpchp sg ib_core button libphy pci_hotplug qla2xxx
> mptctl usbhid h
> id ff_memless ohci_hcd ehci_hcd usbcore sd_mod crc_t10dif
> ide_generic mptfc scsi
> _transport_fc scsi_tgt mptspi scsi_transport_spi qla1280 sata_vsc
> sgiioc4 ioc4 x
> fs fan ide_pci_generic siimage ide_core mptsas mptscsih mptbase
> scsi_transport_s
> as thermal processor thermal_sys hwmon pata_sil680 libata scsi_mod
> dock
> Supported: Yes
>
> Pid: 4502, CPU 2, comm: rpc.nfsd
> psr : 0000101008526030 ifs : 800000000000040d ip :
> [<a0000002044acf30>] Not
> tainted (2.6.27.15-2-default)
> ip is at xs_udp_send_request+0x270/0x340 [sunrpc]
> unat: 0000000000000000 pfs : 000000000000040d rsc : 0000000000000003
> rnat: 0000000000000000 bsps: 0000000000000000 pr : aa99aaa6aa5aa999
> ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
> csd : 0000000000000000 ssd : 0000000000000000
> b0 : a0000002044acd30 b6 : a0000002044accc0 b7 : a0000002044c8fe0
> f6 : 1003e000000000000222e f7 : 1003e00000000000004e2
> f8 : 1003e00000000000009c4 f9 : 1003e0000001f3eb1b6ce
> f10 : 1003e93b691609417c0ec f11 : 1003e0000000000000013
> r1 : a0000002044d9760 r2 : a000000204511ba0 r3 : a000000204511ba0
> r8 : ffffffffffffff95 r9 : e0000060431825a8 r10 : a000000204511ce0
> r11 : 0000000000000000 r12 : e0000060434cfc30 r13 : e0000060434c0000
> r14 : 0000000000000000 r15 : e00000607bd79198 r16 : a0000002044e9fa8
> r17 : e000006078fc6040 r18 : 0000000000000054 r19 : e000006043182668
> r20 : ffffffffffffff95 r21 : 00000001000023ca r22 : e000006078fc6148
> r23 : a000000100efa780 r24 : a000000204511ce0 r25 : 0000000000000000
> r26 : e000006078fc613c r27 : e00000607bd795a8 r28 : 0000000000000000
> r29 : e00000607bd796e0 r30 : 0000000000000000 r31 : e00000607b3dc000
>
> Call Trace:
> [<a000000100016a30>] show_stack+0x50/0xa0
> sp=e0000060434cf790
> bsp=e0000060434c1720
> [<a0000001000172a0>] show_regs+0x820/0x860
> sp=e0000060434cf960
> bsp=e0000060434c16c8
> [<a0000001000266a0>] die+0x1a0/0x2e0
> sp=e0000060434cf960
> bsp=e0000060434c1688
> [<a000000100026830>] die_if_kernel+0x50/0x80
> sp=e0000060434cf960
> bsp=e0000060434c1658
> [<a0000001006e79a0>] ia64_fault+0xac0/0xb60
> sp=e0000060434cf960
> bsp=e0000060434c1610
> [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
> sp=e0000060434cfa60
> bsp=e0000060434c1610
> [<a0000002044acf30>] xs_udp_send_request+0x270/0x340 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c15a0
> [<a0000002044a9a20>] xprt_transmit+0x3a0/0x620 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c1560
> [<a0000002044a28a0>] call_transmit+0x600/0x720 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c1520
> [<a0000002044b5690>] __rpc_execute+0x1d0/0x7a0 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c14a0
> [<a0000002044b5ce0>] rpc_execute+0x80/0xa0 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c1480
> [<a0000002044a43d0>] rpc_run_task+0xf0/0x120 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c1460
> [<a0000002044a46e0>] rpc_call_sync+0xe0/0x160 [sunrpc]
> sp=e0000060434cfc30
> bsp=e0000060434c1430
> [<a0000002044ca500>] rpcb_register_call+0x120/0x220 [sunrpc]
> sp=e0000060434cfc70
> bsp=e0000060434c13e8
> [<a0000002044ca7b0>] rpcb_register+0x1b0/0x1e0 [sunrpc]
> sp=e0000060434cfcc0
> bsp=e0000060434c1398
> [<a0000002044bd200>] svc_register+0x1e0/0x360 [sunrpc]
> sp=e0000060434cfd20
> bsp=e0000060434c1310
> [<a0000002044c0c50>] svc_setup_socket+0x150/0x9a0 [sunrpc]
> sp=e0000060434cfd30
> bsp=e0000060434c12c0
> [<a0000002044c3e10>] svc_addsock+0x1d0/0x480 [sunrpc]
> sp=e0000060434cfd40
> bsp=e0000060434c1270
> [<a000000204b52550>] write_ports+0x1f0/0x6c0 [nfsd]
> sp=e0000060434cfdd0
> bsp=e0000060434c1228
> [<a000000204b54270>] nfsctl_transaction_write+0xf0/0x1c0 [nfsd]
> sp=e0000060434cfe20
> bsp=e0000060434c11e8
> [<a0000001001b6a90>] vfs_write+0x1b0/0x360
> sp=e0000060434cfe20
> bsp=e0000060434c1198
> [<a0000001001b6de0>] sys_write+0x80/0x100
> sp=e0000060434cfe20
> bsp=e0000060434c1120
> [<a00000010000c600>] ia64_ret_from_syscall+0x0/0x20
> sp=e0000060434cfe30
> bsp=e0000060434c1120
> [<a000000000010720>] __kernel_syscall_via_break+0x0/0x20
> sp=e0000060434d0000
> bsp=e0000060434c1120
>
> If I read it correctly the xs_sendpages return value is in r8, -
> ENOTCONN. We
> had xs_udp_finish_connecting in rpciod racing with the above write.

I had a patch that addressed a similar race in the UDP connect logic.
Trond, did you get that one when I sent it before Connectathon?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com