2009-02-23 20:33:00

by Aaron Straus

[permalink] [raw]
Subject: BUG NULL pointer dereference in SUNRPC xs_udp_send_request

Hi,

We received the trace below on one of our machines this weekend. The
machine is running vanilla 2.6.27.14.

If I'm reading the trace correctly, it looks like this line of
xs_udp_send_request:

clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);

The machine is x86 (32-bit).

Please let me know if you need anything else e.g .config or full
dmesg.

Thanks for your time.

=a=


--
BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<f89fbf5e>] :sunrpc:xs_udp_send_request+0xa3/0xca
*pdpt = 0000000035b6d001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: nfs lockd autofs4 sunrpc ohci_hcd usbcore tg3 libphy [last unloaded: x_tables]

Pid: 1870, comm: mount Not tainted (2.6.27.14-asb #104)
EIP: 0060:[<f89fbf5e>] EFLAGS: 00010202 CPU: 2
EIP is at xs_udp_send_request+0xa3/0xca [sunrpc]
EAX: 00000000 EBX: ffffff95 ECX: 00000010 EDX: 00000000
ESI: f58ba000 EDI: f58f9000 EBP: f78df800 ESP: c66d1a80
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process mount (pid: 1870, ti=c66d0000 task=f78d4a20 task.ti=c66d0000)
Stack: f58ba004 f58ba000 f58ba054 f58ba0b4 f58f9000 f89f981a f78df800 f58f9290
f58ba074 c66d1b64 f58ba000 f78df800 f78df848 f89f83a3 f78df848 f78df800
00000000 f89fcbe2 f78df800 c66d1ae8 c66d1b04 c66d1bc0 f89f78a9 f63bf800
Call Trace:
[<f89f981a>] xprt_transmit+0xd3/0x1ad [sunrpc]
[<f89f83a3>] call_transmit+0x1b0/0x1e6 [sunrpc]
[<f89fcbe2>] __rpc_execute+0x63/0x1e1 [sunrpc]
[<f89f78a9>] rpc_run_task+0x3e/0x45 [sunrpc]
[<f89f7939>] rpc_call_sync+0x38/0x52 [sunrpc]
[<f8a02408>] rpcb_register_call+0x85/0xc7 [sunrpc]
[<f8a024f7>] rpcb_register+0xad/0xb5 [sunrpc]
[<f89fe43e>] svc_register+0xb0/0x12a [sunrpc]
[<f89ff7b2>] svc_setup_socket+0x6b/0x242 [sunrpc]
[<f89ffbde>] svc_create_socket+0x255/0x2bd [sunrpc]
[<f89f7b5d>] rpc_shutdown_client+0xa2/0xaa [sunrpc]
[<f8a02411>] rpcb_register_call+0x8e/0xc7 [sunrpc]
[<f89fc7d4>] rpc_wait_bit_killable+0x0/0x2a [sunrpc]
[<f8a024f7>] rpcb_register+0xad/0xb5 [sunrpc]
[<f89ffc57>] svc_tcp_create+0x11/0x14 [sunrpc]
[<f8a07055>] svc_create_xprt+0xd8/0x18c [sunrpc]
[<f8891f45>] make_socks+0x85/0xc2 [lockd]
[<f889216f>] lockd_up+0x80/0x145 [lockd]
[<f8890384>] nlmclnt_init+0x1c/0x4e [lockd]
[<f8a1d80b>] nfs_start_lockd+0x5f/0x7d [nfs]
[<f8a1e1dc>] nfs_create_server+0x6d9/0x9bd [nfs]
[<c02a9cd7>] sock_common_recvmsg+0x2f/0x45
[<c02a8784>] sock_recvmsg+0xc8/0xe3
[<c02a8857>] sock_sendmsg+0xb8/0xd1
[<c0114407>] kmap_atomic+0x11/0x14
[<c0143fa3>] get_page_from_freelist+0x352/0x3ca
[<c01bf5b6>] idr_get_empty_slot+0x150/0x225
[<f8a25518>] nfs_get_sb+0x651/0x87d [nfs]
[<c0148e3a>] kstrdup+0x27/0x48
[<c015f9cb>] vfs_kern_mount+0x39/0x72
[<c015fa42>] do_kern_mount+0x2f/0xb4
[<c017166f>] do_new_mount+0x55/0x89
[<c017181f>] do_mount+0x17c/0x19b
[<c030e3da>] error_code+0x72/0x78
[<c016fa36>] copy_mount_options+0x78/0x10d
[<c01718ab>] sys_mount+0x6d/0xaa
[<c0102cb9>] sysenter_do_call+0x12/0x25
=======================
Code: 83 fb e0 74 13 83 fb f5 75 1b 89 e8 e8 6c e7 ff ff bb f5 ff ff ff eb 32 8b 87 f4 02 00 00 f0 80 60 08 fe eb 25 8b 87 f4 02 00 00 <f0> 80 60 08 fe 80 3d 68 b8 a1 f8 00 79 11 89 d8 f7 d8 50 68 fa
EIP: [<f89fbf5e>] xs_udp_send_request+0xa3/0xca [sunrpc] SS:ESP 0068:c66d1a80
---[ end trace a09733af92e06ec6 ]---

--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/[email protected]


2009-02-25 02:37:08

by Ben Myers

[permalink] [raw]
Subject: Re: BUG NULL pointer dereference in SUNRPC xs_udp_send_request

Hi Aaron,

On Mon, Feb 23, 2009 at 12:11:09PM -0800, Aaron Straus wrote:
> We received the trace below on one of our machines this weekend. The
> machine is running vanilla 2.6.27.14.
>
> If I'm reading the trace correctly, it looks like this line of
> xs_udp_send_request:
>
> clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
>
> The machine is x86 (32-bit).
>
> Please let me know if you need anything else e.g .config or full
> dmesg.

That's a coincidence. I looked at a similar bug today that crashed on
the same line but a different stack. My suggestion is:

Index: linux/net/sunrpc/xprtsock.c
===================================================================
--- linux.orig/net/sunrpc/xprtsock.c
+++ linux/net/sunrpc/xprtsock.c
@@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
sk->sk_no_check = UDP_CSUM_NORCV;
sk->sk_allocation = GFP_ATOMIC;

- xprt_set_connected(xprt);
-
/* Reset to new socket */
transport->sock = sock;
transport->inet = sk;

xs_set_memalloc(xprt);

+ xprt_set_connected(xprt);
write_unlock_bh(&sk->sk_callback_lock);
}
xs_udp_do_set_buffer_size(xprt);

Looks like xs_sendpages() returned -ENOTCONN. The above should sort
that out by returning earlier in xprt_prepare_transmit() and the rpc
would be retried by __rpc_execute().

-ben

2009-02-26 00:17:47

by Aaron Straus

[permalink] [raw]
Subject: Re: BUG NULL pointer dereference in SUNRPC xs_udp_send_request

Hi Ben,

Thanks for the response.

On Feb 24 08:39 PM, Ben Myers wrote:
> > If I'm reading the trace correctly, it looks like this line of
> > xs_udp_send_request:
> >
> > clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
>
> That's a coincidence. I looked at a similar bug today that crashed on
> the same line but a different stack. My suggestion is:
>
> Index: linux/net/sunrpc/xprtsock.c
> ===================================================================
> --- linux.orig/net/sunrpc/xprtsock.c
> +++ linux/net/sunrpc/xprtsock.c
> @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str
> sk->sk_no_check = UDP_CSUM_NORCV;
> sk->sk_allocation = GFP_ATOMIC;
>
> - xprt_set_connected(xprt);
> -
> /* Reset to new socket */
> transport->sock = sock;
> transport->inet = sk;
>
> xs_set_memalloc(xprt);
>
> + xprt_set_connected(xprt);
> write_unlock_bh(&sk->sk_callback_lock);
> }
> xs_udp_do_set_buffer_size(xprt);
>
> Looks like xs_sendpages() returned -ENOTCONN. The above should sort
> that out by returning earlier in xprt_prepare_transmit() and the rpc
> would be retried by __rpc_execute().

I'll start running with it tonight to see if I can trigger the BUG
again (it was hard to hit).

Quick question, do we need a barrier between setting the transport->sock
and the xprt_set_connected(xprt)? I don't really understand the locking
on the reader side, so I cannot say...

Also, out of curiosity, do you know what changed to introduce the BUG?

Kerneloops doesn't seem to know about it before 2.6.26.3:

http://www.kerneloops.org/search.php?search=xs_udp_send_request&btnG=Function+Search

Anyway, thanks!

=a=



--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/[email protected]