From: Trond Myklebust Subject: Re: [PATCH] sunrpc: xprt is not connected until after sock is set Date: Fri, 27 Feb 2009 20:40:37 -0500 Message-ID: <1235785237.20549.51.camel@heimdal.trondhjem.org> References: <20090223201108.GB3308@merfinllc.com> <20090225023900.GD15475@sgi.com> <20090226001744.GB7613@merfinllc.com> <20090227235434.GH15475@sgi.com> <1235781463.20549.33.camel@heimdal.trondhjem.org> <20090228013457.GF7706@merfinllc.com> Mime-Version: 1.0 Content-Type: text/plain Cc: Ben Myers , bfields@fieldses.org, linux-nfs@vger.kernel.org, neilb@suse.de To: Aaron Straus Return-path: Received: from mx2.netapp.com ([216.240.18.37]:38860 "EHLO mx2.netapp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752777AbZB1Bl3 (ORCPT ); Fri, 27 Feb 2009 20:41:29 -0500 In-Reply-To: <20090228013457.GF7706-bYFJunmd+ZV8UrSeD/g0lQ@public.gmane.org> Sender: linux-nfs-owner@vger.kernel.org List-ID: On Fri, 2009-02-27 at 17:34 -0800, Aaron Straus wrote: > Hi Trond, > > On Feb 27 07:37 PM, Trond Myklebust wrote: > > NACK. If you need a memory barrier, put it in the UDP case only (and > > justify why). The TCP case sets and clears the above in the > > connect/disconnect softirqs and so does not require them. > > > > I certainly see no reason whatsoever for adding memory barriers to > > xprt_connected() or xprt_clear_connected(). > > > > > static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt) > > > Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c > > > =================================================================== > > > --- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c > > > +++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c > > > @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str > > > sk->sk_no_check = UDP_CSUM_NORCV; > > > sk->sk_allocation = GFP_ATOMIC; > > > > > > - xprt_set_connected(xprt); > > > - > > > /* Reset to new socket */ > > > transport->sock = sock; > > > transport->inet = sk; > > > > > > xs_set_memalloc(xprt); > > > > > > + xprt_set_connected(xprt); > > > write_unlock_bh(&sk->sk_callback_lock); > > > } > > > xs_udp_do_set_buffer_size(xprt); > > > > Please move that call to xprt_set_connected() outside the if statement. > > We want to set the connected flag unconditionally here. > > FYI, I've been digging a bit. This commit adds the clearing of the > bits: > > commit b6ddf64ffe9d59577a9176856bb6fe69a539f573 > Author: Trond Myklebust > Date: Thu Apr 17 18:52:19 2008 -0400 > > SUNRPC: Fix up xprt_write_space() > > > > @@ -588,19 +603,20 @@ static int xs_udp_send_request(struct rpc_task *task) > } > > switch (status) { > + case -EAGAIN: > + xs_nospace(task); > + break; > case -ENETUNREACH: > case -EPIPE: > case -ECONNREFUSED: > /* When the server has died, an ICMP port unreachable message > * prompts ECONNREFUSED. */ > - break; > - case -EAGAIN: > - xs_nospace(task); > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags); > break; > default: > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags); > dprintk("RPC: sendmsg returned unrecognized error %d\n", > -status); > - break; > } > > return status; > > > This went in to 2.6.26 which was the first time we saw the bug > (2.6.26.3) on kerneloops. > > I don't know if *this* is a bad commit or some other locking changed in > 2.6.26 which tickles the bug. It should be unrelated. If the client managed to get past the call to xs_sendpages(), then the UDP socket must obviously exist already. Can you show me the oops? -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com