Subject: Re: synchronous AF_LOCAL connect
From: Chuck Lever
In-Reply-To: <20130220160350.GJ14606@fieldses.org>
Date: Wed, 20 Feb 2013 11:20:05 -0500
Cc: Trond Myklebust, linux-nfs@vger.kernel.org, simo@redhat.com
References: <20130218225424.GD3391@fieldses.org> <20130220154751.GH14606@fieldses.org> <2F275139-9861-4414-8C9F-BD74544C9AD7@oracle.com> <20130220160350.GJ14606@fieldses.org>
To: "J. Bruce Fields"

On Feb 20, 2013, at 11:03 AM, J. Bruce Fields wrote:

> On Wed, Feb 20, 2013 at 10:56:54AM -0500, Chuck Lever wrote:
>>
>> On Feb 20, 2013, at 10:47 AM, "J. Bruce Fields" wrote:
>>
>>> On Mon, Feb 18, 2013 at 05:54:25PM -0500, bfields wrote:
>>>> The rpc code expects all connects to be done asynchronously by a
>>>> workqueue. But that doesn't seem necessary in the AF_LOCAL case.
>>>> The only user (rpcbind) actually wants the connect done in the
>>>> context of a process with the right namespace. (And that will be
>>>> true of gss proxy too, which also wants to use AF_LOCAL.)
>>>>
>>>> But maybe I'm missing something.
>>>>
>>>> Also, I haven't really tried to understand this code--I just
>>>> assumed I could basically call xs_local_setup_socket from ->setup
>>>> instead of the workqueue, and that seems to work based on a very
>>>> superficial test. At a minimum I guess the PF_FSTRANS fiddling
>>>> shouldn't be there.
>>>
>>> Here it is with that and the other extraneous xprt stuff gone.
>>>
>>> See any problem with doing this?
>>
>> Nothing is screaming at me.
>> As long as an AF_LOCAL connect operation
>> doesn't ever sleep, this should be safe, I think.
>
> I'm sure it must sleep. Why would that make any difference?

As I understand it, sometimes an ASYNC RPC task is driving the
connect, and such a task must never sleep when it calls outside of
rpciod. rpciod must be allowed to put that task on a wait queue and
go do other work if the connect operation doesn't succeed
immediately; otherwise, all ASYNC RPC operations hang (or, worse, an
oops occurs).

>> How did you test it?
>
> I'm just doing my usual set of connectathon runs, and assuming mounts
> would fail if the server's rpcbind registration failed.

Have you tried killing rpcbind first to see how the error cases are
handled?

Does rpcbind get the registration's "owner" field correct when
namespaces are involved?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
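Two points from this exchange can be observed from userspace: "I'm sure it must sleep" and the question about killing rpcbind. The C sketch below is not the SUNRPC/xprtsock code. The names try_local_connect(), local_listen() and demo_outcomes(), and the /tmp/aflocal-XXXXXX path, are hypothetical and exist only for illustration. The sketch assumes Linux, where connect(2) documents that a non-blocking UNIX domain connect fails with EAGAIN when it cannot complete immediately. In this sketch that happens because the listener's backlog is full, and at that point a blocking connect would sleep instead. The sketch also shows the two errors a client would most likely see once the server is gone: ENOENT if the socket file was removed, and ECONNREFUSED if a stale file was left behind. The in-kernel connect path may behave differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Userspace sketch with hypothetical helpers; Linux semantics assumed. */

/* Fill in an AF_UNIX address for PATH; -ENAMETOOLONG if it won't fit. */
static int make_addr(const char *path, struct sockaddr_un *addr)
{
	size_t len = strlen(path);
	if (len >= sizeof(addr->sun_path))
		return -ENAMETOOLONG;
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	memcpy(addr->sun_path, path, len + 1);
	return 0;
}

/*
 * Connect to an AF_LOCAL stream socket without ever sleeping.
 * Returns the connected fd, or -errno:
 *   -ENOENT        no socket file at PATH
 *   -ECONNREFUSED  a socket file exists, but nothing listens on it
 *   -EAGAIN        backlog full; a blocking connect() would sleep here
 */
static int try_local_connect(const char *path)
{
	struct sockaddr_un addr;
	int fd, flags, err = make_addr(path, &addr);
	if (err)
		return err;
	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -errno;
	flags = fcntl(fd, F_GETFL);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
	    connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		err = -errno;	/* save it before close() can clobber errno */
		close(fd);
		return err;
	}
	return fd;
}

/* Bind and listen on PATH; stands in for the server (e.g. rpcbind). */
static int local_listen(const char *path, int backlog)
{
	struct sockaddr_un addr;
	int fd, err = make_addr(path, &addr);
	if (err)
		return err;
	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -errno;
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, backlog) < 0) {
		err = -errno;
		close(fd);
		return err;
	}
	return fd;
}

/* Collect the outcomes in a private temp dir: out[0] no server yet,
 * out[1]/out[2] two clients of a backlog-0 listener, out[3] after the
 * server dies without unlinking its file. Returns 0, or -1 on setup failure. */
static int demo_outcomes(int out[4])
{
	char dir[] = "/tmp/aflocal-XXXXXX", path[64];
	int lfd, i;
	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(path, sizeof(path), "%s/sock", dir);
	out[0] = try_local_connect(path);	/* nothing there yet */
	lfd = local_listen(path, 0);
	if (lfd < 0) {
		rmdir(dir);
		return -1;
	}
	out[1] = try_local_connect(path);	/* queued on the listener */
	out[2] = try_local_connect(path);	/* backlog is now full */
	close(lfd);				/* "kill" the server, keep the file */
	out[3] = try_local_connect(path);	/* stale socket file */
	for (i = 1; i < 4; i++)
		if (out[i] >= 0)
			close(out[i]);
	unlink(path);
	rmdir(dir);
	return 0;
}
```

A caller that must not sleep can treat -EAGAIN as "park the task and retry later" instead of blocking, which is roughly the role described above for rpciod with ASYNC tasks. The sketch only models that decision. It does not implement any wait queue.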