Subject: Re: [PATCH] xs_bind retry binding forever
Content-Type: text/plain; charset=us-ascii
From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <1287769536.6311.9.camel@heimdal.trondhjem.org>
Date: Fri, 22 Oct 2010 14:15:26 -0400
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Message-Id: <BDE2E840-86B2-4E2F-B80D-93E6B020E765@oracle.com>
References: <20101021183203.12776.28469.stgit@lady3jane.americas.sgi.com> <20101021183337.12776.18768.stgit@lady3jane.americas.sgi.com> <1287689917.9144.84.camel@heimdal.trondhjem.org> <48B51F5A-E060-4B0B-8AD6-1E4247C4289F@oracle.com> <1287769536.6311.9.camel@heimdal.trondhjem.org>
To: Trond Myklebust <Trond.Myklebust@netapp.com>, Ben Myers <bpm@sgi.com>
Sender: linux-nfs-owner@vger.kernel.org
MIME-Version: 1.0


On Oct 22, 2010, at 1:45 PM, Trond Myklebust wrote:

> On Fri, 2010-10-22 at 11:56 -0400, Chuck Lever wrote:
>> On Oct 21, 2010, at 3:38 PM, Trond Myklebust wrote:
>> 
>>> On Thu, 2010-10-21 at 13:33 -0500, Ben Myers wrote:
>>>> Retry bind for reserved source ports forever.  Add an error message when we
>>>> have a hard time binding one.
>>> 
>>> NACK. This approach leads to the process spinning forever in that loop,
>>> which is exactly why we introduced the limit in the first place. See all
>>> the old archived bug report emails about 'rpciod taking 100% cpu'.
>> 
>> The root problem seems to be the hard loop.  Thinking out loud, what if the client's FSM or some other higher up layer performed the retry, with a short delay inserted after each attempt?
> 
> The problem isn't only the hard loop. The reason why we return the
> EADDRINUSE is in order to allow quick failure of mounts and/or
> automounts when we can't bind the socket.
> 
> I suggest 2 changes:
> 
>     1. In case of error, pass the return value from xs_bind to the
>        pending tasks
>     2. Add a handler for EADDRINUSE in call_status(),
>        xprt_connect_status() and call_connect_status(). Make sure that
>        call_status() and call_connect_status() fail for SOFTCONN tasks,
>        and that they print an error message, delay and retry in the
>        case of ordinary hard tasks.

The thing is, though, we don't want mounts to fail in this case; that's the presenting problem Ben is trying to address.

This is not the same problem as SOFTCONN -- it's purely a matter of how the local system allocates its own resources.  Thus, in theory, it's a case where we can behave entirely predictably.  At its heart, our privileged port allocation mechanism is an unfair way to allocate this resource, since there's no way to prevent starvation.
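
To make the starvation point concrete, here is a rough userspace sketch of that style of allocation.  It is not the xs_bind code; bind_in_range() and its lo/hi parameters are made up for illustration.  It walks a port range and takes the first port that is free, and if every port is busy the caller just gets EADDRINUSE.  Nothing queues the caller for the next port that is released.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: try each port from hi
 * down to lo until bind() succeeds.
 *
 * Returns the bound port, or -1 with errno set.  errno is EADDRINUSE
 * when every port in the range was already taken.
 */
static int bind_in_range(int fd, unsigned short lo, unsigned short hi)
{
	struct sockaddr_in sin;
	int port;

	for (port = hi; port >= lo; port--) {
		memset(&sin, 0, sizeof(sin));
		sin.sin_family = AF_INET;
		sin.sin_addr.s_addr = htonl(INADDR_ANY);
		sin.sin_port = htons((unsigned short)port);

		if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0)
			return port;

		/* Anything other than "port busy" is a hard failure,
		 * e.g. EACCES when binding a reserved port unprivileged. */
		if (errno != EADDRINUSE)
			return -1;
	}

	/* The whole range is busy; whoever asks next after a port is
	 * released wins, regardless of who has been waiting longest. */
	errno = EADDRINUSE;
	return -1;
}
```

A caller that retries in a loop around something like this only gets a port if it happens to ask at the right moment, which is why neither "fail quickly" nor "spin forever" is a satisfying answer.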

I don't know if this is within the realm of possibilities, but it would be nice, for example, if the MNT client and the rpcbind client could each hold onto a privileged TCP port (to prevent others from using it) and just re-use that port whenever a new request needs to be sent to any remote host.

It would be fun to use net namespaces to allocate a separate port range that no one else could touch, but I don't think that's possible without a separate IP address.

Ben, our client already has the ability to use unprivileged ports for MNT, as long as the server's mountd is configured to accept it.  That, plus stipulating mountproto=udp, may give you more relief.

-- 
chuck[dot]lever[at]oracle[dot]com