2010-10-21 19:26:40

by Ben Myers

[permalink] [raw]
Subject: [PATCH] xs_bind retry binding forever

Hey Trond,

I've been working on a bug related to autofs where a nfs mount returns EIO
in the middle of a job at which point things start to fail. It turns out
the machine isn't running nscd and that is causing a large number of
reserved source ports to be used over localhost and they crowd out the nfs
client. Prior to commit 67a391d72ca7efb387c30ec761a487e50a3ff085 this was
not a problem because we retried forever until we got a reserved source
port. It's ok to say 'use nscd', but in many cases nscd is not turned on
by default so I'd like to propose we revert to the previous behavior: retry
binding forever.

Thanks,
Ben

---

Ben Myers (1):
xs_bind retry binding forever


net/sunrpc/xprtsock.c | 22 ++++++++++++++++++----
1 files changed, 18 insertions(+), 4 deletions(-)

--


2010-10-21 19:38:39

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] xs_bind retry binding forever

On Thu, 2010-10-21 at 13:33 -0500, Ben Myers wrote:
> Retry bind for reserved source ports forever. Add an error message when we
> have a hard time binding one.

NACK. This approach leads to the process spinning forever in that loop,
which is exactly why we introduced the limit in the first place. See all
the old archived bug report emails about 'rpciod taking 100% cpu'.

Cheers
Trond

> Signed-off-by: Ben Myers <[email protected]>
> ---
> net/sunrpc/xprtsock.c | 22 ++++++++++++++++++----
> 1 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index b6309db..79a001b 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1560,9 +1560,16 @@ static int xs_bind4(struct sock_xprt *transport, struct socket *sock)
> }
> last = port;
> port = xs_next_srcport(transport, sock, port);
> - if (port > last)
> + if (port > last) {
> + if (nloop > 2 && net_ratelimit()) {
> + printk("RPC: %s %pI4: Cannot bind reserved "
> + "source port. Consider decreasing "
> + "min_resvport.\n",
> + __func__, &myaddr.sin_addr);
> + }
> nloop++;
> - } while (err == -EADDRINUSE && nloop != 2);
> + }
> + } while (err == -EADDRINUSE);
> dprintk("RPC: %s %pI4:%u: %s (%d)\n",
> __func__, &myaddr.sin_addr,
> port, err ? "failed" : "ok", err);
> @@ -1593,9 +1600,16 @@ static int xs_bind6(struct sock_xprt *transport, struct socket *sock)
> }
> last = port;
> port = xs_next_srcport(transport, sock, port);
> - if (port > last)
> + if (port > last) {
> + if (nloop > 2 && net_ratelimit()) {
> + printk("RPC: %s %pI6: Cannot bind reserved "
> + "source port. Consider decreasing "
> + "min_resvport.\n",
> + __func__, &myaddr.sin6_addr);
> + }
> nloop++;
> - } while (err == -EADDRINUSE && nloop != 2);
> + }
> + } while (err == -EADDRINUSE);
> dprintk("RPC: xs_bind6 %pI6:%u: %s (%d)\n",
> &myaddr.sin6_addr, port, err ? "failed" : "ok", err);
> return err;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



2010-10-22 17:45:37

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] xs_bind retry binding forever

On Fri, 2010-10-22 at 11:56 -0400, Chuck Lever wrote:
> On Oct 21, 2010, at 3:38 PM, Trond Myklebust wrote:
>
> > On Thu, 2010-10-21 at 13:33 -0500, Ben Myers wrote:
> >> Retry bind for reserved source ports forever. Add an error message when we
> >> have a hard time binding one.
> >
> > NACK. This approach leads to the process spinning forever in that loop,
> > which is exactly why we introduced the limit in the first place. See all
> > the old archived bug report emails about 'rpciod taking 100% cpu'.
>
> The root problem seems to be the hard loop. Thinking out loud, what if the client's FSM or some other higher up layer performed the retry, with a short delay inserted after each attempt?

The problem isn't only the hard loop. The reason why we return the
EADDRINUSE is in order to allow quick failure of mounts and/or
automounts when we can't bind the socket.

I suggest 2 changes:

1. In case of error, pass the return value from xs_bind to the
pending tasks
2. Add a handler for EADDRINUSE in call_status(),
xprt_connect_status() and call_connect_status(). Make sure that
call_status() and call_connect_status() fail for SOFTCONN tasks,
and that they print an error message, delay and retry in the
case of ordinary hard tasks.

Cheers
Trond

2010-10-21 19:26:44

by Ben Myers

[permalink] [raw]
Subject: [PATCH] xs_bind retry binding forever

Retry bind for reserved source ports forever. Add an error message when we
have a hard time binding one.

Signed-off-by: Ben Myers <[email protected]>
---
net/sunrpc/xprtsock.c | 22 ++++++++++++++++++----
1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index b6309db..79a001b 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1560,9 +1560,16 @@ static int xs_bind4(struct sock_xprt *transport, struct socket *sock)
}
last = port;
port = xs_next_srcport(transport, sock, port);
- if (port > last)
+ if (port > last) {
+ if (nloop > 2 && net_ratelimit()) {
+ printk("RPC: %s %pI4: Cannot bind reserved "
+ "source port. Consider decreasing "
+ "min_resvport.\n",
+ __func__, &myaddr.sin_addr);
+ }
nloop++;
- } while (err == -EADDRINUSE && nloop != 2);
+ }
+ } while (err == -EADDRINUSE);
dprintk("RPC: %s %pI4:%u: %s (%d)\n",
__func__, &myaddr.sin_addr,
port, err ? "failed" : "ok", err);
@@ -1593,9 +1600,16 @@ static int xs_bind6(struct sock_xprt *transport, struct socket *sock)
}
last = port;
port = xs_next_srcport(transport, sock, port);
- if (port > last)
+ if (port > last) {
+ if (nloop > 2 && net_ratelimit()) {
+ printk("RPC: %s %pI6: Cannot bind reserved "
+ "source port. Consider decreasing "
+ "min_resvport.\n",
+ __func__, &myaddr.sin6_addr);
+ }
nloop++;
- } while (err == -EADDRINUSE && nloop != 2);
+ }
+ } while (err == -EADDRINUSE);
dprintk("RPC: xs_bind6 %pI6:%u: %s (%d)\n",
&myaddr.sin6_addr, port, err ? "failed" : "ok", err);
return err;


2010-10-22 15:57:42

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] xs_bind retry binding forever


On Oct 21, 2010, at 3:38 PM, Trond Myklebust wrote:

> On Thu, 2010-10-21 at 13:33 -0500, Ben Myers wrote:
>> Retry bind for reserved source ports forever. Add an error message when we
>> have a hard time binding one.
>
> NACK. This approach leads to the process spinning forever in that loop,
> which is exactly why we introduced the limit in the first place. See all
> the old archived bug report emails about 'rpciod taking 100% cpu'.

The root problem seems to be the hard loop. Thinking out loud, what if the client's FSM or some other higher up layer performed the retry, with a short delay inserted after each attempt?

> Cheers
> Trond
>
>> Signed-off-by: Ben Myers <[email protected]>
>> ---
>> net/sunrpc/xprtsock.c | 22 ++++++++++++++++++----
>> 1 files changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>> index b6309db..79a001b 100644
>> --- a/net/sunrpc/xprtsock.c
>> +++ b/net/sunrpc/xprtsock.c
>> @@ -1560,9 +1560,16 @@ static int xs_bind4(struct sock_xprt *transport, struct socket *sock)
>> }
>> last = port;
>> port = xs_next_srcport(transport, sock, port);
>> - if (port > last)
>> + if (port > last) {
>> + if (nloop > 2 && net_ratelimit()) {
>> + printk("RPC: %s %pI4: Cannot bind reserved "
>> + "source port. Consider decreasing "
>> + "min_resvport.\n",
>> + __func__, &myaddr.sin_addr);
>> + }
>> nloop++;
>> - } while (err == -EADDRINUSE && nloop != 2);
>> + }
>> + } while (err == -EADDRINUSE);
>> dprintk("RPC: %s %pI4:%u: %s (%d)\n",
>> __func__, &myaddr.sin_addr,
>> port, err ? "failed" : "ok", err);
>> @@ -1593,9 +1600,16 @@ static int xs_bind6(struct sock_xprt *transport, struct socket *sock)
>> }
>> last = port;
>> port = xs_next_srcport(transport, sock, port);
>> - if (port > last)
>> + if (port > last) {
>> + if (nloop > 2 && net_ratelimit()) {
>> + printk("RPC: %s %pI6: Cannot bind reserved "
>> + "source port. Consider decreasing "
>> + "min_resvport.\n",
>> + __func__, &myaddr.sin6_addr);
>> + }
>> nloop++;
>> - } while (err == -EADDRINUSE && nloop != 2);
>> + }
>> + } while (err == -EADDRINUSE);
>> dprintk("RPC: xs_bind6 %pI6:%u: %s (%d)\n",
>> &myaddr.sin6_addr, port, err ? "failed" : "ok", err);
>> return err;
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
chuck[dot]lever[at]oracle[dot]com





2010-10-22 18:16:48

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] xs_bind retry binding forever


On Oct 22, 2010, at 1:45 PM, Trond Myklebust wrote:

> On Fri, 2010-10-22 at 11:56 -0400, Chuck Lever wrote:
>> On Oct 21, 2010, at 3:38 PM, Trond Myklebust wrote:
>>
>>> On Thu, 2010-10-21 at 13:33 -0500, Ben Myers wrote:
>>>> Retry bind for reserved source ports forever. Add an error message when we
>>>> have a hard time binding one.
>>>
>>> NACK. This approach leads to the process spinning forever in that loop,
>>> which is exactly why we introduced the limit in the first place. See all
>>> the old archived bug report emails about 'rpciod taking 100% cpu'.
>>
>> The root problem seems to be the hard loop. Thinking out loud, what if the client's FSM or some other higher up layer performed the retry, with a short delay inserted after each attempt?
>
> The problem isn't only the hard loop. The reason why we return the
> EADDRINUSE is in order to allow quick failure of mounts and/or
> automounts when we can't bind the socket.
>
> I suggest 2 changes:
>
> 1. In case of error, pass the return value from xs_bind to the
> pending tasks
> 2. Add a handler for EADDRINUSE in call_status(),
> xprt_connect_status() and call_connect_status(). Make sure that
> call_status() and call_connect_status() fail for SOFTCONN tasks,
> and that they print an error message, delay and retry in the
> case of ordinary hard tasks.

The thing is, though, we don't want mounts to fail in this case; that's the presenting problem Ben is trying to address.

This is not the same problem as SOFTCONN -- it's entirely one of how the local system allocates its own resources. Thus, theoretically, it's one where it is possible for us to behave entirely predictably. At its heart, our privileged port allocation mechanism is really an unfair way to allocate this resource, since there's no way to prevent starvation.

I don't know if this is within the realm of possibilities, but it would be nice, for example, if the MNT client and the rpcbind client could each hold onto a privileged TCP port (to prevent others from using it) and just re-use that port whenever a new request needs to be sent to any remote host.

It would be fun to use net namespaces to allocate a separate port range that no-one else could touch, but I don't think that's possible without a separate IP address.

Ben, our client already has the ability to use unprivileged ports for MNT, as long as the server's mountd is configured to accept it. That, plus stipulating mountproto=udp, may give you more relief.

--
chuck[dot]lever[at]oracle[dot]com





2010-10-22 15:21:12

by Ben Myers

[permalink] [raw]
Subject: Re: [PATCH] xs_bind retry binding forever

Hey Trond,

On Thu, Oct 21, 2010 at 03:38:37PM -0400, Trond Myklebust wrote:
> On Thu, 2010-10-21 at 13:33 -0500, Ben Myers wrote:
> > Retry bind for reserved source ports forever. Add an error message when we
> > have a hard time binding one.
>
> NACK. This approach leads to the process spinning forever in that loop,
> which is exactly why we introduced the limit in the first place. See all
> the old archived bug report emails about 'rpciod taking 100% cpu'.

Ok, fair enough. ;)

Do you feel it is reasonable to try and return some kind of easily
understandable error code (or message) to the user? I can testify to
the fact that something like this isn't trivial to debug unless you know
what to look for ahead of time. If you're interested in that I can
spend some time on it.

Thanks!
-Ben

>
> Cheers
> Trond
>
> > Signed-off-by: Ben Myers <[email protected]>
> > ---
> > net/sunrpc/xprtsock.c | 22 ++++++++++++++++++----
> > 1 files changed, 18 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index b6309db..79a001b 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -1560,9 +1560,16 @@ static int xs_bind4(struct sock_xprt *transport, struct socket *sock)
> > }
> > last = port;
> > port = xs_next_srcport(transport, sock, port);
> > - if (port > last)
> > + if (port > last) {
> > + if (nloop > 2 && net_ratelimit()) {
> > + printk("RPC: %s %pI4: Cannot bind reserved "
> > + "source port. Consider decreasing "
> > + "min_resvport.\n",
> > + __func__, &myaddr.sin_addr);
> > + }
> > nloop++;
> > - } while (err == -EADDRINUSE && nloop != 2);
> > + }
> > + } while (err == -EADDRINUSE);
> > dprintk("RPC: %s %pI4:%u: %s (%d)\n",
> > __func__, &myaddr.sin_addr,
> > port, err ? "failed" : "ok", err);
> > @@ -1593,9 +1600,16 @@ static int xs_bind6(struct sock_xprt *transport, struct socket *sock)
> > }
> > last = port;
> > port = xs_next_srcport(transport, sock, port);
> > - if (port > last)
> > + if (port > last) {
> > + if (nloop > 2 && net_ratelimit()) {
> > + printk("RPC: %s %pI6: Cannot bind reserved "
> > + "source port. Consider decreasing "
> > + "min_resvport.\n",
> > + __func__, &myaddr.sin6_addr);
> > + }
> > nloop++;
> > - } while (err == -EADDRINUSE && nloop != 2);
> > + }
> > + } while (err == -EADDRINUSE);
> > dprintk("RPC: xs_bind6 %pI6:%u: %s (%d)\n",
> > &myaddr.sin6_addr, port, err ? "failed" : "ok", err);
> > return err;
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

2010-10-22 22:27:40

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] xs_bind retry binding forever

On Fri, 2010-10-22 at 14:15 -0400, Chuck Lever wrote:
> On Oct 22, 2010, at 1:45 PM, Trond Myklebust wrote:
>
> > On Fri, 2010-10-22 at 11:56 -0400, Chuck Lever wrote:
> >> On Oct 21, 2010, at 3:38 PM, Trond Myklebust wrote:
> >>
> >>> On Thu, 2010-10-21 at 13:33 -0500, Ben Myers wrote:
> >>>> Retry bind for reserved source ports forever. Add an error message when we
> >>>> have a hard time binding one.
> >>>
> >>> NACK. This approach leads to the process spinning forever in that loop,
> >>> which is exactly why we introduced the limit in the first place. See all
> >>> the old archived bug report emails about 'rpciod taking 100% cpu'.
> >>
> >> The root problem seems to be the hard loop. Thinking out loud, what if the client's FSM or some other higher up layer performed the retry, with a short delay inserted after each attempt?
> >
> > The problem isn't only the hard loop. The reason why we return the
> > EADDRINUSE is in order to allow quick failure of mounts and/or
> > automounts when we can't bind the socket.
> >
> > I suggest 2 changes:
> >
> > 1. In case of error, pass the return value from xs_bind to the
> > pending tasks
> > 2. Add a handler for EADDRINUSE in call_status(),
> > xprt_connect_status() and call_connect_status(). Make sure that
> > call_status() and call_connect_status() fail for SOFTCONN tasks,
> > and that they print an error message, delay and retry in the
> > case of ordinary hard tasks.
>
> The thing is, though, we don't want mounts to fail in this case; that's the presenting problem Ben is trying to address.
>
> This is not the same problem as SOFTCONN -- it's entirely one of how the local system allocates its own resources. Thus, theoretically, it's one where it is possible for us to behave entirely predictably. At its heart, our privileged port allocation mechanism is really an unfair way to allocate this resource, since there's no way to prevent starvation.

I beg to differ. It is quite possible for an external server to put us
in this position by closing connections too often. If we're running
against a too heavily loaded Linux NFS server, this is actually expected
to happen.

> I don't know if this is within the realm of possibilities, but it would be nice, for example, if the MNT client and the rpcbind client could each hold onto a privileged TCP port (to prevent others from using it) and just re-use that port whenever a new request needs to be sent to any remote host.

That would mean holding onto connections for longer than required.
Although it may fix some cases where a client mounts many partitions
from the same server, it will cause regressions for clients that mount
many partitions from different servers.

Quite frankly, if you are hitting bind issues due to the
portmapper/rpcbind and NFSv3 mount protocols, then the solution already
exists: switch to NFSv4.

> It would be fun to use net namespaces to allocate a separate port range that no-one else could touch, but I don't think that's possible without a separate IP address.

That is not something the kernel should engage in anyway.

> Ben, our client already has the ability to use unprivileged ports for MNT, as long as the server's mountd is configured to accept it. That, plus stipulating mountproto=udp, may give you more relief.

Or see above...

Cheers
Trond