Registering a local RPC service has a long timeout.
When starting the NFSD service, for example, the RPC server wants to
unregister at least 6 different RPC services (three versions of NFS
and three versions of lockd) before it even tries to register the
services it's bringing up.
Usually this isn't a problem. However, if a portmapper or rpcbind
daemon isn't running, each one of these registrations causes a long
wait (up to a minute each, I think) while the RPC server attempts to
contact the rpcbind daemon at localhost.
I don't think this wait is interruptible, either.
I'm wondering if this long timeout is really necessary. Can we get
by with a second or so, and a couple of retries?
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
I think a second or two is way too short, but I do wonder if it can't
issue the unregisters asynchronously, and in parallel. Then it can
wait for them all, with a timeout maybe on the order of 10 to 15
seconds. A couple of retries while waiting sounds reasonable.
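
A minimal sketch of the "issue them in parallel, then wait for them all" idea, written in user-space Python rather than kernel code. Everything here is an assumption made for illustration: unregister() is a hypothetical stand-in for one rpcbind unset call, the version numbers in SERVICES are only examples of "three versions of NFS and three versions of lockd", and the 15 second deadline is taken from the upper end of the range suggested above.

```python
import asyncio

# Hypothetical stand-in for a single rpcbind "unset" call. A real caller
# would send the RPC here; the sleep only simulates a round trip.
async def unregister(program: str, version: int, delay: float = 0.01) -> str:
    await asyncio.sleep(delay)
    return f"{program} v{version}"

async def unregister_all(services, deadline: float):
    """Start every unregister at once, then wait for all of them together."""
    tasks = [asyncio.create_task(unregister(p, v)) for p, v in services]
    done, pending = await asyncio.wait(tasks, timeout=deadline)
    for task in pending:
        task.cancel()  # give up on anything still outstanding at the deadline
    return sorted(t.result() for t in done), len(pending)

# Illustrative list only: the thread says six services, not which versions.
SERVICES = [("nfs", 2), ("nfs", 3), ("nfs", 4),
            ("lockd", 1), ("lockd", 3), ("lockd", 4)]

if __name__ == "__main__":
    finished, timed_out = asyncio.run(unregister_all(SERVICES, deadline=15.0))
    print(finished, timed_out)
```

With this shape the worst case is one shared deadline rather than six serial timeouts. It does not by itself say anything about ordering against later register calls.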
Making the wait interruptible seems dicey. Once the deregistration
is started, it seems like it should always make a best attempt to
complete it. Also, nfsd is usually started as a service, so there's
not likely to be a user.
Tom.
At 12:38 PM 4/4/2008, Chuck Lever wrote:
>Registering a local RPC service has a long timeout.
>
>When starting the NFSD service, for example, the RPC server wants to
>unregister at least 6 different RPC services (three versions of NFS
>and three versions of lockd) before it even tries to register the
>services it's bringing up.
>
>Usually this isn't a problem. However, if a portmapper or rpcbind
>daemon isn't running, each one of these registrations causes a long
>wait (up to a minute each, I think) while the RPC server attempts to
>contact the rpcbind daemon at localhost.
>
>I don't think this wait is interruptible, either.
>
>I'm wondering if this long timeout is really necessary. Can we get
>by with a second or so, and a couple of retries?
On Fri, 2008-04-04 at 12:49 -0400, Talpey, Thomas wrote:
> I think a second or two is way too short, but I do wonder if it can't
> issue the unregisters asynchronously, and in parallel. Then it can
> wait for them all, with a timeout maybe on the order of 10 to 15
> seconds. A couple of retries while waiting sounds reasonable.
>
> Making the wait interruptible seems dicey. Once the deregistration
> is started, it seems like it should always make a best attempt to
> complete it. Also, nfsd is usually started as a service, so there's
> not likely to be a user.
I'd say that making the RPC call asynchronous, but doing an
interruptible wait on completion is probably the best solution.
Making the process entirely asynchronous can be problematic if you
decide to restart the service due to the potential for reordering
between the unregister/register RPC calls.
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
http://www.netapp.com
On Apr 4, 2008, at 1:28 PM, Trond Myklebust wrote:
> On Fri, 2008-04-04 at 12:49 -0400, Talpey, Thomas wrote:
>> I think a second or two is way too short, but I do wonder if it can't
>> issue the unregisters asynchronously, and in parallel. Then it can
>> wait for them all, with a timeout maybe on the order of 10 to 15
>> seconds. A couple of retries while waiting sounds reasonable.
>>
>> Making the wait interruptible seems dicey. Once the deregistration
>> is started, it seems like it should always make a best attempt to
>> complete it. Also, nfsd is usually started as a service, so there's
>> not likely to be a user.
>
> I'd say that making the RPC call asynchronous, but doing an
> interruptible wait on completion is probably the best solution.
>
> Making the process entirely asynchronous can be problematic if you
> decide to restart the service due to the potential for reordering
> between the unregister/register RPC calls.
Another approach that doesn't address interruptibility, but does
prevent waiting for a timeout if rpcbind isn't listening:
Use TCP instead of UDP to contact the local rpcbind daemon. If
rpcbind isn't listening, the connection is refused immediately, and
the RPC client can tell without waiting for a timeout.
We would have to create a new RPC_CLNT_CREATE_FOO flag that tells the
client to fail an RPC immediately if the connection is refused (kind
of like the old "one shot" flag). This shouldn't be the default
behavior.
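
A rough user-space illustration of the TCP behaviour described above, assuming a plain socket connect is a fair proxy for what the RPC client would see. The function name tcp_listener_present is hypothetical; port 111 is the well-known rpcbind/portmapper port.

```python
import socket

RPCBIND_PORT = 111  # well-known rpcbind/portmapper port

def tcp_listener_present(host: str = "127.0.0.1",
                         port: int = RPCBIND_PORT,
                         timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    A refused connection is reported as soon as the RST arrives, so the
    "nobody is listening" case needs no timeout. Other failures, such as
    a timeout, are left to propagate to the caller.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except ConnectionRefusedError:
        return False

if __name__ == "__main__":
    print("rpcbind reachable over TCP:", tcp_listener_present())
```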
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
At 02:41 PM 4/4/2008, Chuck Lever wrote:
>Use TCP instead of UDP to contact the local rpcbind daemon. If
>rpcbind isn't listening, the connection is refused immediately, and
>the RPC client can tell without waiting for a timeout.
You can get a prompt error if you use UDP connected mode (call
connect() on the UDP socket). Any ICMP port unreachable will
flow back to an ECONNREFUSED error on the socket. This doesn't
occur for unconnected UDP endpoints.
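
A small user-space sketch of the connected-UDP behaviour just described, under the assumption that the host actually generates ICMP port unreachable for closed ports (Linux loopback normally does). The name udp_probe is hypothetical and the one-byte payload is not a valid RPC message; it only exists to provoke the ICMP error.

```python
import socket

def udp_probe(host: str, port: int, timeout: float = 1.0) -> str:
    """Send one datagram on a connected UDP socket and classify the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() on a UDP socket sends nothing, but it binds the peer
        # address so ICMP errors for that peer are reported on this socket.
        sock.connect((host, port))
        try:
            sock.send(b"\x00")
            sock.recv(1)
        except ConnectionRefusedError:
            return "refused"   # ICMP port unreachable came back
        except socket.timeout:
            return "timeout"   # no reply and no ICMP error seen
        return "reply"

if __name__ == "__main__":
    print(udp_probe("127.0.0.1", 111))
```

The same send on an unconnected socket (using sendto() without connect()) would normally just time out, which matches the long waits reported at the top of the thread.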
>We would have to create a new RPC_CLNT_CREATE_FOO flag that tells the
>client to fail an RPC immediately if the connection is refused (kind
>of like the old "one shot" flag). This shouldn't be the default
>behavior.
Sounds reasonable. But, a port unreachable error should generally
be a "hard" indication, since it does come from the remote host,
or a middlebox acting as its proxy. In other words, any reasonably
small number of retries would all receive the same response. This is
much like a TCP RST - not a transitory event. UDP errors such as
host unreachable or network down (which come from the infrastructure
not the server) should not, of course, be treated this way, and for
those a retry is appropriate.
Bottom line, I think it might actually be a useful default. Why does
the client currently retry ECONNREFUSED for port resolution?
Tom.
On Fri, 2008-04-04 at 14:54 -0400, Talpey, Thomas wrote:
> At 02:41 PM 4/4/2008, Chuck Lever wrote:
> >Use TCP instead of UDP to contact the local rpcbind daemon. If
> >rpcbind isn't listening, the connection is refused immediately, and
> >the RPC client can tell without waiting for a timeout.
>
> You can get a prompt error if you use UDP connected mode (call
> connect() on the UDP socket). Any ICMP port unreachable will
> flow back to an ECONNREFUSED error on the socket. This doesn't
> occur for unconnected UDP endpoints.
I hadn't realised that was a feature of connected UDP sockets, but
that's a good point.
> >We would have to create a new RPC_CLNT_CREATE_FOO flag that tells the
> >client to fail an RPC immediately if the connection is refused (kind
> >of like the old "one shot" flag). This shouldn't be the default
> >behavior.
> Sounds reasonable. But, a port unreachable error should generally
> be a "hard" indication, since it does come from the remote host,
> or a middlebox acting as its proxy. In other words, any reasonably
> small number of retries would all receive the same response. This is
> much like a TCP RST - not a transitory event. UDP errors such as
> host unreachable or network down (which come from the infrastructure
> not the server) should not of course be considered this way and
> retry is appropriate.
>
> Bottom line, I think it might actually be a useful default. Why does
> the client currently retry ECONNREFUSED for port resolution?
The client currently doesn't retry in the case of a connection failing
for a soft RPC request. For hard RPC requests it _must_ retry the
connection, but it does so using an exponential backoff rule.
Note that the NFSv4.1 channel bindings will probably force us to change
this...
Cheers
Trond
--
Trond Myklebust
On Apr 4, 2008, at 2:54 PM, Talpey, Thomas wrote:
> At 02:41 PM 4/4/2008, Chuck Lever wrote:
>> Use TCP instead of UDP to contact the local rpcbind daemon. If
>> rpcbind isn't listening, the connection is refused immediately, and
>> the RPC client can tell without waiting for a timeout.
>
> You can get a prompt error if you use UDP connected mode (call
> connect() on the UDP socket). Any ICMP port unreachable will
> flow back to an ECONNREFUSED error on the socket. This doesn't
> occur for unconnected UDP endpoints.
Yeah, the RPC client uses unconnected UDP sockets.
>> We would have to create a new RPC_CLNT_CREATE_FOO flag that tells the
>> client to fail an RPC immediately if the connection is refused (kind
>> of like the old "one shot" flag). This shouldn't be the default
>> behavior.
>
> Sounds reasonable. But, a port unreachable error should generally
> be a "hard" indication, since it does come from the remote host,
> or a middlebox acting as its proxy. In other words, any reasonably
> small number of retries would all receive the same response. This is
> much like a TCP RST - not a transitory event. UDP errors such as
> host unreachable or network down (which come from the infrastructure
> not the server) should not of course be considered this way and
> retry is appropriate.
>
> Bottom line, I think it might actually be a useful default. Why does
> the client currently retry ECONNREFUSED for port resolution?
Because the RPC client does that for all RPCs. It retries soft until
a timeout; hard, forever.
It could be waiting for the remote to come up. In which case, a few
retries after ECONNREFUSED is useful, even preferred.
I'm going to try prototyping a new RPC_CLNT_ flag.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
At 03:25 PM 4/4/2008, Trond Myklebust wrote:
>> Bottom line, I think it might actually be a useful default. Why does
>> the client currently retry ECONNREFUSED for port resolution?
>
>The client currently doesn't retry in the case of a connection failing
>for a soft RPC request. For hard RPC requests then it _must_ retry the
>connection, but it does so using an exponential backoff rule.
Yes, for the NFS client this is true, and necessary. I'm wondering why
the rpcbind/portmap client retries on ECONNREFUSED. Seems overly
heroic, to me - the destination responded, and said "go away".
If it fails with a server-provided error such as this, the caller can and
should decide what to do - if that caller is NFS, it can apply soft/hard
to the retry decision. But Chuck's example, for instance, is NFSD.
Tom.
Hi Tom-
On Apr 4, 2008, at 12:49 PM, Talpey, Thomas wrote:
> I think a second or two is way too short, but I do wonder if it can't
> issue the unregisters asynchronously, and in parallel.
You would have to parallelize the setup of the lockd and nfsd
services. I.e., this would be a ULP change. Doable, but complicated.
Can you say why you think two seconds is too short for a local host
operation?
> Then it can
> wait for them all, with a timeout maybe on the order of 10 to 15
> seconds. A couple of retries while waiting sounds reasonable.
The current situation is a 5 second timeout, followed by 10, then
20. Even shortening the initial timeout would be helpful, or making
it not do exponential backoff.
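
To put numbers on that, here is a back-of-the-envelope calculation. It assumes each call gives up after exactly those three timeouts and that the six unregistrations run one after another; neither assumption is confirmed in this thread, and total_wait is just a helper name for the example.

```python
def total_wait(initial: float, attempts: int, factor: float = 2.0) -> float:
    """Sum of the timeouts when each attempt waits `factor` times longer."""
    return sum(initial * factor**i for i in range(attempts))

per_call = total_wait(5, 3)                  # 5 + 10 + 20 seconds
no_backoff = total_wait(5, 3, factor=1.0)    # 5 + 5 + 5 seconds
services = 6                                 # "at least 6" unregistrations

if __name__ == "__main__":
    print("with backoff:   ", per_call, "s per call,", services * per_call, "s total")
    print("without backoff:", no_backoff, "s per call,", services * no_backoff, "s total")
```

Under those assumptions, dropping the backoff alone cuts the serial worst case from 210 seconds to 90.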
NFSD is usually started during system boot. If there are problems
like this, it looks like a boot hang.
> Making the wait interruptible seems dicey. Once the deregistration
> is started, it seems like it should always make a best attempt to
> complete it.
If you interrupt a script like /etc/init.d/nfs, you will just have to
re-run it, and it will try the unregistration again. I'm not sure
what you protect by making unregistration uninterruptible.
This may be an undesired artifact of neutering "intr" in 2.6.25.
> Also, nfsd is usually started as a service, so there's
> not likely to be a user.
The system actually does throw an "ICMP port unreachable" if the
daemon isn't listening. The problem is this never gets back to the
RPC client. Even if it did, what's the correct thing to do?
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
At 03:28 PM 4/4/2008, Chuck Lever wrote:
>It could be waiting for the remote to come up. In which case, a few
>retries after ECONNREFUSED is useful, even preferred.
If the remote rpcbind is coming up, then there's no need to deregister.
Of course, I would certainly not advocate bailing on the registration!
>I'm going to try prototyping a new RPC_CLNT_ flag.
Sure, what the heck. If the RPC client needs to serve all upper layers
then a new behavior makes sense. I fondly recall Rick Macklem's
"spongy" mounts, in-between hard and soft, that retried nonidempotent
ops harder than others.
Since you're exploring something on the other side of soft, may I
suggest "squishy"? :-)
Tom.
On Fri, 2008-04-04 at 15:33 -0400, Talpey, Thomas wrote:
> If it fails with a server-provided error such as this, the caller can and
> should decide what to do - if that caller is NFS, it can apply soft/hard
> to the retry decision. But Chuck's example, for instance, is NFSD.
Note that ECONNREFUSED doesn't necessarily mean that the service is
down; it may also indicate a SYN backlog.
In most cases, you therefore definitely want to try to handle this type
of error in the RPC layer, not the caller. If you fault back to the
caller, then you lose RPC level information, and in particular, you will
lose the XID. If you were trying to reconnect in order to replay the RPC
request, then that can be a real problem...
--
Trond Myklebust