2008-10-23 03:57:06

by NeilBrown

Subject: [grumble] connected UDP sockets [grumble] Solaris [grumble]


The attentive reader of this mailing list may be aware that I was -
some time ago - advocating using connected UDP sockets when UDP was
used to contact the server during a mount. i.e. to talk to portmap and
mountd.

The benefit of this is that errors reported by ICMP (e.g. host
unreachable / port unreachable) are reported to the application with
connected sockets, whereas unconnected sockets need to wait for a
timeout.
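
For illustration, a minimal sketch of what a connected UDP socket buys
us (address and port are made up, error handling trimmed):

#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
	struct sockaddr_in srv = {
		.sin_family = AF_INET,
		.sin_port   = htons(111),	/* portmapper */
	};
	char buf[64];
	int fd;

	inet_pton(AF_INET, "192.0.2.1", &srv.sin_addr);	/* example address */

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
		return 1;

	send(fd, "ping", 4, 0);

	/* If the host answers our datagram with ICMP port unreachable,
	 * this recv() fails quickly with ECONNREFUSED; an unconnected
	 * socket would just sit here until the RPC layer gives up. */
	if (recv(fd, buf, sizeof(buf), 0) < 0)
		perror("recv");

	close(fd);
	return 0;
}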

I just discovered that there is a problem with this. It involves
multihomed hosts and certain non-Linux operating systems such as
Solaris (I don't know which version(s)).

In one particular case, the UDP request (portmap lookup I assume) was
sent from a Linux client to a Solaris server and the reply promptly
came back from a different IP address (presumably the address of the
interface that Solaris wanted to route through to get to the client).
Linux replied to this with an ICMP error. It couldn't deliver the reply
to mount.nfs because mount.nfs had a connected UDP socket that was
connected to a different remote address.
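
A rough sketch of the contrast (again with a made-up address): an
unconnected socket would have accepted that reply, because recvfrom()
delivers datagrams from any source and the RPC layer can match replies
by XID rather than by peer address:

#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
	struct sockaddr_in dst = {
		.sin_family = AF_INET,
		.sin_port   = htons(111),	/* portmapper */
	};
	struct sockaddr_in from;
	socklen_t fromlen = sizeof(from);
	char buf[512];
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);	/* example address */

	/* no connect(): just send to the address we were given */
	sendto(fd, "ping", 4, 0, (struct sockaddr *)&dst, sizeof(dst));

	/* The reply may come back from some other interface of a
	 * multihomed server; recvfrom() hands it to us regardless,
	 * whereas a connected socket would never see it and the kernel
	 * would answer it with ICMP port unreachable. */
	if (recvfrom(fd, buf, sizeof(buf), 0,
		     (struct sockaddr *)&from, &fromlen) > 0)
		printf("reply from %s\n", inet_ntoa(from.sin_addr));

	close(fd);
	return 0;
}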

This is arguably a bug in Solaris. It should reply with a source
address matching the destination address of the request. Linux hasn't
had that bug for years. But we probably still have to live with it.

The conclusion is that if we use connected UDP sockets, we will get
unnecessary timeouts talking to certain multihomed hosts, and if we
don't, we will get unnecessary timeouts talking to certain hosts that
don't support portmap on UDP (for example).

I don't suppose there is a middle ground? A semi-connected socket?
Or we could have one of each and see which one gets a reply first?
No, that's just yuck.


Much as it pains me to say this, maybe we just need to treat UDP as
legacy for all protocols (PORTMAP, MOUNT, NLM, NSM), not just NFS.
None of these problems occur with TCP. TCP does have a slightly
higher overhead for simple transactions, but it is a cost that is
unlikely to be noticeable in reality.


Thoughts?

NeilBrown (grumble grumble).


2008-10-23 04:45:17

by Greg Banks

Subject: Re: [grumble] connected UDP sockets [grumble] Solaris [grumble]

Neil Brown wrote:
>
> Much as it pains me to say this, maybe we just need to treat UDP as
> legacy for all protocols (PORTMAP, MOUNT, NLM, NSM), not just NFS.
> None of these problems occur with TCP. TCP does have a slightly
> higher overhead for simple transactions, but it is a cost that is
> unlikely to be noticeable in reality.
>
>
> Thoughts?
>
>
I see only two reasons to keep any UDP support at all in either client
or server.

a) legacy compatibility, for toy/broken/antique clients/servers/firewalls

b) on the server, supporting broadcast RPC to/through the portmapper
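
For (b), a rough sketch of the sort of thing that only works over UDP:
the classic clnt_broadcast() interface, which sprays the request at the
portmappers on the local subnet (the program number here is NFS, purely
for illustration):

#include <stdio.h>
#include <arpa/inet.h>
#include <rpc/rpc.h>
#include <rpc/pmap_clnt.h>	/* clnt_broadcast() */

/* Called once per reply; returning FALSE keeps collecting answers. */
static bool_t each_result(caddr_t res, struct sockaddr_in *addr)
{
	printf("reply from %s\n", inet_ntoa(addr->sin_addr));
	return FALSE;
}

int main(void)
{
	/* Broadcast a NULL call for program 100003 (NFS), version 2.
	 * Each portmapper that hears it forwards the call to its local
	 * service and relays the answer back; there is no equivalent
	 * over TCP. */
	clnt_broadcast(100003, 2, 0,
		       (xdrproc_t)xdr_void, NULL,
		       (xdrproc_t)xdr_void, NULL,
		       (resultproc_t)each_result);
	return 0;
}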

--
Greg Banks, P.Engineer, SGI Australian Software Group.
Be like the squirrel.
I don't speak for SGI.


2008-10-23 16:30:38

by Chuck Lever

Subject: Re: [grumble] connected UDP sockets [grumble] Solaris [grumble]

On Oct 22, 2008, at 11:57 PM, Neil Brown wrote:
> The attentive reader of this mailing list may be aware that I was -
> some time ago - advocating using connected UDP sockets when UDP was
> used to contact the server during a mount. i.e. to talk to portmap and
> mountd.
>
> The benefit of this is that errors reported by ICMP (e.g. host
> unreachable / port unreachable) are reported to the application with
> connected sockets, whereas unconnected sockets need to wait for a
> timeout.
>
> I just discovered that there is a problem with this. It involves
> multihomed hosts and certain non-Linux operating systems such as
> Solaris (I don't know which version(s)).
>
> In one particular case, the UDP request (portmap lookup I assume) was
> sent from a Linux client to a Solaris server and the reply promptly
> came back from a different IP address (presumably the address of the
> interface that Solaris wanted to route through to get to the client).
> Linux replied to this with an ICMP error. It couldn't deliver the reply
> to mount.nfs because mount.nfs had a connected UDP socket that was
> connected to a different remote address.
>
> This is arguably a bug in Solaris. It should reply with a source
> address matching the destination address of the request. Linux hasn't
> had that bug for years. But we probably still have to live with it.
>
> The conclusion is that if we use connected UDP sockets, we will get
> unnecessary timeouts talking to certain multihomed hosts, and if we
> don't, we will get unnecessary timeouts talking to certain hosts that
> don't support portmap on UDP (for example).

Ja, disappointing.

> I don't suppose there is a middle ground? A semi-connected socket?
> Or we could have one of each and see which one gets a reply first?
> No, that's just yuck.
>
> Much as it pains me to say this, maybe we just need to treat UDP as
> legacy for all protocols (PORTMAP, MOUNT, NLM, NSM), not just NFS.
> None of these problems occur with TCP. TCP does have a slightly
> higher overhead for simple transactions, but it is a cost that is
> unlikely to be noticeable in reality.

> Thoughts?

As many good things as there are about TCP, it has one major
drawback: it leaves a connection in TIME_WAIT for a long period
(normally 120 seconds) after a normal socket close. This ties up our
port range on workloads that have to do one or just a few requests
then close their socket. We are slowly making headway on improving
this situation (using non-privileged ports wherever practical, for
instance) but it will take some time.

There are also still some common and reasonable configurations where
TCP overhead (the 3-way handshake and tiny ACK packets) is
significant. For example, running in a virtual machine means a domain
switch for every packet.

In my opinion the problem configuration you describe above is more
rare than configurations that are helped by using connected UDP
sockets. (But don't ask me to back that up with real data).

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com