2007-09-19 15:22:14

by Ulrich Drepper

Subject: follow-up: discrepancy with POSIX

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

As a follow-up to my question from yesterday on the netdev list, here is
what I think is a real problem, either in the kernel or in the POSIX spec.

The POSIX spec currently says this about SOCK_DGRAM sockets:

If address is a null address for the protocol, the socket’s peer
address shall be reset.

The term "null address" is not further specified but it will usually be
read to allow the following scenario to work out:

fd = socket(AF_INET6, ...)

connect(fd, ...some IPv6 address...)

struct sockaddr_in6 sin6 = { .sin6_family = AF_INET6 };
connect(fd, &sin6, sizeof (sin6));

connect(fd, ...some new IPv6 address...)

This does not work on Linux at the moment. The socket remains connected
to the old IPv6 address but the second connect() call does succeed (this
does not sound OK). What does work is if the connect call to
disassociate the address uses AF_UNSPEC instead of AF_INET6.


The question is: do people here think this is a problem in the POSIX
spec? Binding to :: and 0.0.0.0 isn't possible, so maybe the Linux
implementation should allow this?

If you think the POSIX spec is wrong (and can point to other
implementations doing the same as Linux) let me know and I'll work on
getting the spec changed.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFG8T6L2ijCOnn/RHQRAnSRAJ9sXDGG9OepEQWQInaPgwxCWlaH6wCghqim
ULttg5/lU8c1rSpBnoRCjB8=
=nGVv
-----END PGP SIGNATURE-----


2007-09-19 15:48:05

by Andi Kleen

Subject: Re: follow-up: discrepancy with POSIX

Ulrich Drepper <[email protected]> writes:

>
> fd = socket(AF_INET6, ...)
>
> connect(fd, ...some IPv6 address...)
>
> struct sockaddr_in6 sin6 = { .sin6_family = AF_INET6 };
> connect(fd, &sin6, sizeof (sin6));

The standard way to undo connect is to use AF_UNSPEC. Code to handle
that for dgram sockets is there. It's the same code for v4 and v6.
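
A minimal sketch of the AF_UNSPEC approach described above, assuming a Linux-like system with a working loopback interface. The helper names (`dgram_disconnect`, `dgram_has_peer`, `demo_unspec_disconnect`) and the choice of 127.0.0.1 port 9 are illustrative and not from the thread. Treating EAFNOSUPPORT as success follows the Darwin man page quoted later in the thread and is an assumption about other systems.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Dissolve the peer association of a datagram socket via AF_UNSPEC.
 * Returns 0 on success, -1 with errno set otherwise. EAFNOSUPPORT is
 * treated as success (assumption: some systems report it harmlessly). */
static int dgram_disconnect(int fd)
{
    struct sockaddr sa;

    assert(fd >= 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_family = AF_UNSPEC;
    if (connect(fd, &sa, sizeof sa) == 0 || errno == EAFNOSUPPORT)
        return 0;
    return -1;
}

/* Returns 1 if fd has a peer, 0 if getpeername() reports ENOTCONN,
 * and -1 on any other error. */
static int dgram_has_peer(int fd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof ss;

    if (getpeername(fd, (struct sockaddr *)&ss, &len) == 0)
        return 1;
    return errno == ENOTCONN ? 0 : -1;
}

/* Hypothetical demo: connect a UDP socket to 127.0.0.1:9, then undo it.
 * Returns 0 if the peer was set and then cleared, -1 otherwise. */
static int demo_unspec_disconnect(void)
{
    struct sockaddr_in sin;
    int ok = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(9);
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (connect(fd, (struct sockaddr *)&sin, sizeof sin) == 0 &&
        dgram_has_peer(fd) == 1 &&
        dgram_disconnect(fd) == 0 &&
        dgram_has_peer(fd) == 0)
        ok = 0;
    close(fd);
    return ok;
}
```

Connecting a UDP socket sends no packets, so nothing needs to listen on the chosen port. The same `dgram_disconnect()` call should apply unchanged to an AF_INET6 datagram socket, since the family of the socket is never named in it.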

-Andi

2007-09-19 16:15:36

by David Miller

Subject: Re: follow-up: discrepancy with POSIX

From: Ulrich Drepper <[email protected]>
Date: Wed, 19 Sep 2007 08:21:47 -0700

> If you think the POSIX spec is wrong (and can point to other
> implementations doing the same as Linux) let me know and I'll work on
> getting the spec changed.

The whole AF_UNSPEC thing I'm almost certain comes from BSD, which has
behaved that way for centuries.

Someone needs to cull through Stevens' Volume 2 to verify this; I'm
too busy at the moment to do so myself.

2007-09-19 16:32:24

by Alan

Subject: Re: follow-up: discrepancy with POSIX

On Wed, 19 Sep 2007 09:15:10 -0700 (PDT)
David Miller <[email protected]> wrote:

> From: Ulrich Drepper <[email protected]>
> Date: Wed, 19 Sep 2007 08:21:47 -0700
>
> > If you think the POSIX spec is wrong (and can point to other
> > implementations doing the same as Linux) let me know and I'll work on
> > getting the spec changed.
>
> The whole AF_UNSPEC thing I'm almost certain comes from BSD, which has
> behaved that way for centuries.

We got it from the 1003.4g draft socket specification if I remember
rightly. It's entirely plausible that it got it from 4BSD.

Alan

2007-09-19 16:49:36

by Ulrich Drepper

Subject: Re: follow-up: discrepancy with POSIX

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andi Kleen wrote:
> The standard way to undo connect is to use AF_UNSPEC. Code to handle
> that for dgram sockets is there. It's the same code for v4 and v6.

I quoted the standard and it does not say anything about AF_UNSPEC. So
you cannot simply make such broad statements.

I also don't say that this behavior should be removed. It's certainly
useful, very much so in fact.

But the spec calls for a "null address" to be used and that's in my
understanding something different from using AF_UNSPEC.

I looked through Stevens TCP Illustrated Vol 2 and it seems not to
mention resetting the address at all. The POSIX spec certainly got this
text from .1g.

I cannot test it on other systems. If somebody has access to some
certified systems (and maybe others), please write a bit of code which
creates a DGRAM socket, connects to one address, calls connect with a
"null address", and then connects to another address (which likely has
to use a different interface, since otherwise the connect will just
succeed, it seems).
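
The test described above might look roughly like the following sketch. It is written for IPv4 to keep it short; `struct probe_result`, `fill_addr`, and `probe_null_address` are hypothetical names, and the addresses and port are supplied by the caller. The outcome of the null-address step is only recorded, not judged, because that is exactly the behaviour in question.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

struct probe_result {
    int first_rc;                  /* connect() to the first peer */
    int null_rc;                   /* connect() with the all-zero AF_INET address */
    int null_errno;                /* errno of that call, 0 if it succeeded */
    int second_rc;                 /* connect() to the second peer */
    int peer_rc;                   /* getpeername() after the last connect() */
    struct sockaddr_in peer_after; /* peer reported by that getpeername() */
};

/* Build an AF_INET address from a dotted-quad string and a port. */
static void fill_addr(struct sockaddr_in *sin, const char *ip, unsigned short port)
{
    int ok;

    memset(sin, 0, sizeof *sin);
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    ok = inet_pton(AF_INET, ip, &sin->sin_addr);
    assert(ok == 1);
    (void)ok;
}

/* Connect to `first`, then to a null address of the socket's own family,
 * then to `second`, recording every return value in *r.
 * Returns 0 if the socket could be created, -1 otherwise. */
static int probe_null_address(const char *first, const char *second,
                              unsigned short port, struct probe_result *r)
{
    struct sockaddr_in a, b, null_addr;
    socklen_t len = sizeof r->peer_after;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(r, 0, sizeof *r);
    fill_addr(&a, first, port);
    fill_addr(&b, second, port);
    memset(&null_addr, 0, sizeof null_addr);
    null_addr.sin_family = AF_INET;   /* same family as the socket, rest zero */

    r->first_rc = connect(fd, (struct sockaddr *)&a, sizeof a);
    r->null_rc = connect(fd, (struct sockaddr *)&null_addr, sizeof null_addr);
    r->null_errno = (r->null_rc == 0) ? 0 : errno;
    r->second_rc = connect(fd, (struct sockaddr *)&b, sizeof b);
    r->peer_rc = getpeername(fd, (struct sockaddr *)&r->peer_after, &len);
    close(fd);
    return 0;
}
```

For a meaningful run, `first` and `second` should be reachable through different interfaces, as noted above; passing two loopback addresses only exercises the call sequence. An IPv6 variant would follow the same shape with `struct sockaddr_in6`.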

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFG8VMF2ijCOnn/RHQRAr9NAJwLxyql0kQnMGJNaPZlRGsuB6rGEACgog88
WIWAFhuBWsjps7PdbcoumUQ=
=oLxP
-----END PGP SIGNATURE-----

2007-09-19 16:53:19

by David Miller

Subject: Re: follow-up: discrepancy with POSIX

From: Ulrich Drepper <[email protected]>
Date: Wed, 19 Sep 2007 09:49:09 -0700

> But the spec calls for a "null address" to be used and that's in my
> understanding something different from using AF_UNSPEC.

It just occurred to me that AF_UNSPEC might be used simply
because "all zeros" might be a valid real bindable address
for some address family. And using AF_UNSPEC avoids that
problem entirely.

2007-09-19 17:27:11

by Andi Kleen

Subject: Re: follow-up: discrepancy with POSIX

On Wed, Sep 19, 2007 at 09:49:09AM -0700, Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Andi Kleen wrote:
> > The standard way to undo connect is to use AF_UNSPEC. Code to handle
> > that for dgram sockets is there. It's the same code for v4 and v6.
>
> I quoted the standard and it does not say anything about AF_UNSPEC. So
> you cannot simply make such broad statements.

Ok "standard" was perhaps a poor choice of words.

AF_UNSPEC was introduced long ago by Alan based on some early
POSIX draft iirc.

Also incidentally it's a null address:

include/linux/socket.h:#define AF_UNSPEC 0

> But the spec calls for a "null address" to be used and that's in my
> understanding something different from using AF_UNSPEC.

memset(&sockaddr, 0, sizeof(sockaddr)) should give you AF_UNSPEC
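
A small sketch of the point about memset(), assuming AF_UNSPEC is defined as 0 as in the Linux header quoted above. The function names are illustrative only. The second helper shows the contrasting case of a zeroed address whose family is then set to the socket's own family, which leaves the IPv6 unspecified address (::).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Read the family field through the generic sockaddr view. */
static sa_family_t family_of(const struct sockaddr *sa)
{
    assert(sa != NULL);
    return sa->sa_family;
}

/* Family of a socket address that was only zeroed with memset(). */
static sa_family_t zeroed_family(void)
{
    struct sockaddr_storage ss;

    memset(&ss, 0, sizeof ss);
    return family_of((const struct sockaddr *)&ss);
}

/* Returns 1 if a zeroed sockaddr_in6 with only sin6_family set keeps
 * AF_INET6 as its family and holds the unspecified address (::). */
static int zeroed_in6_is_unspecified(void)
{
    struct sockaddr_in6 sin6;

    memset(&sin6, 0, sizeof sin6);
    sin6.sin6_family = AF_INET6;
    return family_of((const struct sockaddr *)&sin6) == AF_INET6 &&
           IN6_IS_ADDR_UNSPECIFIED(&sin6.sin6_addr);
}
```

The two cases differ only in the family field, which is the distinction the rest of the thread turns on.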

-Andi

2007-09-19 17:27:33

by Ulrich Drepper

Subject: Re: follow-up: discrepancy with POSIX

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ulrich Drepper wrote:
> Yes, but for IPv4/6 it's not an issue. Some implementations might
> handle all-zeros and the spec _currently_ calls for it. In this case an
> alignment would be good.

Searching the web shows up this:

http://developer.apple.com/documentation/Darwin/Reference/ManPages/man2/connect.2.html


Datagram sockets may dissolve the association by connecting to an
invalid address, such as a null address or an address with the address
family set to AF_UNSPEC (the error EAFNOSUPPORT will be harmlessly
returned).


I.e., at least Apple implements both variants.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFG8Vvu2ijCOnn/RHQRAsSfAJkBELtiNyul8wMOjVv1x7LfvDWw/ACfR0D0
cm+k1wfhCsT4GjbF3uac+eY=
=nksN
-----END PGP SIGNATURE-----

2007-09-19 17:47:24

by Ulrich Drepper

Subject: Re: follow-up: discrepancy with POSIX

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andi Kleen wrote:
>> But the spec calls for a "null address" to be used and that's in my
>> understanding something different from using AF_UNSPEC.
>
> memset(&sockaddr, 0, sizeof(sockaddr)) should give you AF_UNSPEC

But the spec calls for <quote>null address for the protocol</quote>.

That means the family for the null address is the same as the family of
the socket.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFG8WCO2ijCOnn/RHQRAgtsAJ9qTFVj5QQbVG/hUflxo/6uPOfl4QCdHSX8
wi2GX7B0pht8VDaswYLqdpM=
=sMSg
-----END PGP SIGNATURE-----

2007-09-19 17:57:22

by Andi Kleen

Subject: Re: follow-up: discrepancy with POSIX

On Wed, Sep 19, 2007 at 10:46:54AM -0700, Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Andi Kleen wrote:
> >> But the spec calls for a "null address" to be used and that's in my
> >> understanding something different from using AF_UNSPEC.
> >
> > memset(&sockaddr, 0, sizeof(sockaddr)) should give you AF_UNSPEC
>
> But the spec calls for <quote>null address for the protocol</quote>.
>
> That means the family for the null address is the same as the family of
> the socket.

Spec doesn't match traditional behaviour then. IPv4 0.0.0.0 is
traditionally a synonym for old style all broadcast (255.255.255.255)
on UDP/RAW and it's certainly possible to connect() to that.

-Andi

2007-09-19 17:59:40

by Alan

Subject: Re: follow-up: discrepancy with POSIX

On Wed, 19 Sep 2007 10:46:54 -0700
Ulrich Drepper <[email protected]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Andi Kleen wrote:
> >> But the spec calls for a "null address" to be used and that's in my
> >> understanding something different from using AF_UNSPEC.
> >
> > memset(&sockaddr, 0, sizeof(sockaddr)) should give you AF_UNSPEC
>
> But the spec calls for <quote>null address for the protocol</quote>.
>
> That means the family for the null address is the same as the family of
> the socket.

Which is a valid address in some protocols. If I remember rightly,
AppleTalk net 0 node 0 port 0 is valid, although I'd want to look in the
book to check that. Ditto AF_ECONET, although I doubt anyone cares too
much 8)

Alan

2007-09-19 18:02:32

by Ulrich Drepper

Subject: Re: follow-up: discrepancy with POSIX

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andi Kleen wrote:
> Spec doesn't match traditional behaviour then.

Well, determining whether that's the case is part of this exercise.


> IPv4 0.0.0.0 is
> traditionally a synonym for old style all broadcast (255.255.255.255)
> on UDP/RAW and it's certainly possible to connect() to that.

Where do you get this from? And where is this implemented? I don't
doubt it but I have to convince people to change the standard and
possibly introduce incompatibility.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFG8WQY2ijCOnn/RHQRAlsBAJ9qZRZXNN2VEy136MFIT1daHfju5ACdGiIW
k0I5e2BGRjvjbJrrAwtehqo=
=fX+i
-----END PGP SIGNATURE-----

2007-09-19 18:30:20

by Andi Kleen

Subject: Re: follow-up: discrepancy with POSIX

On Wed, Sep 19, 2007 at 11:02:00AM -0700, Ulrich Drepper wrote:
> > on UDP/RAW and it's certainly possible to connect() to that.
>
> Where do you get this from? And where is this implemented? I don't

Sorry it's actually loopback, not broadcast as implemented in Linux.
In Linux it's implemented in ip_route_output_slow(). Essentially
converted to 127.0.0.1

I think it's traditional BSD behaviour but couldn't find it on
a quick look in FreeBSD source (but haven't looked very intensively)

Admittedly port 0 is somewhat dodgy for UDP too, but at least in RAW
context it might be valid.

-Andi

2007-09-19 18:39:31

by Rick Jones

Subject: Re: follow-up: discrepancy with POSIX

Andi Kleen wrote:
> On Wed, Sep 19, 2007 at 11:02:00AM -0700, Ulrich Drepper wrote:
>
>>>on UDP/RAW and it's certainly possible to connect() to that.
>>
>>Where do you get this from? And where is this implemented? I don't
>
>
> Sorry it's actually loopback, not broadcast as implemented in Linux.
> In Linux it's implemented in ip_route_output_slow(). Essentially
> converted to 127.0.0.1
>
> I think it's traditional BSD behaviour but couldn't find it on
> a quick look in FreeBSD source (but haven't looked very intensively)

One has to set their way-back machine pretty far back to find the *BSD
bits which used 0.0.0.0 as the "all nets, all subnets" (to mis-use a
term) broadcast IPv4 address when sending. Perhaps as far back as the
time before HP-UX 7 or SunOS4. The bit errors in my dimm memory get
pretty dense that far back...

It has hung on in various places (stacks) as an "accepted" broadcast IP
in the receive path, but not the send path, for quite possibly decades now.

rick jones

2007-09-19 19:41:07

by Andi Kleen

Subject: Re: follow-up: discrepancy with POSIX

> It has hung on in various places (stacks) as an "accepted" broadcast IP
> in the receive path, but not the send path, for quite possibly decades now.

Well it is valid in Linux for sending. And who knows who relies on it.

-Andi

2007-09-19 20:26:46

by Ulrich Drepper

Subject: Re: follow-up: discrepancy with POSIX

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Miller wrote:
> It just occurred to me that AF_UNSPEC might be used simply
> because "all zeros" might be a valid real bindable address
> for some address family. And using AF_UNSPEC avoids that
> problem entirely.

Yes, but for IPv4/6 it's not an issue. Some implementations might
handle all-zeros and the spec _currently_ calls for it. In this case an
alignment would be good.

I guess I'll just go ahead and file a problem report with the spec.
Maybe the Unix vendors will test their implementations and provide feedback.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFG8Vam2ijCOnn/RHQRAlw2AJwPCkD/GdX5YWCjsidhNXkGT71SiQCeLUDX
XimSWS2NMI9T8QxnnV3FDQ4=
=8XbG
-----END PGP SIGNATURE-----

2007-09-19 20:34:18

by Valdis Klētnieks

Subject: Re: follow-up: discrepancy with POSIX

On Wed, 19 Sep 2007 11:38:57 PDT, Rick Jones said:

> One has to set their way-back machine pretty far back to find the *BSD
> bits which used 0.0.0.0 as the "all nets, all subnets" (to mis-use a
> term) broadcast IPv4 address when sending. Perhaps as far back as the
> time before HP-UX 7 or SunOS4. The bit errors in my dimm memory get
> pretty dense that far back...

That would be BSD4.2 - BSD4.3 went to all-ones, and it *was* quite the
little mess if you had both flavors of boxes on the same subnet at the same
time, it would packet-storm *quite* easily.

