LinuxLists.cc - select for UNIX sockets?

2003-06-03 14:42:30

Subject: select for UNIX sockets?

Hi,

Should something like this work correctly?

while(1) {
FD_ZERO(&set);
FD_SET(fd, &set);
select(FD_SETSIZE, NULL, &set, NULL, NULL); <<<<<<< for writing

if (FD_ISSET(fd, &set))
sendto(fd, &datagram, 1, 0, ...);
}

fd is a normal local datagram socket. It looks like select() returns with
"fd ready for write" and sendto() then blocks because the queue is full.

I don't know if it's expected behaviour or just a not yet known bug.
Of course, I have a more complete test program if needed.

2.4.21rc6, haven't tried any other version.

strace shows:

select(1024, NULL, [3], NULL, NULL) = 1 (out [3])
sendto(3, "\0", 1, 0, {sa_family=AF_UNIX, path="/tmp/tempUn"}, 13 <<< blocks
--
Krzysztof Halasa
Network Administrator

2003-06-03 15:38:18

by Alan

Subject: Re: select for UNIX sockets?

On Tue, 2003-06-03 at 01:08, Krzysztof Halasa wrote:
> Hi,
>
> Should something like this work correctly?

Sort of. The wakeup may occur for several reasons and you need to check
the return value (for signals). Also, the wakeup can occur when there is room
but another thread fills it, or it can report room but not enough for a large
datagram. Neither seems to be the case in your example.

2003-06-04 11:42:49

by Jesse Pollard

Subject: Re: select for UNIX sockets?

On Monday 02 June 2003 19:08, Krzysztof Halasa wrote:
> Hi,
>
> Should something like this work correctly?
>
> while(1) {
> FD_ZERO(&set);
> FD_SET(fd, &set);
> select(FD_SETSIZE, NULL, &set, NULL, NULL); <<<<<<< for writing
>
> if (FD_ISSET(fd, &set))
> sendto(fd, &datagram, 1, 0, ...);
> }
>
> fd is a normal local datagram socket. It looks like select() returns with
> "fd ready for write" and sendto() then blocks because the queue is full.
>
> I don't know if it's expected behaviour or just a not yet known bug.
> Of course, I have a more complete test program if needed.
>
> 2.4.21rc6, haven't tried any other version.
>
> strace shows:
>
> select(1024, NULL, [3], NULL, NULL) = 1 (out [3])
> sendto(3, "\0", 1, 0, {sa_family=AF_UNIX, path="/tmp/tempUn"}, 13 <<< blocks

It could. There may be room in the buffer, but unless the socket is set to
non-blocking, you may have a stream open to another host that may not accept
the data (busy, network congestion...). With the required acks, the return
may (should?) be delayed until the ack arrives.

2003-06-04 12:06:36

by Petr Vandrovec

Subject: Re: select for UNIX sockets?

On 4 Jun 03 at 6:55, Jesse Pollard wrote:
> On Monday 02 June 2003 19:08, Krzysztof Halasa wrote:
> > Hi,
> >
> > Should something like this work correctly?
> >
> > while(1) {
> > FD_ZERO(&set);
> > FD_SET(fd, &set);
> > select(FD_SETSIZE, NULL, &set, NULL, NULL); <<<<<<< for writing
> >
> > if (FD_ISSET(fd, &set))
> > sendto(fd, &datagram, 1, 0, ...);
> > }
> >
> > fd is a normal local datagram socket. It looks like select() returns with
> > "fd ready for write" and sendto() then blocks because the queue is full.
> >
> > I don't know if it's expected behaviour or just a not yet known bug.
> > Of course, I have a more complete test program if needed.
> >
> > 2.4.21rc6, haven't tried any other version.
> >
> > strace shows:
> >
> > select(1024, NULL, [3], NULL, NULL) = 1 (out [3])
> > sendto(3, "\0", 1, 0, {sa_family=AF_UNIX, path="/tmp/tempUn"}, 13 <<< blocks
>
> It could. There may be room in the buffer, but unless the socket is set to
> non-blocking, you may have a stream open to another host that may not accept
> the data (busy, network congestion...). With the required acks, the return
> may (should?) be delayed until the ack arrives.

Besides, select() on an unconnected socket is nonsense... If you change
the code to do connect(), select(), send(), then it should work,
unless I missed something.
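
Petr's suggested sequence (connect() once, then select() and send()) might look like this for a UNIX datagram socket. This is a sketch, not code from the thread: the helper names `bind_dgram()`, `connect_dgram()` and `select_then_send()`, and the socket path used below, are hypothetical. The final send() is deliberately left blocking, as in Petr's description; later messages in this thread report that on 2.4.21rc6 a connected socket could still block there.

```c
#include <errno.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in a sockaddr_un for a filesystem path; -1 if the path is too long. */
static int make_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

/* Receiver side: a datagram socket bound to path. */
static int bind_dgram(const char *path)
{
    struct sockaddr_un addr;
    if (make_addr(&addr, path) == -1)
        return -1;
    unlink(path);                       /* remove a stale socket file */
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Sender side: connect() once, so the socket has one fixed peer. */
static int connect_dgram(const char *path)
{
    struct sockaddr_un addr;
    if (make_addr(&addr, path) == -1)
        return -1;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* select() for writing (retrying on EINTR), then send() to the peer. */
static ssize_t select_then_send(int fd, const void *buf, size_t len)
{
    fd_set wset;
    int rc;

    do {
        FD_ZERO(&wset);
        FD_SET(fd, &wset);
        rc = select(fd + 1, NULL, &wset, NULL, NULL);
    } while (rc == -1 && errno == EINTR);
    if (rc == -1)
        return -1;
    return send(fd, buf, len, 0);
}
```

For example, after `bind_dgram("/tmp/select_demo_rx.sock")` and `connect_dgram("/tmp/select_demo_rx.sock")`, calling `select_then_send(tx, "x", 1)` queues one byte that `recv()` on the receiver can then read.
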
Petr Vandrovec
[email protected]

2003-06-04 12:48:17

by Krzysztof Halasa

Subject: Re: select for UNIX sockets?

Jesse Pollard <[email protected]> writes:

> > select(1024, NULL, [3], NULL, NULL) = 1 (out [3])
> > sendto(3, "\0", 1, 0, {sa_family=AF_UNIX, path="/tmp/tempUn"}, 13 <<< blocks
>
> It could. There may be room in the buffer, but unless the socket is set to
> non-blocking, you may have a stream open to another host that may not accept
> the data (busy, network congestion...). With the required acks, the return
> may (should?) be delayed until the ack arrives.

But it's a local (UNIX) datagram socket. OTOH, TCP to a remote host should
not block waiting for acks.
--
Krzysztof Halasa
Network Administrator

2003-06-04 23:20:26

by Krzysztof Halasa

Subject: Re: select for UNIX sockets?

Attachments:

(No filename) (1.63 kB)
recv.c (1.09 kB)
send.c (1.13 kB)

2003-06-05 13:04:11

by Krzysztof Halasa

Subject: Re: select for UNIX sockets?

Krzysztof Halasa <[email protected]> writes:

> but unix_peer_get(sk) returns NULL.

Well... I missed the fact that this program uses "unconnected" UNIX UDP.
Still, it shouldn't block in sendmsg() after a successful select(); maybe
dropping the packet would be better? It's not that simple.

Thanks for all the replies; I'm investigating this issue further.
--
Krzysztof Halasa
Network Administrator

2003-06-06 00:15:20

by Valdis Klētnieks

Subject: Re: select for UNIX sockets?

On Wed, 04 Jun 2003 14:19:34 +0200, Petr Vandrovec said:

> > > FD_ZERO(&set);
> > > FD_SET(fd, &set);
> > > select(FD_SETSIZE, NULL, &set, NULL, NULL); <<<<<<< for writing
> > >
> > > if (FD_ISSET(fd, &set))
> > > sendto(fd, &datagram, 1, 0, ...);

> Besides, select() on an unconnected socket is nonsense... If you change
> the code to do connect(), select(), send(), then it should work,
> unless I missed something.

We FD_SET the bit, ignore the return value of select(), and test whether the
bit is still set. Plenty of programming bad karma there. However, one would
vaguely hope that the kernel would notice that the socket isn't connected and
return -ENOTCONN rather than blocking....

2003-06-06 00:24:43

by Petr Vandrovec

Subject: Re: select for UNIX sockets?

On Thu, Jun 05, 2003 at 08:28:46PM -0400, [email protected] wrote:
> On Wed, 04 Jun 2003 14:19:34 +0200, Petr Vandrovec said:
>
> > > > FD_ZERO(&set);
> > > > FD_SET(fd, &set);
> > > > select(FD_SETSIZE, NULL, &set, NULL, NULL); <<<<<<< for writing
> > > >
> > > > if (FD_ISSET(fd, &set))
> > > > sendto(fd, &datagram, 1, 0, ...);
>
> > Besides, select() on an unconnected socket is nonsense... If you change
> > the code to do connect(), select(), send(), then it should work,
> > unless I missed something.
>
> We FD_SET the bit, ignore the return value of select(), and test whether the
> bit is still set. Plenty of programming bad karma there. However, one would
> vaguely hope that the kernel would notice that the socket isn't connected and
> return -ENOTCONN rather than blocking....

You'll get ENOTCONN from send(); only sendto() blocks. select() reports the fd
as ready because this end of the socket is ready, and there is probably at
least one UNIX socket on the system that is ready to accept data, so I think
it is correct that select() reports it as ready for writing.
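
Petr's point about send() versus sendto() can be checked directly. This is a sketch assuming a Linux system (other systems may report a different error for this case, for example EDESTADDRREQ), and the helper names are hypothetical. send() on an unconnected AF_UNIX datagram socket has no peer to send to and fails, while select() with a zero timeout still reports the same socket as writable, because at select() time the kernel does not know where the next datagram will go.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno produced by send() on a fresh, unconnected AF_UNIX
 * datagram socket, or 0 if the send unexpectedly succeeded. */
static int unconnected_send_errno(void)
{
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd == -1)
        return errno;

    int err = 0;
    if (send(fd, "x", 1, MSG_DONTWAIT) == -1)
        err = errno;        /* no peer and no address given */
    close(fd);
    return err;
}

/* Returns 1 if select() with a zero timeout reports fd as writable. */
static int writable_now(int fd)
{
    fd_set wset;
    struct timeval tv = { 0, 0 };

    FD_ZERO(&wset);
    FD_SET(fd, &wset);
    return select(fd + 1, NULL, &wset, NULL, &tv) == 1 && FD_ISSET(fd, &wset);
}
```

On Linux the expected results are that `unconnected_send_errno()` returns ENOTCONN and that `writable_now()` returns 1 for a new unconnected datagram socket.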

I think the whole problem comes from the code author's idea that UNIX datagram
sockets are equivalent to UDP through localhost, while they are a completely
different thing.
Petr Vandrovec
[email protected]

2003-06-06 12:04:03

by MarKol

Subject: Re: select for UNIX sockets?

Hi!

Since I was the initiator of this topic on one of the Polish Linux groups,
I'd like to explain some issues. We've been porting a larger piece of
software from Solaris to Linux and a problem has arisen. Below is a
corrected example (with error checking after function calls) in which the
problem is isolated. I hope this will cut off any suggestions that some
function calls return errors which aren't detected and handled.

An experiment shows that no errors occur while running these examples on
Linux, and that the sender blocks in sendto() (after _successfully_ sending
some datagrams to the receiver) when select() returns a ready-to-write
descriptor.

The same example works _correctly_ on Solaris and QNX (the sender blocks in
the select() call and _never_ in sendto()).

Question is:
Am I doing something wrong, or is there a bug in the select() function
under Linux?

/* ---------- start: sender source ----------- */
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>

int socketUn;
struct sockaddr_un addrUn;
socklen_t lenUn;
char datagram[2000];
int dgramCounter = 0;
fd_set writeFdToWatch;

void sockInit(void);

int main(void) {
    sockInit();

    while (1) {
        FD_ZERO(&writeFdToWatch);
        FD_SET(socketUn, &writeFdToWatch);

        printf("slct:"); fflush(stdout);
        int retval = select(socketUn + 1, (fd_set *)NULL,
                            &writeFdToWatch, (fd_set *)NULL,
                            (struct timeval *)NULL);

        if (retval == -1) {             // select returned an error
            perror("select()"); exit(-1);
        } else if (retval == 0) {       // timeout or another wakeup reason
            printf("????\n"); fflush(stdout);
        } else {                        // there are ready descriptors
            if (FD_ISSET(socketUn, &writeFdToWatch)) {
                printf("sndt:"); fflush(stdout);
                /* blocking sendto(): the call reported to block on Linux */
                ssize_t size = sendto(socketUn, datagram, sizeof(datagram),
                                      0, (struct sockaddr *)&addrUn, lenUn);
                if (size == -1) {
                    perror("sendto()"); exit(-1);
                } else if (size != (ssize_t)sizeof(datagram)) {
                    /* errno is not set here, so perror() would mislead */
                    fprintf(stderr, "sendto(): size incorrect\n"); exit(-1);
                }
                printf("sent - %3d\n", ++dgramCounter); fflush(stdout);
            } else {                    // disaster??
                printf("????\n"); fflush(stdout);
            }
        }
    }
    return 0;
}

void sockInit(void) {
    socketUn = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (socketUn == -1) {
        perror("socket()"); exit(-1);
    }
    addrUn.sun_family = AF_UNIX;
    strcpy(addrUn.sun_path, "/tmp/tempUn");
    lenUn = strlen(addrUn.sun_path) + sizeof(addrUn.sun_family);
}
/* ---------- end: sender source ----------- */

/* ---------- start: receiver source ----------- */
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>

int socketUn;
struct sockaddr_un addrUn;
socklen_t lenUn;
ssize_t size;
char datagram[2000];

void sockInit(void);

int main(void) {
    sockInit();

    while (1) {
        printf("Sleep ...\n");
        sleep(15);                      // let the sender fill the queue

        do {
            size = recvfrom(socketUn, datagram, sizeof(datagram), 0,
                            (struct sockaddr *)NULL, (socklen_t *)NULL);
            if (size == -1) {
                if (errno != EAGAIN) {  // EAGAIN: no data available right now
                    perror("recvfrom()"); exit(-1);
                }
                break;
            } else if (size != (ssize_t)sizeof(datagram)) {
                fprintf(stderr, "recvfrom(): size incorrect\n"); exit(-1);
            }
            printf("Ok:"); fflush(stdout);
        } while (size != -1);
    }
}

void sockInit(void) {
    if (unlink("/tmp/tempUn") == -1 && errno != ENOENT) {
        perror("unlink()");
    }

    socketUn = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (socketUn == -1) {
        perror("socket()"); exit(-1);
    }

    addrUn.sun_family = AF_UNIX;
    strcpy(addrUn.sun_path, "/tmp/tempUn");
    lenUn = strlen(addrUn.sun_path) + sizeof(addrUn.sun_family);
    if (bind(socketUn, (struct sockaddr *)&addrUn, lenUn) == -1) {
        perror("bind()"); exit(-1);
    }

    /* non-blocking receive, so the inner loop stops once the queue is empty */
    if (fcntl(socketUn, F_SETFL, O_NONBLOCK) == -1) {
        perror("fcntl()"); exit(-1);
    }
}
/* ---------- end: receiver source ----------- */

PS.
You should start the receiver before the sender in order to perform this
test successfully.

Regards
--
Marek Kolacz

2003-06-07 00:00:59

by David Schwartz

Subject: RE: select for UNIX sockets?

> Since I was the initiator of this topic on one of the Polish Linux groups,
> I'd like to explain some issues. We've been porting a larger piece of
> software from Solaris to Linux and a problem has arisen. Below is a
> corrected example (with error checking after function calls) in which the
> problem is isolated. I hope this will cut off any suggestions that some
> function calls return errors which aren't detected and handled.
>
> An experiment shows that no errors occur while running these examples on
> Linux, and that the sender blocks in sendto() (after _successfully_ sending
> some datagrams to the receiver) when select() returns a ready-to-write
> descriptor.
>
> The same example works _correctly_ on Solaris and QNX (the sender blocks in
> the select() call and _never_ in sendto()).
>
> Question is:
> Am I doing something wrong, or is there a bug in the select() function
> under Linux?

You are doing something wrong. You are using 'select' along with blocking
I/O operations. You can't make bricks without clay. If you don't want to
block, you must use non-blocking socket operations. End of story.

Just because 'select' indicates a write hit, you are not assured that some
particular write at a later time will not block. Past performance does not
guarantee future results.

Suppose, for example, a machine has two network interfaces. One is very
busy, queue full, and one is totally idle, queue empty. What do you think
'select' for write on an unconnected UDP socket should do? If you say it
should block, then it can block forever even if there's plenty of buffer
space on the network card you were going to send to. So, it can't block, it
must indicate writability.

Now, tell me what a blocking UDP write should do if the buffer is full.
Should it return success but drop the packet silently? Does that seem right
to you?

You have any number of sane choices. My suggestion is that you make the
socket non-blocking and treat an EWOULDBLOCK return as equivalent to
success. You can additionally take it as a hint that the packet was, in
effect, dropped.
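
David's suggestion can be sketched as follows: the socket is switched to non-blocking mode with fcntl(), and an EAGAIN/EWOULDBLOCK result is counted as a dropped datagram instead of being retried. The helper names `set_nonblocking()` and `send_or_drop()` are hypothetical, and the drop-on-EAGAIN policy is the one proposed in this message, not something the kernel does on its own.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>   /* close(), for the usage example */

/* Put fd into non-blocking mode, keeping its other status flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Send one datagram on a non-blocking, connected socket.
 * Returns 1 if it was queued, 0 if there was no room (the caller treats
 * the datagram as dropped), -1 on any other error. */
static int send_or_drop(int fd, const void *buf, size_t len)
{
    ssize_t n;

    do {
        n = send(fd, buf, len, 0);
    } while (n == -1 && errno == EINTR);

    if (n >= 0)
        return 1;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;   /* queue full: count it as a lost packet */
    return -1;
}
```

If nobody reads the other end, the first calls return 1 and, once the send buffer or the receiver's queue is full, later calls return 0 instead of blocking the sender.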

DS

2003-06-07 23:55:23

by Krzysztof Halasa

[permalink] [raw]

Subject: Re: select for UNIX sockets?

"David Schwartz" <[email protected]> writes:

> You are doing something wrong. You are using 'select' along with blocking
> I/O operations. You can't make bricks without clay. If you don't want to
> block, you must use non-blocking socket operations. End of story.

There is a little problem here. Do you see any place for select() here?
There isn't any.

If you have a working select(), you can use (blocking or non-blocking) I/O
functions and get a) low latency and b) small CPU overhead.
If you want to use non-blocking I/O, either with a broken select() or
without it at all, you get either a) high latency or b) high CPU overhead.

> Just because 'select' indicates a write hit, you are not assured that some
> particular write at a later time will not block. Past performance does not
> guarantee future results.

The problem is that select() on UNIX datagram sockets returns immediately,
so it could just as well be replaced by a NOP. There isn't any
"performance".

> Suppose, for example, a machine has two network interfaces. One is very
> busy, queue full, and one is totally idle, queue empty. What do you think
> 'select' for write on an unconnected UDP socket should do? If you say it
> should block, then it can block forever even if there's plenty of buffer
> space on the network card you were going to send to. So, it can't block, it
> must indicate writability.

That's a slightly different problem: a datagram will eventually be
transmitted by the busy interface (whereas you will never send a datagram if
nobody is reading the socket).

However, select() doesn't work on connected sockets either (I missed the
fact that the example program doesn't connect first, but it's unimportant
here).

> You have any number of sane choices. My suggestion is that you make the
> socket non-blocking and treat an EWOULDBLOCK return as equivalent to
> success. You can additionally take it as a hint that the packet was, in
> effect, dropped.

You essentially transform a code such as:
while () {
select();
blocking_send();
}

into:

while() {
non_blocking_send();
}

Not very CPU-friendly :-(

Having working select() on at least connected sockets is a must.

intrepid:/tmp$ strace -f ./test 2>&1 |egrep 'socket|bind|connect|send|recv'

[pid 1051] socket(PF_UNIX, SOCK_DGRAM, 0) = 3
[pid 1051] bind(3, {sa_family=AF_UNIX, path="/tmp/test"}, 11) = 0
[pid 1050] socket(PF_UNIX, SOCK_DGRAM, 0) = 3
[pid 1050] connect(3, {sa_family=AF_UNIX, path="/tmp/test"}, 11) = 0
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1 <<<<< the last packet queued
[pid 1050] send(3, "\1", 1, 0 <unfinished ...> <<<<<< doesn't fit in queue
[pid 1051] recvfrom(3, "\1", 2000, 0, NULL, NULL) = 1
[pid 1051] recvfrom(3, "\1", 2000, 0, NULL, NULL) = 1
[pid 1051] recvfrom(3, "\1", 2000, 0, NULL, NULL) = 1
[pid 1051] recvfrom(3, "\1", 2000, 0, NULL, NULL) = 1
[pid 1051] recvfrom(3, "\1", 2000, 0, NULL, NULL) = 1 <<<<< makes room
[pid 1050] <... send resumed> ) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0) = 1
[pid 1050] send(3, "\1", 1, 0 <unfinished ...>
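
For comparison, on Linux kernels much newer than the 2.4.21rc6 in this strace, select() on a connected UNIX datagram socket is expected to take the peer's receive queue into account. The sketch below tests that assumption rather than documenting the 2.4 behaviour shown above: it fills the queue of an unread receiver with non-blocking sends until EAGAIN, then asks select() with a zero timeout whether the sender is writable. The helper name `writable_after_fill()` and the socket path are hypothetical.

```c
#include <errno.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect a sender to a freshly bound receiver at path, queue 1-byte
 * datagrams until send() reports EAGAIN, then ask select() (zero timeout)
 * whether the sender is writable. *queued gets the number of datagrams
 * accepted. Returns select()'s result (1 or 0), or -1 on error. */
static int writable_after_fill(const char *path, long *queued)
{
    struct sockaddr_un addr;
    fd_set wset;
    struct timeval tv = { 0, 0 };
    int rx = -1, tx = -1, result = -1;

    *queued = 0;
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);                      /* remove a stale socket file */

    rx = socket(AF_UNIX, SOCK_DGRAM, 0);
    tx = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (rx == -1 || tx == -1)
        goto out;
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        connect(tx, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        goto out;

    /* Nobody reads rx, so its receive queue eventually fills up. */
    for (;;) {
        if (send(tx, "\1", 1, MSG_DONTWAIT) == 1) {
            ++*queued;
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                     /* queue (or send buffer) is full */
        goto out;                      /* unexpected error */
    }

    /* Zero timeout: just ask whether the sender is writable right now. */
    FD_ZERO(&wset);
    FD_SET(tx, &wset);
    result = select(tx + 1, NULL, &wset, NULL, &tv);

out:
    if (tx != -1)
        close(tx);
    if (rx != -1)
        close(rx);
    unlink(path);
    return result;
}
```

On such a kernel the expectation is that `*queued` ends up greater than zero and that select() returns 0 (not writable) once send() has reported EAGAIN.
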
--
Krzysztof Halasa
Network Administrator

2003-06-08 04:02:21

by Chris Friesen

Subject: Re: select for UNIX sockets?

David Schwartz wrote:

> You are doing something wrong. You are using 'select' along with blocking
> I/O operations. You can't make bricks without clay. If you don't want to
> block, you must use non-blocking socket operations. End of story.

That's funny, I was under the impression that the whole point of using select()
was to enable the use of blocking I/O. If you are on a uniprocessor system, in
a single thread, and select() says that a socket is writeable, then I had darn
well better be able to write to that socket!

Sure, this gets more complicated when multiprocessing or multithreading, but the
test program does neither of these.

> Just because 'select' indicates a write hit, you are not assured that some
> particular write at a later time will not block. Past performance does not
> guarantee future results.

Think about the whole reason for select()'s existence. If a single-threaded app
calls select() and is told a socket is writeable, then a write to that socket
should either immediately succeed or immediately fail (if the other socket
disappeared in between the calls, for instance).

Now granted I use non-blocking I/O out of paranoia, but even there if select()
says it is writeable and the send call returns EAGAIN then we get into a nice
little infinite loop.

select() should be reliable.

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]

2003-06-09 02:57:47

by David Schwartz

Subject: RE: select for UNIX sockets?

> "David Schwartz" <[email protected]> writes:

> > You are doing something wrong. You are using 'select' along with blocking
> > I/O operations. You can't make bricks without clay. If you don't want to
> > block, you must use non-blocking socket operations. End of story.

> There is a little problem here. Do you see any place for select() here?
> There isn't any.

For unconnected UDP sockets, I see no place for 'select'ing for write. No.

> If you have a working select(), you can use (blocking or non-blocking) I/O
> functions and get a) low latency and b) small CPU overhead.
> If you want to use non-blocking I/O, either with a broken select() or
> without it at all, you get either a) high latency or b) high CPU overhead.

It is fundamental that an application that sends UDP packets must control
the transmit timing. That's just the way it is with UDP.

> > Just because 'select' indicates a write hit, you are not assured that some
> > particular write at a later time will not block. Past performance does not
> > guarantee future results.

> The problem is that select() on UNIX datagram sockets returns immediately,
> so it could just as well be replaced by a NOP. There isn't any
> "performance".

Right. It's silly to 'select' on an unconnected UDP datagram socket. There
is no single defined buffer whose fullness or emptiness can be the subject
of the 'select'ing. It's not like TCP where there's a send queue and the
network stack is responsible for transmit pacing. With UDP, the application
is responsible for transmit pacing.

> > Suppose, for example, a machine has two network interfaces. One is very
> > busy, queue full, and one is totally idle, queue empty. What do you think
> > 'select' for write on an unconnected UDP socket should do? If you say it
> > should block, then it can block forever even if there's plenty of buffer
> > space on the network card you were going to send to. So, it can't block, it
> > must indicate writability.

> That's a slightly different problem: a datagram will eventually be
> transmitted by the busy interface (whereas you will never send a datagram if
> nobody is reading the socket).

> However, select() doesn't work on connected sockets either (I missed the
> fact that the example program doesn't connect first, but it's unimportant
> here).

It really doesn't matter. UDP applications have to control the transmit
pacing at application level. There is absolutely no way for the kernel to
know whether the path to the recipient is congested or not.

> > You have any number of sane choices. My suggestion is that you make the
> > socket non-blocking and treat an EWOULDBLOCK return as equivalent to
> > success. You can additionally take it as a hint that the packet was, in
> > effect, dropped.

> You essentially transform a code such as:
> while () {
> select();
> blocking_send();
> }
>
> into:
>
> while() {
> non_blocking_send();
> }
>
> Not very CPU-friendly :-(

No, no, no. This is not how you write UDP applications. If you're sending
UDP, you must have a transmit scheduler somewhere.

> Having working select() on at least connected sockets is a must.

The kernel can't tell you when to send because that depends upon factors
that are remote. The application *MUST* schedule its transmissions. There's
no two ways about it.

Yes, it would be nice if the kernel helped more. But the application has to
deal with remote packet loss as well. It HAS TO decide when to send the
packet and can't rely upon the availability or unavailability of local
resources to mean anything with regard to the connection as a whole.

DS

2003-06-09 02:52:21

by David Schwartz

Subject: RE: select for UNIX sockets?

> David Schwartz wrote:
>
> > You are doing something wrong. You are using 'select' along with blocking
> > I/O operations. You can't make bricks without clay. If you don't want to
> > block, you must use non-blocking socket operations. End of story.

> That's funny, I was under the impression that the whole point of using
> select() was to enable the use of blocking I/O.

This is a very common misconception. No, the point of 'select' is to hint
to you when you should attempt a non-blocking socket operation. It has never
been the case that 'select' guaranteed that a following blocking operation
wouldn't block.

> If you are on a uniprocessor system, in a single thread, and select() says
> that a socket is writeable, then I had darn well better be able to write to
> that socket!

No. A 'write' hit from 'select' cannot guarantee that any arbitrary 'write'
won't block unless the following 'write' is non-blocking.

Consider a TCP connection. I get a 'write' hit, and then I call 'write' to
send 250Kb of data. Do you seriously want to argue that the kernel must
somehow handle that 250Kb without blocking?
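
The TCP example is about partial writes: a non-blocking send() on a stream socket may accept only part of a large buffer, and a write hit from select() only means that some progress is possible. A sketch of the usual loop follows; `write_all_nb()` is a hypothetical helper. It passes MSG_DONTWAIT so that each send() is non-blocking even on a blocking socket, and MSG_NOSIGNAL so that a vanished reader produces EPIPE instead of SIGPIPE.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>     /* waitpid(), for the usage example */
#include <unistd.h>

/* Hypothetical helper: write all len bytes to a connected stream socket
 * without ever blocking inside send(). Each send() may accept only part
 * of the buffer; when it accepts nothing (EAGAIN), select() is used as a
 * hint for when to try again. Returns len on success, -1 on error. */
static ssize_t write_all_nb(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        /* MSG_DONTWAIT: non-blocking for this call only.
         * MSG_NOSIGNAL: report EPIPE instead of raising SIGPIPE. */
        ssize_t n = send(fd, buf + done, len - done,
                         MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) {
            done += (size_t)n;          /* partial progress is normal */
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;
        if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* No room right now: wait until select() reports writability. */
        fd_set wset;
        FD_ZERO(&wset);
        FD_SET(fd, &wset);
        if (select(fd + 1, NULL, &wset, NULL, NULL) == -1 && errno != EINTR)
            return -1;
    }
    return (ssize_t)done;
}
```

With a reader draining the other end of a `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)`, a 256 KiB buffer is delivered in several partial sends, and the function returns only after every byte has been accepted.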

> Sure, this gets more complicated when multiprocessing or multithreading,
> but the test program does neither of these.

No, it's more complicated in many situations. Suppose the kernel says a
connected UDP socket won't block because the network interface the packet
would go out is unused. Before you can call 'send', the interface goes down
and the packet now has to take a congested network interface. The 'write'
will block.

> > Just because 'select' indicates a write hit, you are not assured that some
> > particular write at a later time will not block. Past performance does not
> > guarantee future results.

> Think about the whole reason for select()'s existence. If a single-threaded
> app calls select() and is told a socket is writeable, then a write to that
> socket should either immediately succeed or immediately fail (if the other
> socket disappeared in between the calls, for instance).

Sure, but you can't ensure this in all cases unless you set the socket
non-blocking. The kernel can't guarantee the future and it has no way of
knowing that it's important to the application that the following operation
not block unless you tell it.

> Now granted I use non-blocking I/O out of paranoia, but even there if
> select() says it is writeable and the send call returns EAGAIN then we get
> into a nice little infinite loop.

Only if the application foolishly insists. For UDP, you should treat EAGAIN
as a hint that you're sending too fast. With UDP, the application is
responsible for send timing and can't foist this responsibility on the OS by
misusing 'select'.

> select() should be reliable.

It cannot be made so without other greater losses. The perfect is the enemy
of the good. Use it as a hint.

DS

2003-06-09 16:29:26

by MarKol

Subject: Re: select for UNIX sockets?

Hi

----- Original Message -----
From: "David Schwartz" <[email protected]>
> Suppose, for example, a machine has two network interfaces. One is very
> busy, queue full, and one is totally idle, queue empty. What do you think
> 'select' for write on an unconnected UDP socket should do?

There is an internal buffer for this UDP socket. select() should depend
on its state.
I heard that SO_SNDLOWAT and SO_RCVLOWAT might be useful in this approach,
but it is not implemented in Linux.
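
On the SO_SNDLOWAT point: the Linux socket(7) manual page states that SO_SNDLOWAT is not changeable on Linux (setsockopt() fails with ENOPROTOOPT), that SO_RCVLOWAT became changeable only in Linux 2.4, and that both values are initialized to 1. The probe below checks the SO_SNDLOWAT part on a current Linux system; `probe_sndlowat()` is a hypothetical name.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to set SO_SNDLOWAT to `wanted` on a new UNIX datagram socket.
 * Returns 0 if setsockopt() succeeded, otherwise the errno it reported.
 * *current receives the value getsockopt() reports afterwards (-1 if it
 * could not be read). */
static int probe_sndlowat(int wanted, int *current)
{
    *current = -1;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd == -1)
        return errno;

    int err = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDLOWAT, &wanted, sizeof(wanted)) == -1)
        err = errno;

    socklen_t optlen = sizeof(*current);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDLOWAT, current, &optlen) == -1)
        *current = -1;

    close(fd);
    return err;
}
```

On Linux the expected result is that `probe_sndlowat(1024, &cur)` returns ENOPROTOOPT and leaves `cur` equal to 1.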

Moreover, my example uses an AF_UNIX socket, and AFAIK this should be
reliable communication.
Why are you talking about network interfaces in this context?

This quotation is taken from man select:
"
Three independent sets of descriptors are watched. Those
listed in readfds will be watched to see if characters
become available for reading (more precisely, to see if a
read will not block - in particular, a file descriptor is
also ready on end-of-file), those in writefds will be
watched to see if a write will not block, and those in
exceptfds will be watched for exceptions."

and this from man socket:
" Socket creates an endpoint for communication and returns a
descriptor. "

I'm aware that my English is rather poor, but as I read it, socket()
returns a descriptor, and select() watches descriptors and returns those
ready for writing if a write operation will not block.

I would agree with you if my program didn't work on Solaris or QNX.
But it works on both, and it looks consistent with the man page!

Regards
--
Marek Kolacz

2003-06-09 16:51:45

by David Schwartz

Subject: RE: select for UNIX sockets?

> From: "David Schwartz" <[email protected]>

> > Suppose, for example, a machine has two network interfaces. One is very
> > busy, queue full, and one is totally idle, queue empty. What do you think
> > 'select' for write on an unconnected UDP socket should do?

> There is an internal buffer for this UDP socket. select() should depend
> on its state.
> I heard that SO_SNDLOWAT and SO_RCVLOWAT might be useful in this approach,
> but it is not implemented in Linux.

What you are suggesting just can't work for an unconnected socket. The
kernel has no idea where the next packet you send is going to go. It could
be along an uncongested path, it could be along a congested path. If there
were a single buffer, then packets bound for congested paths would block
packets bound for uncongested paths. UDP streaming media programs would
effectively reduce their throughput to that of their slowest local path.

> This quotation is taken from man select:
> "
> Three independent sets of descriptors are watched. Those
> listed in readfds will be watched to see if characters
> become available for reading (more precisely, to see if a
> read will not block - in particular, a file descriptor is
> also ready on end-of-file), those in writefds will be
> watched to see if a write will not block, and those in
> exceptfds will be watched for exceptions."
>
> and this from man socket:
> " Socket creates an endpoint for communication and returns a
> descriptor. "
>
> I'm aware that my English is rather poor, but as I read it, socket()
> returns a descriptor, and select() watches descriptors and returns those
> ready for writing if a write operation will not block.

Right, if some arbitrary write operation will not block. There's no
guarantee about a particular future write operation. Consider TCP and a
non-blocking 250Kb write.

> I would agree with you if my program didn't work on Solaris or QNX.
> But it works on both, and it looks consistent with the man page!

I think too much of the context has been lost for me to reply specifically.
I don't remember which specific case you were dealing with. But the
fundamental point I'm trying to make is this -- you cannot guarantee
non-blocking operation with blocking socket calls. You can't make bricks
without straw.

If you don't ever want to block, you *MUST* communicate this to the kernel
by issuing non-blocking operations. You *CANNOT* rely on 'select' to provide
you accurate information about arbitrary future operations. The kernel
simply cannot, in principle, provide such guarantees.

I'm providing examples of cases where the 'guarantee' doesn't hold not
because those specific examples are the cases you are hitting. I'm providing
them to show you that you don't have a guarantee.

DS

2003-06-09 17:21:16

by Krzysztof Halasa

Subject: Re: select for UNIX sockets?

"David Schwartz" <[email protected]> writes:

> It really doesn't matter. UDP applications have to control the transmit
> pacing at application level. There is absolutely no way for the kernel to
> know whether the path to the recipient is congested or not.

Why? The kernel knows everything it has to know, i.e. the complete
state of the socket queue in question.

But if select() on sockets is illegal, shouldn't we make it return some -E
error instead of success? Certainly we should get rid of invalid kernel code,
right?

> The kernel can't tell you when to send because that depends upon factors
> that are remote.

Such as?

> Yes, it would be nice if the kernel helped more. But the application has
> to deal with remote packet loss as well.

Could you please show me a place in the kernel which could cause such
a loss on local datagram sockets?
--
Krzysztof Halasa
Network Administrator

2003-06-09 17:42:00

by David Schwartz

Subject: RE: select for UNIX sockets?

> "David Schwartz" <[email protected]> writes:

> > It really doesn't matter. UDP applications have to control the transmit
> > pacing at application level. There is absolutely no way for the kernel to
> > know whether the path to the recipient is congested or not.

> Why? The kernel knows everything it has to know, i.e. the complete
> state of the socket queue in question.

For the last time, there is no socket queue. You wouldn't want there to be
one.

Consider a UDP application that is sending packets to two destinations, one
over a 56Kbps serial link running PPP and one over gigabit Ethernet. If
there were a socket send queue, the packets going over the 56Kbps serial
link would block the packets going over the gigabit Ethernet.

> But if select() on sockets is illegal, shouldn't we make it return some -E
> error instead of success? Certainly we should get rid of invalid kernel
> code, right?

No, it is legal, you are just misusing it. If you don't want your socket
operations to ever block, use non-blocking socket operations. If you use
UDP, or another connectionless protocol, you should understand that *you*
are responsible for transmit pacing.

> > The kernel can't tell you when to send because that depends upon
> > factors that are remote.

> Such as?

Such as where the packet you send is actually *going*.

> > Yes, it would be nice if the kernel helped more. But the application
> > has to deal with remote packet loss as well.

> Could you please show me a place in the kernel which could cause such
> a loss on local datagram sockets?

I guess I'm not getting through. The fact is, you don't have the guarantee
that you think you have. I'm giving you examples to show you why you don't
have that guarantee. You argue that the examples don't apply to your
specific case. I'm not saying they do. I'm saying that because there are
unavoidable cases where what you're trying to do won't work, then what
you're trying to do is not guaranteed to work in all cases and you shouldn't
try to do it.

The kernel does not remember that you got a write hit on 'select' and use
it to somehow ensure that your next 'write' doesn't block. A 'write' hit
from 'select' is just a hint and not an absolute guarantee that whatever
'write' operation you happen to choose to do won't block.
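A minimal C sketch of treating the select() write hit as a hint: wait for writability, then send with MSG_DONTWAIT so that a stale hint turns into EAGAIN instead of a blocked process. The helper name send_after_select() is hypothetical; MSG_DONTWAIT is a Linux/BSD flag (setting O_NONBLOCK on the descriptor would serve equally well), and fd is assumed to be below FD_SETSIZE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: treat select()'s write hit as a hint only.
 * Returns the number of bytes sent, or -1 with errno set.  EAGAIN means
 * the hint did not hold by the time sendto() ran: call it again. */
ssize_t send_after_select(int fd, const void *buf, size_t len,
                          const struct sockaddr *dest, socklen_t destlen)
{
    fd_set wset;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);   /* select() cannot watch larger fds */

    do {
        FD_ZERO(&wset);
        FD_SET(fd, &wset);
        n = select(fd + 1, NULL, &wset, NULL, NULL);
    } while (n < 0 && errno == EINTR);    /* a signal is not readiness */

    if (n < 0)
        return -1;

    /* MSG_DONTWAIT: fail with EAGAIN instead of blocking if the queue
     * filled up (or never had room) despite the select() hit. */
    return sendto(fd, buf, len, MSG_DONTWAIT, dest, destlen);
}
```

A caller that gets -1 with errno EAGAIN simply goes back to select(); that retry is what absorbs the race between select() and sendto().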

DS

2003-06-09 22:27:58

by James Stevenson

Subject: Re: select for UNIX sockets?

On 9 Jun 2003, Krzysztof Halasa wrote:
> "David Schwartz" <[email protected]> writes:
>
> > It really doesn't matter. UDP applications have to control the transmit
> > pacing at application level. There is absolutely no way for the kernel to
> > know whether the path to the recipient is congested or not.
>
> Because what? The kernel knows everything it has to know - i.e. complete
> state of socket queue in question.

Yes, it does when you call select.

Take a program that's sharing the same socket between 2 processes,
or a multithreaded program sharing any socket: between the time
that select is called and the time read/write is called, the data
or buffer of the socket could have been completely filled or completely
emptied.

> But if select() on sockets is illegal, should we make it return -Esth
> instead of success. Certainly, we should get rid of invalid kernel code,
> right?

Nobody said it was illegal, but in certain situations it
might as well count as a NOP.

> > The kernel can't tell you when to send because that depends upon
> > factors that are remote.
>
> Such as?

Say you are on a UDP socket and you have hosts A, B and C.
Hosts A and B are on the same network; host C is on another network
connected by a 512 kbit link (faster or slower), and you are calling
select on host A. Host B blasts a silly amount of data to host C,
and host A does the same. What happens? Who wins?

> > Yes, it would be nice if the kernel helped more. But the application
> > has to deal with remote packet loss as well.
>
> Could you please show me a place in the kernel which could cause such
> a loss on local datagram sockets?
>

Same as with a multithreaded program: select could say it's OK to write,
but by the time you write the data it could be a completely different story.

2003-06-09 22:55:45

by Krzysztof Halasa

Subject: Re: select for UNIX sockets?

"David Schwartz" <[email protected]> writes:

> For the last time, there is no socket queue. You wouldn't want there to be
> one.

Sure. No queue. Of course.

And these are only misleading names - net/unix/af_unix.c:
static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
struct msghdr *msg, int len)
{
...

if (unix_peer(other) != sk &&
skb_queue_len(&other->receive_queue) > other->max_ack_backlog) {

and then

skb_queue_tail(&other->receive_queue, skb);
unix_state_runlock(other);
other->data_ready(other, len);
sock_put(other);

Right?
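To observe a queue limit from user space, one can send 1-byte datagrams without blocking and count how many are accepted before the kernel answers EAGAIN. This is a sketch assuming Linux; fill_receive_queue() is a hypothetical name. Depending on the kernel and on how the sockets are connected, the limit reached first may be the receiver's queue length (the max_ack_backlog check quoted above) or the sender's send buffer, so the exact count should not be relied on.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: queue 1-byte datagrams from `sender` to `dest`
 * (or to the connected peer when dest is NULL) without ever blocking,
 * and report how many the kernel accepted before refusing with EAGAIN.
 * Returns -1 on any other error. */
long fill_receive_queue(int sender, const struct sockaddr_un *dest,
                        socklen_t destlen)
{
    const char byte = 0;
    long queued = 0;

    for (;;) {
        if (sendto(sender, &byte, 1, MSG_DONTWAIT,
                   (const struct sockaddr *)dest, destlen) == 1) {
            queued++;
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return queued;              /* some limit was hit: stop here */
        if (errno != EINTR)
            return -1;
    }
}
```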

> Consider a UDP application that is sending packets to two destinations, one
> over a 56Kbps serial link running PPP and one over gigabit Ethernet. If
> there were a socket send queue, the packets going over the 56Kbps serial
> link would block the packets going over the gigabit Ethernet.

First, PPP and Ethernet use IP/UDP and not local UNIX sockets.
Second, I hope you don't want to tell me PPP and Ethernet have no
device queues, do you? Sure there are "virtual" devices with no queue,
but that's another story.
Have you checked what the above scenario would do? I guess the PPP would
really limit the rate if you used only one socket.

Having no per-sender socket queue for UDP/IP is totally irrelevant here.

> > But if select() on sockets is illegal, should we make it return -Esth
> > instead of success. Certainly, we should get rid of invalid kernel code,
> > right?
>
> No, it is legal, you are just misusing it. If you don't want your socket
> operations to ever block, use non-blocking socket operations. If you use
> UDP, or another connectionless protocol, you should understand that *you*
> are responsible for transmit pacing.

I'm not talking about I/O operations; the problem is in select().
Tell me: how can I use select() with local UNIX sockets (for write
descriptors) so that it's effectively not a NOP?

> Such as where the packet you send is actually *going*.

It's going to another local socket, of course. See man bind/connect and
the strace log I've posted. The kernel knows this very well.

> I guess I'm not getting through. The fact is, you don't have the guarantee
> that you think you have.

I don't want a guarantee. I want select() to do what it has to do,
i.e. check whether the receiving queue (which, of course, does not exist)
of the socket I connected to with connect() has room for a datagram.
Unconnected sockets can be dealt with later.
I see it's me not getting through - do you want select() (for write
descriptors) on UNIX datagram sockets to be effectively a NOP?

The question of whether send/sendto/etc() will actually block is, of course,
another matter. For example, the (nonexistent) queue could be filled
by another process between my calls to select() and send(). I have no
problem with send() blocking but I want select() to check if the
connected socket could accept anything at all at the time select()
is called (and wait for such condition/timeout otherwise).
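The check asked for here, waiting with a timeout until a connected socket says it can take anything at all, would be written with poll() roughly as below. It is a sketch: wait_writable() is a hypothetical helper, and whether a full peer queue on a connected UNIX datagram socket actually suppresses POLLOUT depends on the kernel version. According to this thread, the 2.4 kernel's datagram_poll() did not look at the peer's queue, so test the behaviour on the kernel you target.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: wait up to timeout_ms (-1 = forever) until `fd`
 * reports POLLOUT.  Returns 1 if writable, 0 on timeout, and -1 on error
 * or if the socket reported only POLLERR/POLLHUP. */
int wait_writable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLOUT };
    int n;

    do {
        n = poll(&p, 1, timeout_ms);    /* an EINTR restarts the full timeout */
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return -1;
    if (n == 0)
        return 0;
    return (p.revents & POLLOUT) ? 1 : -1;
}
```

Even when this returns 1, the following send should still use MSG_DONTWAIT (or O_NONBLOCK), since another process may fill the queue in between.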

> I'm giving you examples to show you why you don't
> have that guarantee. You argue that the examples don't apply to your
> specific case.

Sure, UDP/IP has nothing to do with UNIX sockets.

> I'm not saying they do. I'm saying that because there are
> unavoidable cases where what you're trying to do won't work, then what
> you're trying to do is not guaranteed to work in all cases and you shouldn't
> try to do it.

What *I* should do is really unimportant here. What the *kernel* should do
is all that now matters.

> The kernel does not remember that you got a write hit on 'select' and use
> it to somehow ensure that your next 'write' doesn't block. A 'write' hit
> from 'select' is just a hint and not an absolute guarantee that whatever
> 'write' operation you happen to choose to do won't block.

A "write" hit from select() is not a hit - it's exactly nothing and this
is the problem.
Have you at least looked at the actual code? unix_dgram_sendmsg() and
datagram_poll()?
--
Krzysztof Halasa
Network Administrator

2003-06-10 13:08:27

by Timothy Miller

Subject: Re: select for UNIX sockets?

Krzysztof Halasa wrote:
> "David Schwartz" <[email protected]> writes:

>> The kernel does not remember that you got a write hit on 'select' and use
>> it to somehow ensure that your next 'write' doesn't block. A 'write' hit
>> from 'select' is just a hint and not an absolute guarantee that whatever
>> 'write' operation you happen to choose to do won't block.
>
>
> A "write" hit from select() is not a hit - it's exactly nothing and this
> is the problem.
> Have you at least looked at the actual code? unix_dgram_sendmsg() and
> datagram_poll()?

I think the issue here is not what it means when select() returns but
what it means when it DOESN'T return (well, blocks).

In my understanding, one of select()'s purposes is to keep processes
from having to busy-wait, burning CPU for nothing. Your guarantee with
select() is that if it blocks, then the write target(s) definitely
cannot accept data. The inverse is not true, although the inverse is
very likely: if select() does not block, then it's extremely likely
that the target can accept SOME data. But it certainly can't accept
ALL data you want to give it if you want to give it a lot of data.

If you were to use blocking writes, and you sent too much data, then you
would block. If you were to use non-blocking writes, then the socket
would take as much data as it could, then return from write() with an
indication of how much data actually got sent. Then you call select()
again so as to wait for your next opportunity to send some more of your
data.
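The pattern just described (non-blocking descriptor, write what the kernel accepts, then select() for the next opportunity) can be sketched in C for a stream socket, where partial writes really happen. UNIX datagram sends are all-or-nothing, so the "advance past what was written" step only matters for streams. Both helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put `fd` into non-blocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return (flags < 0) ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: write all `len` bytes to a non-blocking stream fd.
 * Each write() takes whatever the kernel accepts; on EAGAIN we sleep in
 * select() until more space appears.  Returns 0 on success, -1 on error. */
int write_all_nonblocking(int fd, const char *buf, size_t len)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n > 0) {                    /* possibly a partial write */
            buf += n;
            len -= (size_t)n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            fd_set wset;
            FD_ZERO(&wset);
            FD_SET(fd, &wset);
            if (select(fd + 1, NULL, &wset, NULL, NULL) < 0 && errno != EINTR)
                return -1;
        } else if (!(n < 0 && errno == EINTR)) {
            return -1;                  /* real error, or an unexpected 0 */
        }
    }
    return 0;
}
```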

It may be that some operating systems have large or expandable queues
for UNIX sockets. As a result, you have been able to send a lot of data
with a blocking write without it blocking. I can see how it would be an
advantage to function that way, up to a certain point, after which you
start eating too much memory for your queue. However, what you have
experienced is not universally guaranteed behavior. What Linux does is
canonically correct; it's just a variant that you're not used to. If
you were to change your approach to fit the standard, then you would get
more consistent behavior across multiple platforms.

Up to this point, I believe you have been riding on luck, not guaranteed
behavior.

2003-06-10 13:36:46

by Richard B. Johnson

Subject: Re: select for UNIX sockets?

On Tue, 10 Jun 2003, Timothy Miller wrote:

>
>
> Krzysztof Halasa wrote:
> > "David Schwartz" <[email protected]> writes:
>
> >> The kernel does not remember that you got a write hit on 'select' and use
> >> it to somehow ensure that your next 'write' doesn't block. A 'write' hit
> >> from 'select' is just a hint and not an absolute guarantee that whatever
> >> 'write' operation you happen to choose to do won't block.
> >
> >
> > A "write" hit from select() is not a hit - it's exactly nothing and this
> > is the problem.
> > Have you at least looked at the actual code? unix_dgram_sendmsg() and
> > datagram_poll()?
>
>
> I think the issue here is not what it means when select() returns but
> what it means when it DOESN'T return (well, blocks).
>
> In my understanding, one of select()'s purposes is to keep processes
> from having to busy-wait, burning CPU for nothing. Your guarantee with
> select() is that if it blocks, then the write target(s) definitely
> cannot accept data. The inverse is not true, although the inverse is
> very likely: if select() does not block, then it's extremely likely
> that the target can accept SOME data. But it certainly can't accept
> ALL data you want to give it if you want to give it a lot of data.
>
> If you were to use blocking writes, and you sent too much data, then you
> would block. If you were to use non-blocking writes, then the socket
> would take as much data as it could, then return from write() with an
> indication of how much data actually got sent. Then you call select()
> again so as to wait for your next opportunity to send some more of your
> data.
>
> It may be that some operating systems have large or expandable queues
> for UNIX sockets. As a result, you have been able to send a lot of data
> with a blocking write without it blocking. I can see how it would be an
> advantage to function that way, up to a certain point, after which you
> start eating too much memory for your queue. However, what you have
> experienced is not universally guaranteed behavior. What Linux does is
> canonically correct; it's just a variant that you're not used to. If
> you were to change your approach to fit the standard, then you would get
> more consistent behavior across multiple platforms.
>
> Up to this point, I believe you have been riding on luck, not guaranteed
> behavior.
>

The behavior of select and poll for read and write must be somewhat
different when it comes to blocking. This is because of the difference
between read() and write().

If I attempt to read() N bytes and there is only one byte available,
read() will immediately return with the one byte. Its return value
will show one byte was received even though the caller "requested"
N bytes. This is the expected behavior. Therefore when select() shows
that data are available, there may be only one byte available.
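A small sketch of the read side of this asymmetry: read_available() (a hypothetical helper) returns whatever is queued, which can be a single byte even when a much larger buffer is supplied.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read whatever is currently available, up to `len`
 * bytes, retrying only if a signal interrupted the call.  The result may
 * be much smaller than `len`; the caller must loop if it needs more. */
ssize_t read_available(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}
```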

On a write(), the caller expects that all the bytes being written
will, in fact, eventually get written. However, select() doesn't
know how many bytes the caller expects to write. It only knows that
the caller can write some. This "some" in principle, may be only
one byte. The user of select and write needs to be prepared to
make multiple calls to complete the writing of large buffers.

To do this, one can set the write() file-descriptor to non-blocking
and simply make repeated write() calls, checking the return value
(and errno) each time. However, this wastes CPU cycles. Instead
of this, the caller can still keep a non-blocking file-descriptor,
but sleep in select() or poll() until the kernel has space for
more bytes. The kernel can never guarantee, nor is there a way to
show, space for N bytes. The caller just has to walk up the buffer
and make as many calls as necessary.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.

2003-06-10 14:08:47

by Krzysztof Halasa

Subject: Re: select for UNIX sockets?

Timothy Miller <[email protected]> writes:

> If you were to use blocking writes, and you sent too much data, then
> you would block. If you were to use non-blocking writes, then the
> socket would take as much data as it could, then return from write()
> with an indication of how much data actually got sent. Then you call
> select() again so as to wait for your next opportunity to send some
> more of your data.

This is all true in general but in this particular case of unix datagram
sockets select (poll) is just buggy.
--
Krzysztof Halasa
Network Administrator

2003-06-10 18:54:27

by Jesse Pollard

Subject: Re: select for UNIX sockets?

On Tuesday 10 June 2003 09:21, Krzysztof Halasa wrote:
> Timothy Miller <[email protected]> writes:
> > If you were to use blocking writes, and you sent too much data, then
> > you would block. If you were to use non-blocking writes, then the
> > socket would take as much data as it could, then return from write()
> > with an indication of how much data actually got sent. Then you call
> > select() again so as to wait for your next opportunity to send some
> > more of your data.
>
> This is all true in general but in this particular case of unix datagram
> sockets select (poll) is just buggy.

Ahh no.

The following is from Solaris 2.8 manpage on poll:

The poll() function supports regular files, terminal and
pseudo-terminal devices, STREAMS-based files, FIFOs and
pipes. The behavior of poll() on elements of fds that refer
to other types of file is unspecified.

The poll() function supports sockets.

A file descriptor for a socket that is listening for connec-
tions will indicate that it is ready for reading, once con-
nections are available. A file descriptor for a socket that
is connecting asynchronously will indicate that it is ready
for writing, once a connection has been established.

As in: ALWAYS ready to write as soon as a connection is made. It can
still block on a write if the amount to write is larger than the buffer
available. Nothing is said about the AMOUNT that can be written
(with most FIFOs/pipes the limit is ~4K, though even that is not guaranteed,
since other writers may fill it between the poll and the write).

The select function (3c):

The select() function supports regular files, terminal and
pseudo-terminal devices, STREAMS-based files, FIFOs and
pipes. The behavior of select() on file descriptors that
refer to other types of file is unspecified.

And the following on its use on sockets:

A file descriptor for a socket that is listening for connec-
tions will indicate that it is ready for reading, when con-
nections are available. A file descriptor for a socket that
is connecting asynchronously will indicate that it is ready
for writing, when a connection has been established.

as in "READY for writing", not that it won't block when you DO write.

(Also "READY for reading", not that it won't block when you DO read.)

You've been lucky to have relatively idle systems or large memory
systems.

I suspect you actually were blocking, just not for very long.

2003-06-10 21:28:27

by David Schwartz

Subject: RE: select for UNIX sockets?

> "David Schwartz" <[email protected]> writes:

> > For the last time, there is no socket queue. You wouldn't want there to be
> > one.

> Sure. No queue. Of course.

Yep.

> And these are only misleading names - net/unix/af_unix.c:
> static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
> struct msghdr *msg, int len)
> {
> ...
>
> if (unix_peer(other) != sk &&
> skb_queue_len(&other->receive_queue) > other->max_ack_backlog) {

That looks like a *receive* queue to me. We were talking about selecting
for writes, weren't we?

> and then
>
> skb_queue_tail(&other->receive_queue, skb);
> unix_state_runlock(other);
> other->data_ready(other, len);
> sock_put(other);
>
> Right?

Looks like another receive queue to me. There is no send queue and you
wouldn't want there to be one.

> > Consider a UDP application that is sending packets to two destinations, one
> > over a 56Kbps serial link running PPP and one over gigabit Ethernet. If
> > there were a socket send queue, the packets going over the 56Kbps serial
> > link would block the packets going over the gigabit Ethernet.

> First, PPP and Ethernet use IP/UDP and not local UNIX sockets.
> Second, I hope you don't want to tell me PPP and Ethernet have no
> device queues, do you? Sure there are "virtual" devices with no queue,
> but that's another story.
> Have you checked what the above scenario would do? I guess the PPP would
> really limit the rate if you used only one socket.

They have device queues, they have no socket send queues.

> Having no per-sender socket queue for UDP/IP is totally irrelevant here.

It is relevant. Because when you select for write, you're trying to find
out whether there's space to write to the socket. That would require there
to be something for there to be space in or not to be space in. Whatever you
want to call that (I call it a 'socket send queue', but it doesn't matter)
that queue doesn't exist for UDP and you wouldn't want it to exist.

With UDP, or any connectionless protocol, the application is ultimately
responsible for transmit pacing. You could argue that it would be nice if
the kernel helped out more than it currently does, but it has no obligation
to do so.
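As an illustration of application-level transmit pacing, a hypothetical send_paced() can space datagrams by a fixed gap chosen by the application and retry later when a non-blocking send reports EAGAIN. The fixed-gap policy is an assumption made for the sketch; a real application would more likely derive its rate from feedback from the receiver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical application-level pacer: send `count` datagrams on a
 * connected socket, sleeping `gap_ns` nanoseconds after every attempt.
 * A send refused with EAGAIN is retried after the next gap rather than
 * blocking.  Returns the number sent, or -1 on any other error. */
long send_paced(int fd, const void *buf, size_t len, long count, long gap_ns)
{
    long sent = 0;

    while (sent < count) {
        if (send(fd, buf, len, MSG_DONTWAIT) >= 0)
            sent++;
        else if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;

        struct timespec left = { gap_ns / 1000000000L, gap_ns % 1000000000L };
        while (nanosleep(&left, &left) < 0 && errno == EINTR)
            ;                   /* finish the remaining sleep after a signal */
    }
    return sent;
}
```

As written, the loop keeps retrying indefinitely if the receiver never drains its queue; a real pacer would also bound the number of retries.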

DS

2003-06-11 12:38:11

by Edgar Toernig

Subject: Re: select for UNIX sockets?

Krzysztof Halasa wrote:
>
> Timothy Miller <[email protected]> writes:
>
> > If you were to use blocking writes, and you sent too much data, then
> > you would block. If you were to use non-blocking writes, then the
> > socket would take as much data as it could, then return from write()
> > with an indication of how much data actually got sent. Then you call
> > select() again so as to wait for your next opportunity to send some
> > more of your data.
>
> This is all true in general but in this particular case of unix datagram
> sockets select (poll) is just buggy.

Do you want to install a magic crystal ball in the kernel? :-)

For select to properly block on write it has to know the destination of
the write. For unconnected sockets you haven't told the destination.
There's no way for the kernel to know at select time which receiver
to check for free space.

You were talking about a "send queue". I guess you think it should
work like: if destination has enough room move data to destination
else queue it in the send queue. Then select would check whether the
*send queue* has enough space for another packet. But that would mean
that a single slow receiver would block all others. I.e. /tmp/a is
slow; you fill the queue; select blocks even when you actually want to
send to /tmp/b which has plenty of space.
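Giving the kernel a single destination before select() time means connecting the datagram socket first. A sketch, assuming a receiver is already bound at `path`; connect_unix_dgram() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a UNIX datagram socket connected to `path`,
 * so every later send() has one known destination.  Returns fd or -1. */
int connect_unix_dgram(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof addr.sun_path)
        return -1;                      /* would not fit in sun_path */

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

An unconnected socket that calls sendto() with varying addresses gives the kernel no such single target, which is the point made above.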

Ciao, ET.

2003-06-11 21:43:39

by Krzysztof Halasa

Subject: Re: select for UNIX sockets?

Jesse Pollard <[email protected]> writes:

> As in: ALWAYS ready to write as soon as a connection is made. It can
> still block on a write if the amount to write is larger than the buffer
> available. Nothing is said about the AMOUNT that can be written
> (with most FIFOs/pipes the limit is ~4K, though even that is not guaranteed,
> since other writers may fill it between the poll and the write).

Still, it is local (UNIX) datagram socket and thus the number of
datagrams is the limit, not the number of bytes. And yes, the problem
is present with 1-byte datagrams. And still, the problem is with select()
and not with send*().

It is *not* FIFO/pipe/stream socket etc. And not a UDP/IP socket either.

> A file descriptor for a socket that is listening for connec-
> tions will indicate that it is ready for reading, when con-
> nections are available. A file descriptor for a socket that
> is connecting asynchronously will indicate that it is ready
> for writing, when a connection has been established.
>
> as in "READY for writing", not that it won't block when you DO write.
>
> (Also "READY for reading", not that it won't block when you DO read.)

I think the above covers only stream-based connection establishing.

> You've been lucky to have relatively idle systems or large memory
> systems.
>
> I suspect you actually were blocking, just not for very long.

No. It is just something completely different.

I understand a system can have no such support. But, currently, Linux
has support which is broken.
--
Krzysztof Halasa
Network Administrator

2003-06-11 21:50:43

by Krzysztof Halasa

Subject: Re: select for UNIX sockets?

"David Schwartz" <[email protected]> writes:

> Looks like another receive queue to me. There is no send queue and you
> wouldn't want there to be one.

So?

> They have device queues, they have no socket send queues.

Well?

> > Having no per-sender socket queue for UDP/IP is totally irrelevant here.
>
> It is relevant. Because when you select for write, you're trying to find
> out whether there's space to write to the socket.

Which socket? IP/UDP or UNIX one? You know, UNIX sockets are a little
special - both ends are on the same machine. This is why the sending
routine can check the receiving queue length.

> That would require there
> to be something for there to be space in or not to be space in. Whatever you
> want to call that (I call it a 'socket send queue', but it doesn't matter)
> that queue doesn't exist for UDP and you wouldn't want it to exist.

Sure.

> With UDP, or any connectionless protocol, the application is ultimately
> responsible for transmit pacing.

Still, this is all irrelevant, this is a kernel-only issue.

> You could argue that it would be nice if
> the kernel helped out more than it currently does, but it has no obligation
> to do so.

You're missing the fact that the kernel _has_ code to help but this
_existing_ code is broken (and yes, it was fine in earlier kernels).
--
Krzysztof Halasa
Network Administrator

2003-06-11 22:37:16

by David Schwartz

Subject: RE: select for UNIX sockets?

> Jesse Pollard <[email protected]> writes:

> > As in: ALWAYS ready to write as soon as a connection is made. It can
> > still block on a write if the amount to write is larger than the buffer
> > available. Nothing is said about the AMOUNT that can be written
> > (with most FIFOs/pipes the limit is ~4K, though even that is not guaranteed,
> > since other writers may fill it between the poll and the write).

> Still, it is local (UNIX) datagram socket and thus the number of
> datagrams is the limit, not the number of bytes. And yes, the problem
> is present with 1-byte datagrams. And still, the problem is with select()
> and not with send*().

Can you find me where any standard says that local UNIX datagrams can't
have lengths of less than one byte? It's really this simple -- you don't
have the guarantee you think you have. Yes, the kernel could be nicer. But
what you're trying to do (foist on the kernel the job of transmit scheduling
on a connectionless socket) is fundamentally wrong.

DS