2004-10-07 13:00:41

by George Spelvin

Subject: [PATCH] Re: UDP recvmsg blocks after select(), 2.6 bug?

How about the following? Should I make a similar addition to
poll(2)?

Legalese:
These changes are works of original authorship.
These changes are hereby released into the public domain; copyright abandoned.

--- man2/select.2.old 2004-10-07 07:58:46.000000000 -0400
+++ man2/select.2 2004-10-07 08:38:24.000000000 -0400
@@ -170,7 +170,7 @@
.IR sigmask ,
avoiding the race.)
Since Linux today does not have a
-.IR pselect ()
+.BR pselect ()
system call, the current glibc2 routine still contains this race.
.SS "The timeout"
The time structures involved are defined in
@@ -291,6 +291,18 @@
return 0;
}
.fi
+.SH BUGS
+.B pselect
+is currently emulated with a user-space wrapper that has a race condition.
+For reliable (and more portable) signal trapping, use the self-pipe trick.
+(Where a signal handler writes to a pipe whose other end is read by the
+main loop.)
+
+.B select
+and
+.B pselect
+permit blocking file descriptors in the fd_sets, even though
+there is no valid reason for a program to do this.
.SH "CONFORMING TO"
4.4BSD (the
.B select
@@ -315,6 +327,39 @@
.I fd
to be a valid file descriptor.

+When
+.B select
+indicates that a file descriptor is ready, this is only a strong hint,
+not a guarantee, that a read or write is possible without blocking.
+For this reason, the associated file descriptors must always be in
+non-blocking mode (see
+.BR fcntl (2))
+in a correct program. Reasons why the I/O could block include:
+.TP
+(i)
+Another process may have performed I/O on the
+.I fd
+in the meantime.
+.TP
+(ii)
+Some needed kernel buffer space may have been consumed for reasons
+totally unrelated to this I/O, or
+.TP
+(iii)
+Since 2.4.x, Linux has overlapped UDP checksum verification with
+copying to user-space. If a UDP packet arrives,
+.B select
+will indicate that data is ready, but during the read, if the checksum is
+bad, the packet will disappear and (if no subsequent packet with a
+valid checksum is waiting) the read will indicate that no data is available.
+.PP
+In general, it is legal for
+.B select
+to make some optimistic assumptions, subject to later verification by the
+subsequent I/O, as long as this does not result in a busy-loop where
+.B select
+is stuck thinking data is ready when it is not.
+
Concerning the types involved, the classical situation is that
the two fields of a struct timeval are longs (as shown above),
and the struct is defined in


2004-10-08 11:46:29

by Andries E. Brouwer

Subject: Re: [PATCH] Re: UDP recvmsg blocks after select(), 2.6 bug?

On Thu, Oct 07, 2004 at 12:49:09PM -0000, George Spelvin wrote:

> How about the following?

> +.B pselect
> +is currently emulated with a user-space wrapper that has a race condition.
> +For reliable (and more portable) signal trapping, use the self-pipe trick.
> +(Where a signal handler writes to a pipe whose other end is read by the
> +main loop.)

Thanks, added.

> +.B select
> +and
> +.B pselect
> +permit blocking file descriptors in the fd_sets, even though
> +there is no valid reason for a program to do this.

Hmm.

> +When
> +.B select
> +indicates that a file descriptor is ready, this is only a strong hint,
> +not a guarantee, that a read or write is possible without blocking.

Yes, perhaps.
It was easy to decide not to take your text, since it is far too long.

Presently I have

Under Linux, select may report a socket file descriptor as
"ready for reading", while nevertheless a subsequent read
blocks. This could for example happen when data has
arrived but upon examination has the wrong checksum and is
discarded. There may be other circumstances. Thus it may
be safer to use O_NONBLOCK on sockets that should not
block.
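
A minimal sketch of what that advice could look like in a program, assuming a POSIX system. The helper names set_nonblocking() and read_when_ready() are made up for illustration and are not part of any man page text. The point is only that the descriptor is switched to O_NONBLOCK once, and that EAGAIN after a "ready" report is treated as a reason to go back to select() rather than as an error:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: wait until select() reports fd readable, then read.
 * Assumes fd is already non-blocking. Returns the byte count (0 means EOF),
 * or -1 with errno set on a real error. */
static ssize_t read_when_ready(int fd, void *buf, size_t len)
{
    assert(fd >= 0 && fd < FD_SETSIZE);   /* FD_SET() requires this */

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        if (select(fd + 1, &rfds, NULL, NULL, NULL) == -1) {
            if (errno == EINTR)
                continue;                 /* interrupted by a signal: retry */
            return -1;
        }

        ssize_t n = read(fd, buf, len);
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK ||
                        errno == EINTR))
            continue;   /* readiness was only a hint (e.g. bad checksum) */
        return n;
    }
}
```

A caller would run set_nonblocking() once after creating the socket and then use read_when_ready() (or the same loop with recvmsg()) wherever it previously called read() directly after select().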


Your above few lines on the self-pipe trick suffice for select.2,
but I wouldn't mind if you (or anybody else) wrote something a bit
more explicit in select_tut.2.
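
For reference, a rough sketch of the self-pipe trick as described in the patch: a signal handler writes to a pipe whose other end is watched by the main loop's select(). This is only one possible shape, assuming POSIX sigaction() and pipe(). The names selfpipe, on_signal(), selfpipe_install() and selfpipe_wait() are invented for this example, and a real main loop would add its own descriptors to the fd_set:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* [0] is read by the main loop, [1] is written by the handler. */
static int selfpipe[2] = { -1, -1 };

/* Handler: uses only async-signal-safe calls and preserves errno. */
static void on_signal(int signo)
{
    int saved = errno;
    unsigned char b = (unsigned char)signo;
    if (write(selfpipe[1], &b, 1) == -1) {
        /* Pipe full: a wakeup is already pending, so nothing to do. */
    }
    errno = saved;
}

/* Create the pipe (once, both ends non-blocking) and install the handler. */
static int selfpipe_install(int signo)
{
    if (selfpipe[0] == -1) {
        if (pipe(selfpipe) == -1)
            return -1;
        for (int i = 0; i < 2; i++) {
            int fl = fcntl(selfpipe[i], F_GETFL, 0);
            if (fl == -1 ||
                fcntl(selfpipe[i], F_SETFL, fl | O_NONBLOCK) == -1)
                return -1;
        }
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Block in select() until a signal byte arrives.
 * Returns the signal number, or -1 on error. */
static int selfpipe_wait(void)
{
    assert(selfpipe[0] >= 0 && selfpipe[0] < FD_SETSIZE);

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(selfpipe[0], &rfds);   /* a real loop adds its sockets here */

        if (select(selfpipe[0] + 1, &rfds, NULL, NULL, NULL) == -1) {
            if (errno == EINTR)
                continue;             /* the handler ran; go look at the pipe */
            return -1;
        }

        unsigned char b;
        ssize_t n = read(selfpipe[0], &b, 1);
        if (n == 1)
            return b;
        if (n == 0)
            return -1;                /* write end closed unexpectedly */
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;
    }
}
```

Because the signal is turned into ordinary pipe data, there is no window between unblocking the signal and entering select() in which it can be lost, which is the race the emulated pselect has. Both pipe ends are non-blocking so that neither the handler nor the main loop can get stuck on the pipe itself.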

Andries