2003-06-18 08:36:11

by Eli Barzilay

Subject: Problem with select?

Hello,

When I run the following program, and block the terminal's output
(C-s), the `select' doesn't seem to have any effect, resulting in a
100% cpu usage (this is on a RH8, with 2.4.18). I wouldn't be
surprised if I'm doing something stupid, but it does seem to work fine
on Solaris.

Is there anything wrong with this, or is this some bug?

======================================================================
#include <stdlib.h>      /* exit */
#include <unistd.h>
#include <fcntl.h>
#include <sys/select.h>  /* select, fd_set, FD_* macros */
int main() {
  int flags, fd, len; fd_set writefds;
  fd = 1;
  flags = fcntl(fd, F_GETFL, 0);
  fcntl(fd, F_SETFL, flags | O_NONBLOCK);
  while (1) {
    FD_ZERO(&writefds);
    FD_SET(fd, &writefds);
    len = select(fd + 1, NULL, &writefds, NULL, NULL);
    if (!FD_ISSET(fd,&writefds)) exit(0);
    len = write(fd, "hi\n", 3);
  }
  fcntl(fd, F_SETFL, flags);
}
======================================================================

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!


2003-07-24 05:13:09

by Eli Barzilay

Subject: Repost: Bug with select?


[This is a second post, since I didn't get any replies the first time.
It looks more like a bug now, which sounds strange for something that
common...]

When I run the following program, and block the terminal's output
(C-s), the `select' doesn't seem to have any effect, resulting in a
100% cpu usage (this is on a RH8, with 2.4.18). I wouldn't be
surprised if I'm doing something stupid, but it does seem to work fine
on Solaris.

Is there anything wrong with this, or is this some bug?

======================================================================
#include <stdlib.h>      /* exit */
#include <unistd.h>
#include <fcntl.h>
#include <sys/select.h>  /* select, fd_set, FD_* macros */
int main() {
  int flags, fd, len; fd_set writefds;
  fd = 1;
  flags = fcntl(fd, F_GETFL, 0);
  fcntl(fd, F_SETFL, flags | O_NONBLOCK);
  while (1) {
    FD_ZERO(&writefds);
    FD_SET(fd, &writefds);
    len = select(fd + 1, NULL, &writefds, NULL, NULL);
    if (!FD_ISSET(fd,&writefds)) exit(0);
    len = write(fd, "hi\n", 3);
  }
  fcntl(fd, F_SETFL, flags);
}
======================================================================

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!

2003-07-25 13:26:51

by Marco Roeland

Subject: Re: Repost: Bug with select?

On Thursday July 24th 2003 at 01:28 Eli Barzilay wrote:

> When I run the following program, and block the terminal's output
> (C-s), the `select' doesn't seem to have any effect, resulting in a
> 100% cpu usage (this is on a RH8, with 2.4.18). I wouldn't be
> surprised if I'm doing something stupid, but it does seem to work fine
> on Solaris.
>
> Is there anything wrong with this, or is this some bug?
>
> ======================================================================
> #include <unistd.h>
> #include <fcntl.h>
> int main() {
> int flags, fd, len; fd_set writefds;
> fd = 1;
> flags = fcntl(fd, F_GETFL, 0);
> fcntl(fd, F_SETFL, flags | O_NONBLOCK);

You use non-blocking mode here.

> while (1) {
> FD_ZERO(&writefds);
> FD_SET(fd, &writefds);
> len = select(fd + 1, NULL, &writefds, NULL, NULL);

A select with no timeout, so it will immediately return.

> if (!FD_ISSET(fd,&writefds)) exit(0);

This might be what Solaris does differently, by _not_ including '1' in
the returned descriptors? Linux will say (rightly) that a following call
will not block, which is something very different than 'will not fail'!

> len = write(fd, "hi\n", 3);

You don't check the return value here, but when you press Ctrl-S (stdout
blocked) the call will fail (return value -1) with errno set
to EAGAIN, meaning you should try again, which is the appropriate result
for a non-blocking descriptor or socket here. Anyway, the call returns
immediately and we loop back into the while(1), indeed as you say
creating a busy loop. No surprises there I'd say.

> }
> fcntl(fd, F_SETFL, flags);
> }

You might start by checking for EAGAIN as result of the write, and then
reacting according to your needs (waiting a while or exiting the
program or whatever).
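
The suggestion above (check for EAGAIN, then wait a while) might look
something like the following minimal sketch. The helper name
write_with_backoff, the 10 ms pause and the max_tries limit are
illustrative assumptions; none of them come from the original program.

```c
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write to a non-blocking fd, retrying on EINTR and
 * sleeping briefly on EAGAIN instead of spinning.  Returns the number of
 * bytes written, or -1 on a real error or after max_tries attempts. */
ssize_t write_with_backoff(int fd, const void *buf, size_t len, int max_tries)
{
    struct timespec pause = { 0, 10 * 1000 * 1000 };  /* 10 ms, arbitrary */
    int tries;

    for (tries = 0; tries < max_tries; tries++) {
        ssize_t n = write(fd, buf, len);
        if (n >= 0)
            return n;                   /* wrote something (maybe partial) */
        if (errno == EINTR)
            continue;                   /* interrupted: just try again */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                  /* real error, errno is set */
        nanosleep(&pause, NULL);        /* would block: back off a little */
    }
    errno = EAGAIN;                     /* still not writable, gave up */
    return -1;
}
```

This only keeps the CPU usage down while the terminal is stopped; it does
not answer whether select() should have blocked in the first place.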

2003-07-26 00:05:15

by Ben Greear

Subject: Re: Repost: Bug with select?

Marco Roeland wrote:
> On Thursday July 24th 2003 at 01:28 Eli Barzilay wrote:
>
>
>>When I run the following program, and block the terminal's output
>>(C-s), the `select' doesn't seem to have any effect, resulting in a
>>100% cpu usage (this is on a RH8, with 2.4.18). I wouldn't be
>>surprised if I'm doing something stupid, but it does seem to work fine
>>on Solaris.
>>
>>Is there anything wrong with this, or is this some bug?
>>
>>======================================================================
>>#include <unistd.h>
>>#include <fcntl.h>
>>int main() {
>> int flags, fd, len; fd_set writefds;
>> fd = 1;
>> flags = fcntl(fd, F_GETFL, 0);
>> fcntl(fd, F_SETFL, flags | O_NONBLOCK);
>
>
> You use non-blocking mode here.
>
>
>> while (1) {
>> FD_ZERO(&writefds);
>> FD_SET(fd, &writefds);
>> len = select(fd + 1, NULL, &writefds, NULL, NULL);
>
>
> A select with no timeout, so it will immediately return.
>
>
>> if (!FD_ISSET(fd,&writefds)) exit(0);
>
>
> This might be what Solaris does differently, by _not_ including '1' in
> the returned descriptors? Linux will say (rightly) that a following call
> will not block, which is something very different than 'will not fail'!

I thought select is supposed to tell you when you can read/write at least
something without failing. Otherwise it would be worthless when doing
non-blocking IO, because you can both read and write without blocking at
all times. If you run similar code on a TCP socket instead of stdout, do
you see the same busy spin? (To do it right, make sure the network between
source and destination is slower than the CPU can handle, i.e. a 10BaseT
hub.)
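
A rough way to check that expectation without a slow network is sketched
below, assuming a pipe is an acceptable stand-in for the socket: fill a
non-blocking pipe until write() stops succeeding, then poll it with a zero
timeout. The function names poll_writable and writable_after_fill are made
up for this sketch. If select() behaves as described, a full pipe should
be reported as not writable (0).

```c
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Ask select() whether fd is writable, without blocking (zero timeout).
 * Returns 1 if reported writable, 0 if not, -1 on error. */
int poll_writable(int fd)
{
    fd_set wfds;
    struct timeval zero = { 0, 0 };

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    if (select(fd + 1, NULL, &wfds, NULL, &zero) < 0)
        return -1;
    return FD_ISSET(fd, &wfds) ? 1 : 0;
}

/* Fill a non-blocking pipe until write() fails, then ask select() again.
 * Returns select()'s answer for the full pipe, or -1 on setup failure. */
int writable_after_fill(void)
{
    int p[2];
    char chunk[512];
    int result;

    if (pipe(p) < 0)
        return -1;
    fcntl(p[1], F_SETFL, fcntl(p[1], F_GETFL, 0) | O_NONBLOCK);
    memset(chunk, 'x', sizeof chunk);
    while (write(p[1], chunk, sizeof chunk) > 0)
        ;                       /* stops once the pipe has no room left */
    result = poll_writable(p[1]);
    close(p[0]);
    close(p[1]);
    return result;
}
```

Nobody reads from the pipe here, so it plays the role of the destination
that cannot keep up.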


--
Ben Greear
Candela Technologies Inc http://www.candelatech.com


2003-07-26 00:20:20

by Philippe Troin

Subject: Re: Repost: Bug with select?

Eli Barzilay writes:

> [This is a second post, since I didn't get any replies the first time.
> It looks more like a bug now, which sounds strange for something that
> common...]
>
> When I run the following program, and block the terminal's output
> (C-s), the `select' doesn't seem to have any effect, resulting in a
> 100% cpu usage (this is on a RH8, with 2.4.18). I wouldn't be
> surprised if I'm doing something stupid, but it does seem to work fine
> on Solaris.
>
> Is there anything wrong with this, or is this some bug?
>
> ======================================================================
> #include <unistd.h>
> #include <fcntl.h>
> int main() {
> int flags, fd, len; fd_set writefds;
> fd = 1;
> flags = fcntl(fd, F_GETFL, 0);
> fcntl(fd, F_SETFL, flags | O_NONBLOCK);
> while (1) {
> FD_ZERO(&writefds);
> FD_SET(fd, &writefds);
> len = select(fd + 1, NULL, &writefds, NULL, NULL);
> if (!FD_ISSET(fd,&writefds)) exit(0);
> len = write(fd, "hi\n", 3);
> }
> fcntl(fd, F_SETFL, flags);
> }
> ======================================================================

Looks like a bug to me.
Strace says:

select(2, NULL, [1], NULL, NULL) = 1 (out [1])
write(1, "hi\n", 3) = -1 EAGAIN (Resource temporarily unavailable)

forever.

Then select() should not return fd 1 as writable, at least not
repeatedly.

Phil.

2003-07-26 08:50:49

by Marco Roeland

Subject: Re: Repost: Bug with select?

On Friday July 25th 2003 at 17:20 Ben Greear wrote:

> I thought select is supposed to tell you when you can read/write at least
> something without failing. Otherwise it would be worthless when doing
> non-blocking IO because you can both read and write w/out blocking at all
> times. If you run similar code on a tcp socket instead of std-out, do you see
> the same busy spin? (To do it right, make sure the network between source and
> destination is slower than the CPU can handle, ie 10bt hub.)

My 'analysis' was indeed based on experience with sockets, where you
don't get the busy spin. It's indeed a bit baffling why select keeps
insisting that fd 1 is writable. A quick test on kernel versions
2.2.12-20, 2.4.20 and 2.6.0-test1 all give the same results, so I
suppose select itself is doing its expected duty, and that in that case
the special underlying mechanics of stdout require special mechanics to
find out if it's blocked?! Beats me, but that's pretty easy... ;-)

Marco Roeland

2003-07-26 14:10:29

by Eli Barzilay

Subject: Re: Repost: Bug with select?

On Jul 25, Marco Roeland wrote:
> > len = select(fd + 1, NULL, &writefds, NULL, NULL);
>
> A select with no timeout, so it will immediately return.

The man page says:

timeout is an upper bound on the amount of time elapsed
before select returns. It may be zero, causing select to
return immediately. (This is useful for polling.) If timeout
is NULL (no timeout), select can block indefinitely.

But I did (obviously) try adding one just in case -- the problem does
not go away.
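
The three timeout modes from the man page excerpt can be sketched as a
small wrapper. This is only an illustration: the names wait_writable and
fresh_pipe_is_writable are hypothetical, and the convention that a negative
timeout_ms means "NULL timeout" is an assumption of this sketch, not part
of select(). The timeval is rebuilt on every call because Linux select()
may modify it.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait for fd to become writable.
 *   timeout_ms < 0  -> NULL timeout, select() may block indefinitely
 *   timeout_ms == 0 -> zero timeout, select() returns immediately (polling)
 *   timeout_ms > 0  -> upper bound on the wait
 * Returns 1 if writable, 0 if the wait ended without it, -1 on error. */
int wait_writable(int fd, long timeout_ms)
{
    fd_set wfds;
    struct timeval tv;
    struct timeval *tvp = NULL;

    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;              /* fresh timeval for every call */
    }
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    if (select(fd + 1, NULL, &wfds, NULL, tvp) < 0)
        return -1;
    return FD_ISSET(fd, &wfds) ? 1 : 0;
}

/* Small demo: an empty pipe has room, so its write end should be writable
 * in every mode.  Returns wait_writable()'s answer, or -1 on failure. */
int fresh_pipe_is_writable(long timeout_ms)
{
    int p[2];
    int result;

    if (pipe(p) < 0)
        return -1;
    result = wait_writable(p[1], timeout_ms);
    close(p[0]);
    close(p[1]);
    return result;
}
```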


> > if (!FD_ISSET(fd,&writefds)) exit(0);
>
> This might be what Solaris does differently, by _not_ including '1'
> in the returned descriptors? Linux will say (rightly) that a
> following call will not block, which is something very different
> than 'will not fail'!

I just added that when trying to trace the problem and reading
somewhere that ISSET must be used... It never had any effect -- never
exits and otherwise the program is still on a busy spin in Linux and
fine on Solaris.


> > len = write(fd, "hi\n", 3);
>
> You don't check the return value here, but when you press Ctrl-S
> (stdout blocked) the call will fail (return value -1)
> with errno set to EAGAIN, meaning you should try again, which is the
> appropriate result for a non-blocking descriptor or socket
> here. Anyway, the call returns immediately and we loop back into the
> while(1), indeed as you say creating a busy loop. No surprises
> there I'd say.

Uh, that's just a stripped down example -- in the original the
returned value is checked and the write is retried if the result is
EINTR. The problem is that AFAICT, select should wait until the fd is
writable, but then write fails with EAGAIN, only to have the next
select succeed as if there were no problem.


> > }
> > fcntl(fd, F_SETFL, flags);
> > }
>
> You might start by checking for EAGAIN as result of the write, and
> then reacting according to your needs (waiting a while or exiting
> the program or whatever).

Yeah, when the problem occurs, write will result in an EAGAIN, but
the next select should block until writing is ok.

When I played with this now I saw another strange thing -- when there
is a timeout in place, the FD_ISSET *will* return 0 after some output
was done (probably when it's waiting for output). So I thought that it
might be a good place to put a sleep, but the problem is that 0 is not
returned when the output is stopped.

This is the program:
======================================================================
#include <stdlib.h>      /* exit */
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/select.h>  /* select, fd_set, FD_* macros */
int main() {
  int flags, fd, len; fd_set writefds;
  /* note: Linux select() may modify timeout; it is not reset below */
  struct timeval timeout; timeout.tv_sec = 1; timeout.tv_usec = 0;
  fd = 1;
  flags = fcntl(fd, F_GETFL, 0);
  fcntl(fd, F_SETFL, flags | O_NONBLOCK);
  while (1) {
    FD_ZERO(&writefds);
    FD_SET(fd, &writefds);
    len = select(fd + 1, NULL, &writefds, NULL, &timeout);
    if (len<0) exit(1);
    while (!FD_ISSET(fd,&writefds)) {
      sleep(1);
      FD_ZERO(&writefds);
      FD_SET(fd, &writefds);
      len = select(fd + 1, NULL, &writefds, NULL, &timeout);
      if (len<0) exit(1);
    }
    do {
      len = write(fd, "hi\n", 3);
    } while ((len == -1) && (errno == EINTR));
    if (len<0 && errno==EINTR) exit(2);
    /* if (len<0 && errno==EAGAIN) exit(3); */
  }
  fcntl(fd, F_SETFL, flags);
}
======================================================================


On Jul 25, Ben Greear wrote:
> I thought select is supposed to tell you when you can read/write at
> least something without failing. Otherwise it would be worthless
> when doing non-blocking IO because you can both read and write w/out
> blocking at all times.

That was the point I was trying to make.


On Jul 26, Marco Roeland wrote:
> My 'analysis' was indeed based on experience with sockets, where you
> don't get the busy spin. It's indeed a bit baffling why select keeps
> insisting that fd 1 is writable. A quick test on kernel versions
> 2.2.12-20, 2.4.20 and 2.6.0-test1 all give the same results, so I
> suppose select itself is doing its expected duty, and that in that
> case the special underlying mechanics of stdout require special
> mechanics to find out if it's blocked?! Beats me, but that's pretty
> easy... ;-)

That doesn't solve the problem, though; at best the code ends up ugly,
with special cases for terminal output.

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!

2003-07-26 14:14:02

by Eli Barzilay

Subject: Re: Repost: Bug with select?

On Jul 25, Philippe Troin wrote:
> Looks like a bug to me.
> Strace says:
>
> select(2, NULL, [1], NULL, NULL) = 1 (out [1])
> write(1, "hi\n", 3) = -1 EAGAIN (Resource temporarily unavailable)
>
> forever.
>
> Then select() should not return fd 1 as writable, at least not
> repeatedly.

Exactly -- I didn't even think of using strace, which makes this
obvious. (I don't have any Solaris box where I can run strace, but I
wonder what it would say there.)

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!

2003-07-26 15:28:13

by Marco Roeland

Subject: Re: Repost: Bug with select?

On Saturday 26 July 2003 at 10:25 Eli Barzilay wrote:

> ...

> I just added that when trying to trace the problem and reading
> somewhere that ISSET must be used... It never had any effect -- never
> exits and otherwise the program is still on a busy spin in Linux and
> fine on Solaris.

After some more testing the behaviour here seems indeed a bit odd. For
what it's worth I just tested the program under IBM AIX 4.2 on an old
RS/6000 machine, and it doesn't busy spin there either.
--
Marco Roeland

2003-07-28 02:19:04

by Manfred Spraul

Subject: Re: Repost: Bug with select?

--- 2.5/drivers/char/n_tty.c 2003-07-05 09:13:01.000000000 +0200
+++ build-2.5/drivers/char/n_tty.c 2003-07-27 20:44:58.000000000 +0200
@@ -1251,7 +1251,8 @@
else
tty->minimum_to_wake = 1;
}
- if (tty->driver->chars_in_buffer(tty) < WAKEUP_CHARS)
+ if (tty->driver->chars_in_buffer(tty) < WAKEUP_CHARS &&
+ tty->driver->write_room(tty) > 0)
mask |= POLLOUT | POLLWRNORM;
return mask;
}
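
The effect of the one-line change can be modelled in plain userspace C.
This is only a sketch of the condition, not kernel code: the function
names, the MODEL_POLLOUT flag and the WAKEUP_CHARS value of 256 are
assumptions made for illustration. The interesting case is a driver that
reports few buffered characters but no room to write, which is presumably
what a stopped terminal looks like.

```c
/* Userspace model of the writability test in the patch; not kernel code. */
#define WAKEUP_CHARS  256   /* assumed value of the kernel constant */
#define MODEL_POLLOUT 1     /* stand-in for POLLOUT | POLLWRNORM */

/* Before the patch: writable whenever few characters are buffered. */
int model_mask_before(int chars_in_buffer, int write_room)
{
    (void)write_room;       /* the old test ignored write_room */
    return chars_in_buffer < WAKEUP_CHARS ? MODEL_POLLOUT : 0;
}

/* After the patch: additionally require that the driver has room. */
int model_mask_after(int chars_in_buffer, int write_room)
{
    if (chars_in_buffer < WAKEUP_CHARS && write_room > 0)
        return MODEL_POLLOUT;
    return 0;
}
```

With chars_in_buffer below the threshold and write_room at 0, the first
function still reports the tty as writable while the second does not,
which matches the select()/EAGAIN loop seen in the strace output earlier
in the thread.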


Attachments:
patch-tty-fix (403.00 B)