2006-05-11 14:25:33

by linux-os (Dick Johnson)

Subject: Linux poll() <sigh> again



Hello,

I'm trying to fix a long-standing bug which has a
work-around that has been working for a year or
so.

The bug relates to the Linux implementation of poll()
on a connected socket. If poll() is set to detect
changes on a connected socket, with an infinite
timeout (-1), and the client disconnects, it returns
with a positive value (correct). The returned
events (the revents member) show only the POLLIN bit
set. This, according to all known documentation,
including man pages on the web, is supposed to
mean that there are data to be read. In fact,
there are no data and a read will return 0.

I have used the subsequent read() with a returned
value of zero, to indicate that the client disconnected
(as a work around). However, on recent versions of
Linux, this is not reliable and the read() may
wait forever instead of immediately returning.

So, it's time to fix poll. Will somebody please
have poll return POLLHUP when the client disconnects
a connected socket? I don't need to add any more
workarounds (like setting the socket to non-blocking
so a read will return even if poll() erroneously
reports that data are ready).

Here is the relevant code:

    for(;;) {
        mem->pfd.fd = fd;
        mem->pfd.events = POLLIN|POLLERR|POLLHUP|POLLNVAL;
        mem->pfd.revents = 0x00;
        message("Calling poll\n");
        if(poll(&mem->pfd, 0x01, -1) != 0x01)
            break;
        message("Poll returns okay with %08x\n", mem->pfd.revents);
        if(mem->pfd.revents & (POLLHUP|POLLERR|POLLNVAL)) {
            message("Poll says client hung up\n");
            break;
        }
        if(mem->pfd.revents & POLLIN) {
            message("Poll says data ready\n");
            if((status = read(fd, mem->buf, BUF_LEN)) <= 0) {
                message("but read returns %d\n", status);
                break;
            }
        }
    }
message("Disconnected\n");

Here is what the code reports:

Script started on Thu 11 May 2006 10:18:34 AM EDT
[root@chaos servers]# ./control
Calling poll
Poll returns okay with 00000001
Poll says data ready
but read returns 0
Disconnected
Aborted via ^C
[root@chaos servers]# exit
Script done on Thu 11 May 2006 10:19:15 AM EDT

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
New book: http://www.lymanschool.com
_


****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.


2006-05-11 20:47:34

by Nishanth Aravamudan

Subject: Re: Linux poll() <sigh> again

On 11.05.2006 [10:25:29 -0400], linux-os (Dick Johnson) wrote:
>
>
> Hello,
>
> I'm trying to fix a long-standing bug which has a
> work-around that has been working for a year or
> so.

<snip valiant efforts>

> Here is the relevant code:
>
> for(;;) {
> mem->pfd.fd = fd;
> mem->pfd.events = POLLIN|POLLERR|POLLHUP|POLLNVAL;
> mem->pfd.revents = 0x00;

Hrm, in looking at the craziness that is sys_poll() for a bit, I think
it's the underlying f_ops that are responsible for not setting POLLHUP,
that is:

    if (file != NULL) {
        mask = DEFAULT_POLLMASK;
        if (file->f_op && file->f_op->poll)
            mask = file->f_op->poll(file, *pwait);
        mask &= fdp->events | POLLERR | POLLHUP;
        fput_light(file, fput_needed);
    }

and file->f_op->poll(file, *pwait) is not setting POLLHUP on the
disconnect. What filesystem is this?

On an independent note, it seems like the relatively recent cleanups to
sys_poll() made the negative case a bit inefficient (and reliant on
msecs_to_jiffies() dealing with negative values, which I don't think it
was really ever designed to (it's mostly used for converting time
values, which can never go negative)). Maybe the following would make
sense? Peter, I know you had been looking at poll() issues earlier, does
this change make sense?

Description: Rather than make msecs_to_jiffies() deal with negative
values, just send them on to do_sys_poll(), which (eventually in
do_poll()) explicitly checks for them.

Signed-off-by: Nishanth Aravamudan <[email protected]>

diff -urpN 2.6.17-rc3-git18/fs/select.c 2.6.17-rc3-git18-dev/fs/select.c
--- 2.6.17-rc3-git18/fs/select.c 2006-05-11 12:17:15.000000000 -0700
+++ 2.6.17-rc3-git18-dev/fs/select.c 2006-05-11 12:38:16.000000000 -0700
@@ -727,9 +727,9 @@ out_fds:
 asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds,
                         long timeout_msecs)
 {
-        s64 timeout_jiffies = 0;
+        s64 timeout_jiffies;
 
-        if (timeout_msecs) {
+        if (timeout_msecs > 0) {
 #if HZ > 1000
                 /* We can only overflow if HZ > 1000 */
                 if (timeout_msecs / 1000 > (s64)0x7fffffffffffffffULL / (s64)HZ)
@@ -737,6 +737,8 @@ asmlinkage long sys_poll(struct pollfd _
                 else
 #endif
                         timeout_jiffies = msecs_to_jiffies(timeout_msecs);
+        } else {
+                timeout_jiffies = timeout_msecs;
         }
 
         return do_sys_poll(ufds, nfds, &timeout_jiffies);
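
The intent of the change can be modelled in user space. This is only a sketch under stated assumptions: ms_to_ticks() and poll_timeout_ticks() are hypothetical names, the rounding rule is chosen for illustration and is not the kernel's msecs_to_jiffies(), and the HZ > 1000 overflow check from the patch is left out.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a milliseconds-to-ticks conversion.
 * NOT the kernel's msecs_to_jiffies(); it rounds up so that a short
 * positive timeout never collapses to zero ticks.
 */
static int64_t ms_to_ticks(int64_t msecs, int64_t hz)
{
    return (msecs * hz + 999) / 1000;
}

/*
 * Convert only positive timeouts. Zero ("do not wait") and negative
 * ("wait forever") values are passed through untouched, so the
 * conversion helper never sees a negative argument.
 */
int64_t poll_timeout_ticks(long timeout_msecs, int64_t hz)
{
    if (timeout_msecs > 0)
        return ms_to_ticks(timeout_msecs, hz);
    return timeout_msecs;
}
```

With hz = 250, a timeout of 10 ms becomes 3 ticks, while -1 and 0 come back unchanged for the caller to interpret.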

--
Nishanth Aravamudan <[email protected]>
IBM Linux Technology Center

2006-05-11 21:04:49

by linux-os (Dick Johnson)

Subject: Re: Linux poll() <sigh> again


On Thu, 11 May 2006, Nishanth Aravamudan wrote:

> On 11.05.2006 [10:25:29 -0400], linux-os (Dick Johnson) wrote:
>>
>>
>> Hello,
>>
>> I'm trying to fix a long-standing bug which has a
>> work-around that has been working for a year or
>> so.
>
> <snip valiant efforts>
>
>> Here is the relevant code:
>>
>> for(;;) {
>> mem->pfd.fd = fd;
>> mem->pfd.events = POLLIN|POLLERR|POLLHUP|POLLNVAL;
>> mem->pfd.revents = 0x00;
>
> Hrm, in looking at the craziness that is sys_poll() for a bit, I think
> it's the underlying f_ops that are responsible for not setting POLLHUP,
> that is:
>
> if (file != NULL) {
> mask = DEFAULT_POLLMASK;
> if (file->f_op && file->f_op->poll)
> mask = file->f_op->poll(file, *pwait);
> mask &= fdp->events | POLLERR | POLLHUP;
> fput_light(file, fput_needed);
> }
>
> and file->f_op->poll(file, *pwait) is not setting POLLHUP on the
> disconnect. What filesystem is this?

I think that's the problem. A socket isn't a file-system and the
code won't set either bit if it isn't. Perhaps the kernel code
needs to consider a socket as a virtual file of some kind? Surely
one needs to use poll() on sockets, no?

>
> On an independent note, it seems like the relatively recent cleanups to
> sys_poll() made the negative case a bit inefficient (and reliant on
> msecs_to_jiffies() dealing with negative values, which I don't think it
> was really ever designed to (it's mostly used for converting time
> values, which can never go negative)). Maybe the following would make
> sense? Peter, I know you had been looking at poll() issues earlier, does
> this change make sense?
>
> Description: Rather than make msecs_to_jiffies() deal with negative
> values, just send them on to do_sys_poll(), which (eventually in
> do_poll()) explicitly checks for them.
>
> Signed-off-by: Nishanth Aravamudan <[email protected]>
>
> diff -urpN 2.6.17-rc3-git18/fs/select.c 2.6.17-rc3-git18-dev/fs/select.c
> --- 2.6.17-rc3-git18/fs/select.c 2006-05-11 12:17:15.000000000 -0700
> +++ 2.6.17-rc3-git18-dev/fs/select.c 2006-05-11 12:38:16.000000000 -0700
> @@ -727,9 +727,9 @@ out_fds:
> asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds,
> long timeout_msecs)
> {
> - s64 timeout_jiffies = 0;
> + s64 timeout_jiffies;
>
> - if (timeout_msecs) {
> + if (timeout_msecs > 0) {
> #if HZ > 1000
> /* We can only overflow if HZ > 1000 */
> if (timeout_msecs / 1000 > (s64)0x7fffffffffffffffULL / (s64)HZ)
> @@ -737,6 +737,8 @@ asmlinkage long sys_poll(struct pollfd _
> else
> #endif
> timeout_jiffies = msecs_to_jiffies(timeout_msecs);
> + } else {
> + timeout_jiffies = timeout_msecs;
> }
>
> return do_sys_poll(ufds, nfds, &timeout_jiffies);
>
> --
> Nishanth Aravamudan <[email protected]>
> IBM Linux Technology Center

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
New book: http://www.lymanschool.com
_



2006-05-11 21:16:03

by Nishanth Aravamudan

Subject: Re: Linux poll() <sigh> again

On 11.05.2006 [17:04:46 -0400], linux-os (Dick Johnson) wrote:
>
> On Thu, 11 May 2006, Nishanth Aravamudan wrote:
>
> > On 11.05.2006 [10:25:29 -0400], linux-os (Dick Johnson) wrote:
> >>
> >>
> >> Hello,
> >>
> >> I'm trying to fix a long-standing bug which has a
> >> work-around that has been working for a year or
> >> so.
> >
> > <snip valiant efforts>
> >
> >> Here is the relevant code:
> >>
> >> for(;;) {
> >> mem->pfd.fd = fd;
> >> mem->pfd.events = POLLIN|POLLERR|POLLHUP|POLLNVAL;
> >> mem->pfd.revents = 0x00;
> >
> > Hrm, in looking at the craziness that is sys_poll() for a bit, I think
> > it's the underlying f_ops that are responsible for not setting POLLHUP,
> > that is:
> >
> > if (file != NULL) {
> > mask = DEFAULT_POLLMASK;
> > if (file->f_op && file->f_op->poll)
> > mask = file->f_op->poll(file, *pwait);
> > mask &= fdp->events | POLLERR | POLLHUP;
> > fput_light(file, fput_needed);
> > }
> >
> > and file->f_op->poll(file, *pwait) is not setting POLLHUP on the
> > disconnect. What filesystem is this?
>
> I think that's the problem. A socket isn't a file-system and the
> code won't set either bits if it isn't. Perhaps, the kernel code
> needs to consider a socket as a virtual file of some kind? Surely
> one needs to use poll() on sockets, no?

Duh, I'm not reading well today -- for sockets, we do

file->f_op->poll() -> (socket_file_ops) sock_poll() -> sock->ops->poll()

So, now I need to know what kind of socket is this to go from there ...

Thanks,
Nish

--
Nishanth Aravamudan <[email protected]>
IBM Linux Technology Center

2006-05-12 00:09:04

by Robert Hancock

Subject: Re: Linux poll() <sigh> again

linux-os (Dick Johnson) wrote:
> The bug relates to Linux implementation of poll()
> on a connected socket. If poll() is set to detect
> changes on a connected socket, with an infinite
> timeout (-1), and the client disconnects, it returns
> with a positive value (correct). The returned
> events (revents member), shows only POLLIN bit
> set. This, according to all known documentation
> including man pages on the web, is supposed to
> mean that there are data to be read. In fact,
> there are no data and a read will return 0.

According to the Single UNIX Specification:

http://www.opengroup.org/onlinepubs/007908799/xsh/poll.html

POLLIN means "Data other than high-priority data may be read without
blocking. For STREAMS, this flag is set in revents even if the message
is of zero length." The way I read it, all this is telling you is that a
read on that file descriptor will not block at that particular moment.
It doesn't mean there is actually any data to be read. On a device like
a socket, read returning 0 tells you that the connection's been closed.

POLLHUP means "The device has been disconnected." This would obviously
be appropriate for a device such as a serial line or TTY, etc. but for a
socket it is less obvious that this return value is appropriate.

>
> I have used the subsequent read() with a returned
> value of zero, to indicate that the client disconnected
> (as a work around). However, on recent versions of
> Linux, this is not reliable and the read() may
> wait forever instead of immediately returning.

If you want nonblocking behavior, you should set the socket to
nonblocking. This is a bit strange though, unless the data was stolen by
another thread or something. Are you sure you've seen this?
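
A minimal sketch of that advice, assuming a descriptor that poll() has just reported as readable: put it in non-blocking mode and classify the read() result, so that a stale readiness report turns into "would block" instead of a hang. The names set_nonblocking(), read_after_poll() and enum read_result are made up for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

enum read_result { READ_DATA, READ_EOF, READ_WOULD_BLOCK, READ_ERROR };

/* Put fd in non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Read once from a non-blocking fd and say what happened.
 * *nread receives the raw read() return value.
 */
enum read_result read_after_poll(int fd, void *buf, size_t len, ssize_t *nread)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);          /* retry if a signal interrupts us */
    } while (n == -1 && errno == EINTR);

    *nread = n;
    if (n > 0)
        return READ_DATA;                /* payload bytes arrived */
    if (n == 0)
        return READ_EOF;                 /* peer closed its end */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WOULD_BLOCK;         /* readiness was stale; poll again */
    return READ_ERROR;
}
```

A poll loop would break on READ_EOF or READ_ERROR and simply call poll() again on READ_WOULD_BLOCK.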

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-05-12 05:27:42

by David Schwartz

Subject: RE: Linux poll() <sigh> again


> I have used the subsequent read() with a returned
> value of zero, to indicate that the client disconnected
> (as a work around). However, on recent versions of
> Linux, this is not reliable and the read() may
> wait forever instead of immediately returning.

If a 'read' on a non-blocking socket is waiting, something is seriously
wrong that goes way beyond 'poll'. If you're using a blocking socket, well,
blocking sockets block, 'poll' notwithstanding.

DS


2006-05-12 10:37:31

by Jan Engelhardt

Subject: Re: Linux poll() <sigh> again

>
>I think that's the problem. A socket isn't a file-system and the
>code won't set either bits if it isn't. Perhaps, the kernel code
>needs to consider a socket as a virtual file of some kind?

I think they are a virtual file of some kind:

[root@mason ~]# ls -l /proc/4361/fd
lrwx------ 1 root root 64 May 12 12:35 3 -> socket:[41460]
lr-x------ 1 root root 64 May 12 12:35 4 -> pipe:[41496]
l-wx------ 1 root root 64 May 12 12:35 5 -> pipe:[41496]

These "files" (socket:[] and pipe:[]) have a dentry in sockfs and
pipefs (you can't mount the fss, but they are there, see /proc/filesystems)
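
The same check can be done from inside a program. A rough, Linux-only sketch, assuming /proc is mounted; fd_link_target() and fd_target_has_prefix() are hypothetical helper names:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy the target of /proc/self/fd/<fd> into out (NUL-terminated).
 * Returns the length of the target, or -1 on error.
 */
ssize_t fd_link_target(int fd, char *out, size_t outlen)
{
    char path[64];
    ssize_t n;

    if (outlen == 0)
        return -1;
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    n = readlink(path, out, outlen - 1);
    if (n >= 0)
        out[n] = '\0';                   /* readlink() does not terminate */
    return n;
}

/* Returns 1 if the link target starts with prefix, 0 if not, -1 on error. */
int fd_target_has_prefix(int fd, const char *prefix)
{
    char target[128];

    if (fd_link_target(fd, target, sizeof target) < 0)
        return -1;
    return strncmp(target, prefix, strlen(prefix)) == 0;
}
```

Going by the listing above, fd_target_has_prefix(fd, "socket:[") would be expected to return 1 for a socket descriptor and fd_target_has_prefix(fd, "pipe:[") to return 1 for either end of a pipe.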


Jan Engelhardt
--

2006-05-12 11:42:14

by linux-os (Dick Johnson)

Subject: Re: Linux poll() <sigh> again


On Thu, 11 May 2006, Nishanth Aravamudan wrote:

> On 11.05.2006 [17:04:46 -0400], linux-os (Dick Johnson) wrote:
>>
>> On Thu, 11 May 2006, Nishanth Aravamudan wrote:
>>
>>> On 11.05.2006 [10:25:29 -0400], linux-os (Dick Johnson) wrote:
>>>>
>>>>
>>>> Hello,
>>>>
>>>> I'm trying to fix a long-standing bug which has a
>>>> work-around that has been working for a year or
>>>> so.
>>>
>>> <snip valiant efforts>
>>>
>>>> Here is the relevant code:
>>>>
>>>> for(;;) {
>>>> mem->pfd.fd = fd;
>>>> mem->pfd.events = POLLIN|POLLERR|POLLHUP|POLLNVAL;
>>>> mem->pfd.revents = 0x00;
>>>
>>> Hrm, in looking at the craziness that is sys_poll() for a bit, I think
>>> it's the underlying f_ops that are responsible for not setting POLLHUP,
>>> that is:
>>>
>>> if (file != NULL) {
>>> mask = DEFAULT_POLLMASK;
>>> if (file->f_op && file->f_op->poll)
>>> mask = file->f_op->poll(file, *pwait);
>>> mask &= fdp->events | POLLERR | POLLHUP;
>>> fput_light(file, fput_needed);
>>> }
>>>
>>> and file->f_op->poll(file, *pwait) is not setting POLLHUP on the
>>> disconnect. What filesystem is this?
>>
>> I think that's the problem. A socket isn't a file-system and the
>> code won't set either bits if it isn't. Perhaps, the kernel code
>> needs to consider a socket as a virtual file of some kind? Surely
>> one needs to use poll() on sockets, no?
>
> Duh, I'm not reading well today -- for sockets, we do
>
> file->f_op->poll() -> (socket_file_ops) sock_poll() -> sock->ops->poll()
>
> So, now I need to know what kind of socket is this to go from there ..
>
> Thanks,
> Nish

A stream socket can be "connected". Anything that can be connected
needs to know when the connection is broken.

socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
ip_sock.sin_family = AF_INET;

Such a socket is bound to an address and port using bind(), listen()
is called on it, and then accept() is called to accept connections. accept()
returns a socket (fd) for the connected host. It's this fd that needs
to "know" if/when the host disconnects.
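
The sequence above can be reproduced in one process over the loopback interface. This is a sketch rather than the original server: observe_peer_close() is a hypothetical name, the port is chosen by the kernel, and the 5-second poll() timeout is an arbitrary safety margin. It reports whatever poll() and read() return on the accepted fd after the client closes, without assuming which bits are set.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * bind/listen/accept on 127.0.0.1, close the client, then record the
 * revents from poll() and the result of read() on the accepted fd.
 * Returns 0 on success, -1 if any step fails.
 */
int observe_peer_close(short *revents, ssize_t *nread)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    struct pollfd pfd;
    char buf[16];
    int lfd = -1, cfd = -1, afd = -1, rc = -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                   /* let the kernel pick a free port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto out;

    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    close(cfd);                          /* the client disconnects */
    cfd = -1;

    pfd.fd = afd;
    pfd.events = POLLIN;                 /* POLLHUP and POLLERR need no request */
    pfd.revents = 0;
    if (poll(&pfd, 1, 5000) != 1)
        goto out;

    *revents = pfd.revents;
    *nread = (pfd.revents & POLLIN) ? read(afd, buf, sizeof buf) : -1;
    rc = 0;
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return rc;
}
```

Printing *revents in hex next to *nread gives the same kind of evidence as the "Poll returns okay with 00000001" / "but read returns 0" lines in the original report.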


Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
New book: http://www.lymanschool.com
_



2006-05-12 11:53:45

by linux-os (Dick Johnson)

Subject: Re: Linux poll() <sigh> again


On Thu, 11 May 2006, Robert Hancock wrote:

> linux-os (Dick Johnson) wrote:
>> The bug relates to Linux implementation of poll()
>> on a connected socket. If poll() is set to detect
>> changes on a connected socket, with an infinite
>> timeout (-1), and the client disconnects, it returns
>> with a positive value (correct). The returned
>> events (revents member), shows only POLLIN bit
>> set. This, according to all known documentation
>> including man pages on the web, is supposed to
>> mean that there are data to be read. In fact,
>> there are no data and a read will return 0.
>
> According to the Single UNIX Specification:
>
> http://www.opengroup.org/onlinepubs/007908799/xsh/poll.html
>
> POLLIN means "Data other than high-priority data may be read without
> blocking. For STREAMS, this flag is set in revents even if the message
> is of zero length." The way I read it, all this is telling you is that a
> read on that file descriptor will not block at that particular moment.
> It doesn't mean there is actually any data to be read. On a device like
> a socket, read returning 0 tells you that the connection's been closed.
>
> POLLHUP means "The device has been disconnected." This would obviously
> be appropriate for a device such as a serial line or TTY, etc. but for a
> socket it is less obvious that this return value is appropriate.
>

Hardly "less obvious". SunOS has returned POLLHUP, as have other
Unixes like Interactive, from which the software was ported. It
went from Interactive, to SunOS, to Linux. Linux was the first
OS that required the hack. This was reported several years ago
and I was simply excoriated for having the audacity to report
such a thing. So, I just implemented a hack. Now the hack is
biting me. It's about time for poll() to return the correct
stuff.

>>
>> I have used the subsequent read() with a returned
>> value of zero, to indicate that the client disconnected
>> (as a work around). However, on recent versions of
>> Linux, this is not reliable and the read() may
>> wait forever instead of immediately returning.
>
> If you want nonblocking behavior, you should set the socket to
> nonblocking. This is a bit strange though, unless the data was stolen by
> another thread or something. Are you sure you've seen this?
>

I don't use threads. The hang under the specified conditions was first
observed on 2.6.16.4 (which I'm running on this system). The hack previously
used, i.e., treating a read of zero as a disconnect, has worked since 2.4.x,
but it's a hack and shouldn't be required. It was never required on SunOS, from
which the software was ported.

> --
> Robert Hancock Saskatoon, SK, Canada
> To email, remove "nospam" from [email protected]
> Home Page: http://www.roberthancock.com/
>
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
New book: http://www.lymanschool.com
_



2006-05-12 14:33:35

by Robert Hancock

Subject: Re: Linux poll() <sigh> again

linux-os (Dick Johnson) wrote:
>> POLLHUP means "The device has been disconnected." This would obviously
>> be appropriate for a device such as a serial line or TTY, etc. but for a
>> socket it is less obvious that this return value is appropriate.
>>
>
> Hardly "less obvious". SunOs has returned POLLHUP as has other
> Unixes like Interactive, from which the software was ported. It
> went from Interactive, to SunOs, to Linux. Linux was the first
> OS that required the hack. This was reported several years ago
> and I was simply excoriated for having the audacity to report
> such a thing. So, I just implemented a hack. Now the hack is
> biting me. It's about time for poll() to return the correct
> stuff.

The standard doesn't require that a close on a socket should report
POLLHUP. Thus this behavior may differ between UNIX implementations. If
your software is requiring a POLLHUP to indicate the socket is closed I
think it is being unnecessarily picky since read returning 0 universally
indicates that the connection has been closed. Such are the compromises
that are sometimes required to write portable software.

>
>>> I have used the subsequent read() with a returned
>>> value of zero, to indicate that the client disconnected
>>> (as a work around). However, on recent versions of
>>> Linux, this is not reliable and the read() may
>>> wait forever instead of immediately returning.
>> If you want nonblocking behavior, you should set the socket to
>> nonblocking. This is a bit strange though, unless the data was stolen by
>> another thread or something. Are you sure you've seen this?
>
> I don't use threads. The hang under the specified conditions was first
> observed on 2.6.16.4 (that I'm running on this system). The hack, previously
> used, i.e., the read of zero was used since 2.4.x with success except it's
> a hack and shouldn't be required. It was not ever required on SunOs from
> which the software was ported.

This may be a bug somewhere.. however, once again if you don't want read
to block under any circumstances, set your sockets to non-blocking!

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-05-12 14:46:09

by jimmy

Subject: Re: Linux poll() <sigh> again

Robert Hancock wrote:
> linux-os (Dick Johnson) wrote:
>>> POLLHUP means "The device has been disconnected." This would obviously
>>> be appropriate for a device such as a serial line or TTY, etc. but for a
>>> socket it is less obvious that this return value is appropriate.
>>>
>>
>> Hardly "less obvious". SunOs has returned POLLHUP as has other
>> Unixes like Interactive, from which the software was ported. It
>> went from Interactive, to SunOs, to Linux. Linux was the first
>> OS that required the hack. This was reported several years ago
>> and I was simply excoriated for having the audacity to report
>> such a thing. So, I just implemented a hack. Now the hack is
>> biting me. It's about time for poll() to return the correct
>> stuff.
>
> The standard doesn't require that a close on a socket should report
> POLLHUP. Thus this behavior may differ between UNIX implementations. If
> your software is requiring a POLLHUP to indicate the socket is closed I
> think it is being unnecessarily picky since read returning 0 universally
> indicates that the connection has been closed. Such are the compromises
> that are sometimes required to write portable software.
>
>>
>>>> I have used the subsequent read() with a returned
>>>> value of zero, to indicate that the client disconnected
>>>> (as a work around). However, on recent versions of
>>>> Linux, this is not reliable and the read() may
>>>> wait forever instead of immediately returning.
>>> If you want nonblocking behavior, you should set the socket to
>>> nonblocking. This is a bit strange though, unless the data was stolen by
>>> another thread or something. Are you sure you've seen this?
>>
>> I don't use threads. The hang under the specified conditions was first
>> observed on 2.6.16.4 (that I'm running on this system). The hack,
>> previously
>> used, i.e., the read of zero was used since 2.4.x with success except
>> it's
>> a hack and shouldn't be required. It was not ever required on SunOs from
>> which the software was ported.
>
> This may be a bug somewhere.. however, once again if you don't want read
> to block under any circumstances, set your sockets to non-blocking!
>
But that's another hack. AFAICS, the main reason people use select/poll
is to know whether their send/recv/read/write would go through rather than
getting blocked!


-jb
--
Only two things are infinite, the universe and human stupidity, and I'm
not sure about the former. - Albert Einstein

2006-05-12 14:57:18

by linux-os (Dick Johnson)

Subject: Re: Linux poll() <sigh> again


On Fri, 12 May 2006, jimmy wrote:

> Robert Hancock wrote:
>> linux-os (Dick Johnson) wrote:
>>>> POLLHUP means "The device has been disconnected." This would obviously
>>>> be appropriate for a device such as a serial line or TTY, etc. but for a
>>>> socket it is less obvious that this return value is appropriate.
>>>>
>>>
>>> Hardly "less obvious". SunOs has returned POLLHUP as has other
>>> Unixes like Interactive, from which the software was ported. It
>>> went from Interactive, to SunOs, to Linux. Linux was the first
>>> OS that required the hack. This was reported several years ago
>>> and I was simply excoriated for having the audacity to report
>>> such a thing. So, I just implemented a hack. Now the hack is
>>> biting me. It's about time for poll() to return the correct
>>> stuff.
>>
>> The standard doesn't require that a close on a socket should report
>> POLLHUP. Thus this behavior may differ between UNIX implementations. If
>> your software is requiring a POLLHUP to indicate the socket is closed I
>> think it is being unnecessarily picky since read returning 0 universally
>> indicates that the connection has been closed. Such are the compromises
>> that are sometimes required to write portable software.

This is from the Linux man-page shipped with recent distributions


SOCKET(7)                Linux Programmer's Manual                SOCKET(7)



+--------------------------------------------------------------------+
| I/O events |
+-----------+-----------+--------------------------------------------+
|Event | Poll flag | Occurrence |
+-----------+-----------+--------------------------------------------+
|Read | POLLIN | New data arrived. |
+-----------+-----------+--------------------------------------------+
|Read | POLLIN | A connection setup has been completed (for |
| | | connection-oriented sockets) |
+-----------+-----------+--------------------------------------------+
|Read | POLLHUP | A disconnection request has been initiated |
| | | by the other end. |
+-----------+-----------+--------------------------------------------+
|Read | POLLHUP | A connection is broken (only for connec- |
| | | tion-oriented protocols). When the socket |
| | | is written SIGPIPE is also sent. |
+-----------+-----------+--------------------------------------------+
|Write | POLLOUT | Socket has enough send buffer space for |
| | | writing new data. |
+-----------+-----------+--------------------------------------------+
|Read/Write | POLLIN| | An outgoing connect(2) finished. |
| | POLLOUT | |
+-----------+-----------+--------------------------------------------+
|Read/Write | POLLERR | An asynchronous error occurred. |
+-----------+-----------+--------------------------------------------+
|Read/Write | POLLHUP | The other end has shut down one direction. |
+-----------+-----------+--------------------------------------------+
|Exception | POLLPRI | Urgent data arrived. SIGURG is sent then. |
+-----------+-----------+--------------------------------------------+


If Linux doesn't support POLLHUP, then it shouldn't be documented.
I got the same kind of crap^M^M^M^Mresponse the last time I reported
this __very__ __obvious__ defect! The information is available
in the kernel. It should certainly report it, just like other
operating systems do, including <shudder> wsock32.

>>
>>>
>>>>> I have used the subsequent read() with a returned
>>>>> value of zero, to indicate that the client disconnected
>>>>> (as a work around). However, on recent versions of
>>>>> Linux, this is not reliable and the read() may
>>>>> wait forever instead of immediately returning.
>>>> If you want nonblocking behavior, you should set the socket to
>>>> nonblocking. This is a bit strange though, unless the data was stolen by
>>>> another thread or something. Are you sure you've seen this?
>>>
>>> I don't use threads. The hang under the specified conditions was first
>>> observed on 2.6.16.4 (that I'm running on this system). The hack,
>>> previously
>>> used, i.e., the read of zero was used since 2.4.x with success except
>>> it's
>>> a hack and shouldn't be required. It was not ever required on SunOs from
>>> which the software was ported.
>>
>> This may be a bug somewhere.. however, once again if you don't want read
>> to block under any circumstances, set your sockets to non-blocking!
>>
> But that's another hack. AFAICS why ppl (mostly) use select/poll wud be
> to know if their send/recv/read/write would go thru rather than getting
> blocked!
>

Yes. You need to know if something has changed. This could mean
many things such as new data available or a disconnection. This
is a communications link for crysake, one needs to handle
communications events.

>
> -jb
> --
> Only two things are infinite, the universe and human stupidity, and I'm
> not sure about the former. - Albert Einstein
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
New book: http://www.lymanschool.com
_



2006-05-12 15:06:35

by Eric Dumazet

Subject: Re: Linux poll() <sigh> again

linux-os (Dick Johnson) wrote:
> On Fri, 12 May 2006, jimmy wrote:
>
>
>> Robert Hancock wrote:
>>
>>> linux-os (Dick Johnson) wrote:
>>>
>>>>> POLLHUP means "The device has been disconnected." This would obviously
>>>>> be appropriate for a device such as a serial line or TTY, etc. but for a
>>>>> socket it is less obvious that this return value is appropriate.
>>>>>
>>>>>
>>>> Hardly "less obvious". SunOs has returned POLLHUP as has other
>>>> Unixes like Interactive, from which the software was ported. It
>>>> went from Interactive, to SunOs, to Linux. Linux was the first
>>>> OS that required the hack. This was reported several years ago
>>>> and I was simply excoriated for having the audacity to report
>>>> such a thing. So, I just implemented a hack. Now the hack is
>>>> biting me. It's about time for poll() to return the correct
>>>> stuff.
>>>>
>>> The standard doesn't require that a close on a socket should report
>>> POLLHUP. Thus this behavior may differ between UNIX implementations. If
>>> your software is requiring a POLLHUP to indicate the socket is closed I
>>> think it is being unnecessarily picky since read returning 0 universally
>>> indicates that the connection has been closed. Such are the compromises
>>> that are sometimes required to write portable software.
>>>
>
> This is from the Linux man-page shipped with recent distributions
>
>
> SOCKET(7)          Linux Programmer's Manual          SOCKET(7)
>
>
>
> +--------------------------------------------------------------------+
> | I/O events |
> +-----------+-----------+--------------------------------------------+
> |Event | Poll flag | Occurrence |
> +-----------+-----------+--------------------------------------------+
> |Read | POLLIN | New data arrived. |
> +-----------+-----------+--------------------------------------------+
> |Read | POLLIN | A connection setup has been completed (for |
> | | | connection-oriented sockets) |
> +-----------+-----------+--------------------------------------------+
> |Read | POLLHUP | A disconnection request has been initiated |
> | | | by the other end. |
> +-----------+-----------+--------------------------------------------+
> |Read | POLLHUP | A connection is broken (only for connec- |
> | | | tion-oriented protocols). When the socket |
> | | | is written SIGPIPE is also sent. |
> +-----------+-----------+--------------------------------------------+
> |Write | POLLOUT | Socket has enough send buffer space for |
> | | | writing new data. |
> +-----------+-----------+--------------------------------------------+
> |Read/Write | POLLIN| | An outgoing connect(2) finished. |
> | | POLLOUT | |
> +-----------+-----------+--------------------------------------------+
> |Read/Write | POLLERR | An asynchronous error occurred. |
> +-----------+-----------+--------------------------------------------+
> |Read/Write | POLLHUP | The other end has shut down one direction. |
> +-----------+-----------+--------------------------------------------+
> |Exception | POLLPRI | Urgent data arrived. SIGURG is sent then. |
> +-----------+-----------+--------------------------------------------+
>
>
> If linux doesn't support POLLHUP, then it shouldn't be documented.
> I got the same kind of crap^M^M^M^Mresponse the last time I reported
> this __very__ __obvious__ defect! The information is available
> in the kernel. It should certainly report it, just like other
> operating systems do, including <shudder> wsock32.
>
Hi Dick

On socket disconnection, POLLIN set in poll()->revents followed by recv()
returning 0 is the only portable and reliable indication.

This is well explained in Stevens' book (the absolute reference, imho). It
was written well before Linus wrote a single line of C code.

If you don't have this book (that would be a shame!)

Please refer to
http://www.opengroup.org/onlinepubs/000095399/functions/poll.html

POLLHUP

The device has been disconnected. This event and POLLOUT are
mutually-exclusive; a stream can never be writable if a hangup has
occurred. However, this event and POLLIN, POLLRDNORM, POLLRDBAND, or
POLLPRI are not mutually-exclusive. This flag is only valid in the
/revents/ bitmask; it shall be ignored in the /events/ member.

So you should not set POLLHUP in the mem->pfd.events member: POLLHUP is
non-maskable, so it is reported in revents whether you ask for it or not.

Also you might find this interesting :
http://www.greenend.org.uk/rjk/2001/06/poll.html

Moreover, this comment found in the Linux kernel (file net/ipv4/tcp.c)
is quite good:

/*
* POLLHUP is certainly not done right. But poll() doesn't
* have a notion of HUP in just one direction, and for a
* socket the read side is more interesting.
*
* Some poll() documentation says that POLLHUP is incompatible
* with the POLLOUT/POLLWR flags, so somebody should check this
* all. But careful, it tends to be safer to return too many
* bits than too few, and you can easily break real applications
* if you don't tell them that something has hung up!
*
* Check-me.
*
* Check number 1. POLLHUP is _UNMASKABLE_ event (see UNIX98 and
* our fs/select.c). It means that after we received EOF,
* poll always returns immediately, making impossible poll() on write()
* in state CLOSE_WAIT. One solution is evident --- to set POLLHUP
* if and only if shutdown has been made in both directions.
* Actually, it is interesting to look how Solaris and DUX
* solve this dilemma. I would prefer, if PULLHUP were maskable,
* then we could set it on SND_SHUTDOWN. BTW examples given
* in Stevens' books assume exactly this behaviour, it explains
* why PULLHUP is incompatible with POLLOUT. --ANK
*
* NOTE. Check for TCP_CLOSE is added. The goal is to prevent
* blocking on fresh not-connected or disconnected socket. --ANK
*/
if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE)
	mask |= POLLHUP;


So basically POLLHUP could be set in revents if POLLOUT was not requested
in events, but it would be of little interest...
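
The "both directions" rule can be observed from user space. The sketch below
is an illustration under assumptions, not a statement about TCP itself: it
uses an AF_UNIX socketpair() to stay self-contained, and it assumes the Linux
AF_UNIX code applies the same both-directions test as the TCP snippet above.
The helper names are made up.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: current revents for fd without blocking, -1 on failure. */
int current_revents(int fd)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return pfd.revents;
}

/*
 * Returns 1 if POLLHUP appears only once both directions are shut down,
 * 0 if the behaviour differs, -1 if the setup failed.
 */
int hup_needs_both_directions(void)
{
    int sv[2], half, full;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    shutdown(sv[1], SHUT_WR);        /* peer stops sending: sv[0] sees EOF */
    half = current_revents(sv[0]);

    shutdown(sv[0], SHUT_WR);        /* now our own write side is shut too */
    full = current_revents(sv[0]);

    close(sv[0]);
    close(sv[1]);

    if (half < 0 || full < 0)
        return -1;
    /* half-closed: readable (EOF) but no HUP; fully shut: HUP is set */
    return (half & POLLIN) && !(half & POLLHUP) && (full & POLLHUP) ? 1 : 0;
}
```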

Eric



2006-05-12 15:12:14

by Davide Libenzi

Subject: Re: Linux poll() <sigh> again

On Fri, 12 May 2006, linux-os (Dick Johnson) wrote:

> If linux doesn't support POLLHUP, then it shouldn't be documented.
> I got the same kind of crap^M^M^M^Mresponse the last time I reported
> this __very__ __obvious__ defect! The information is available
> in the kernel. It should certainly report it, just like other
> operating systems do, including <shudder> wsock32.

Try to search the list (and the source) for POLLRDHUP ...


- Davide


2006-05-12 18:50:04

by David Schwartz

Subject: RE: Linux poll() <sigh> again


> > This may be a bug somewhere.. however, once again if you don't want
> > read to block under any circumstances, set your sockets to non-blocking!

> But that's another hack. AFAICS why ppl (mostly) use select/poll wud be
> to know if their send/recv/read/write would go thru rather than getting
> blocked!

It's not another hack. If you don't want to block, you must tell the kernel
that. As for select/poll telling you that a write won't block, it has never
done that. The select/poll functions, just like almost every other system
call, do *not* provide future guarantees.

DS