I've recently run across a problem where a server (in this case, sshd)
can deadlock itself if a SIGCHLD arrives just before it calls select(),
but after it has checked whether its child_terminated. So when the
select() is called, the SIGCHLD signal handler has already run and set
the child_terminated flag, and there's nothing to wake the select().
The only real user-space solution to this is to have the SIGCHLD handler
somehow cause the select() to return immediately by e.g. writing a byte
to a looped pipe which is included in the select() readfd set, but this
seems a little contrived. This patch simply adds a proc_base_poll()
method to make /proc/<pid> pollable, with the following semantics:
- /proc/<your_pid>/ returns POLLHUP if you have any unreaped zombie
children.
- /proc/<your_child_pid>/ returns POLLHUP if the specified child is a
zombie or has been reaped.
- Any other /proc/<pid>/ directory returns POLLNVAL, as we've no way to
do a proper poll() on it - only a parent's wait_chldexit wait queue is
awakened when a process exits, so other processes won't get any kind of
notification.
So by including /proc/self in the readset for select(), you can
guarantee that select()/poll() will return if you've just received (and
handled) a SIGCHLD, but not yet reaped the child.
Alternatively, you could block SIGCHLD and just use select() for both
your child notifications and your I/O notifications. In this case, it
becomes a very specialised case of the sigopen() that Dan Kegel
proposed, but is minimally intrusive (consisting of just a single method
added to proc_base_operations).
If do_notify_parent() were changed to wake up tsk->wait_chldexit as well
as its parent's wait_chldexit, then sensible semantics could be added
for polling on any /proc/<pid>/ dir - just do a pollwait() on the target
process' exit_childwait queue. Provided that the target task is properly
refcounted (which should occur naturally due to the existence of the
/proc/<pid>/ inode), this should be safe. Would anyone have objections
to (or enthusiasm for) such a patch?
Paul
--- linux.orig/fs/proc/base.c Fri Jul 20 12:39:56 2001
+++ linux/fs/proc/base.c Thu Oct 4 00:19:45 2001
@@ -23,6 +23,8 @@
#include <linux/init.h>
#include <linux/file.h>
#include <linux/string.h>
+#include <linux/wait.h>
+#include <linux/poll.h>
/*
* For hysterical raisins we keep the same inumbers as in the old procfs.
@@ -643,6 +645,44 @@
return 1;
}
+static unsigned int proc_base_poll(struct file *file, poll_table *wait)
+{
+ struct inode * inode = file->f_dentry->d_inode;
+ struct task_struct *task = inode->u.proc_i.task;
+
+ int mask = 0;
+
+ poll_wait(file, ¤t->wait_chldexit, wait);
+
+ if(task == current) {
+
+ /* Check for any zombie children */
+ struct task_struct *p;
+
+ read_lock(&tasklist_lock);
+ do {
+ for (p = task->p_cptr ; p ; p = p->p_osptr) {
+ if(p->state == TASK_ZOMBIE) {
+ mask = POLLERR;
+ break;
+ }
+ }
+ task = next_thread(task);
+
+ } while(task != current && !mask);
+ read_unlock(&tasklist_lock);
+
+ } else if (task->p_pptr == current) {
+ /* Check for specific zombie child */
+ if(task->state == TASK_ZOMBIE)
+ mask = POLLERR;
+ } else {
+ mask = POLLNVAL;
+ }
+
+ return mask;
+}
+
/* building an inode */
static int task_dumpable(struct task_struct *task)
@@ -914,6 +954,7 @@
static struct file_operations proc_base_operations = {
read: generic_read_dir,
readdir: proc_base_readdir,
+ poll : proc_base_poll,
};
static struct inode_operations proc_base_inode_operations = {
Hi Paul,
On Thu, 04 Oct 2001, Paul Menage wrote:
>
> I've recently run across a problem where a server (in this case,
> sshd) can deadlock itself if a SIGCHLD arrives just before it calls
> select(), but after it has checked whether its child_terminated. So
> when the select() is called, the SIGCHLD signal handler has already
> run and set the child_terminated flag, and there's nothing to wake
> the select().
>
> The only real user-space solution to this is to have the SIGCHLD
> handler somehow cause the select() to return immediately
... or implement pselect:
http://mesh.eecs.umich.edu/cgi-bin/man2html/usr/share/man/man2/select.2.gz
or use sigsetjmp/siglongjmp
Both would be portable and not special to child handling.
Greetings
Christoph
>> The only real user-space solution to this is to have the SIGCHLD
>> handler somehow cause the select() to return immediately
>
>... or implement pselect:
>http://mesh.eecs.umich.edu/cgi-bin/man2html/usr/share/man/man2/select.2.gz
Agreed, althought that's not a user-space solution. Is there any
fundamental reason why no-one's implemented pselect()/ppoll() for Linux
yet?
>
>or use sigsetjmp/siglongjmp
Yes, that would probably solve the situation in question, provided that
siglongjmp() is portably safe. (A comment on LKML in the past suggested
that it's not safe on cygwin, for example.)
>
>Both would be portable and not special to child handling.
One advantage of the pollable /proc/<pid>, (when combined with
do_notify_parent() waking tsk->exit_chldwait) is that any process can
check for the exit of any other process (not just direct children) in a
select()/poll() call.
Paul
Hi Paul,
On Thu, 04 Oct 2001, Paul Menage wrote:
>>> The only real user-space solution to this is to have the SIGCHLD
>>> handler somehow cause the select() to return immediately
>>
>>... or implement pselect:
>>http://mesh.eecs.umich.edu/cgi-bin/man2html/usr/share/man/man2/select.2.gz
>
> Agreed, althought that's not a user-space solution. Is there any
> fundamental reason why no-one's implemented pselect()/ppoll() for
> Linux yet?
Missing knowledge and/or demand? It should be pretty easy to
implement.
>>or use sigsetjmp/siglongjmp
>
> Yes, that would probably solve the situation in question, provided
> that siglongjmp() is portably safe. (A comment on LKML in the past
> suggested that it's not safe on cygwin, for example.)
It should be at least portable between different U*X versions. I never
used cygwin though.
Greetings
Christoph
Paul Menage <[email protected]> wrote:
>The only real user-space solution to this is to have the SIGCHLD handler
>somehow cause the select() to return immediately by e.g. writing a byte
>to a looped pipe which is included in the select() readfd set, but this
>seems a little contrived.
I don't think it's contrived --- writing not a byte, but the pid and
return status of the dead child to a pipe is an old but useful trick.
It gives a natural serialisation of child deaths, and also eliminates
the common race where a child dies before its parent has recorded its
pid in a data structure. See it as a safe way of converting an
asynchronous signal to a queued event.
Using pipes to wake up blocking select()s is a useful thing in general,
and often a lot cleaner than using signals when dealing with threads.
>I don't think it's contrived --- writing not a byte, but the pid and
>return status of the dead child to a pipe is an old but useful trick.
>It gives a natural serialisation of child deaths, and also eliminates
>the common race where a child dies before its parent has recorded its
>pid in a data structure. See it as a safe way of converting an
>asynchronous signal to a queued event.
Except that this enhancement is not completely safe, as if you get more
than 1024 children reaped (assuming you send two bytes of pid and two
bytes of status) between checks of the pipe, you'll lose notifications.
Admittedly this should be a problem in the sshd case, but it's not a
perfect solution in general.
At least if you're only using the pipe to stop select() from blocking,
you don't care about overflowing the pipe as there's no important
information in there anyway.
Paul
Paul Menage <[email protected]> wrote:
>Except that this enhancement is not completely safe, as if you get more
>than 1024 children reaped (assuming you send two bytes of pid and two
>bytes of status) between checks of the pipe, you'll lose notifications.
Obviously, but the cases where the number of children is bounded below
1024 are rather frequent
>At least if you're only using the pipe to stop select() from blocking,
>you don't care about overflowing the pipe as there's no important
>information in there anyway.
sure, but then you have to put the pid/exit status somewhere else and
do some signal blocking/unblocking. In either case, it's portable,
which polling on /proc/pid isn't
Mattias Engdeg?rd wrote:
> I don't think it's contrived --- writing not a byte, but the pid and
> return status of the dead child to a pipe is an old but useful trick.
> It gives a natural serialisation of child deaths, and also eliminates
> the common race where a child dies before its parent has recorded its
> pid in a data structure. See it as a safe way of converting an
> asynchronous signal to a queued event.
>
> Using pipes to wake up blocking select()s is a useful thing in general,
> and often a lot cleaner than using signals when dealing with threads.
This mistake is exactly the reason that Netscape 4.x freezes from time
to time on Linux.
It tries to write too many things to a pipe, making the assumption that
this will never happen.
The correct thing to do, which never freezes, is to maintain a queue or
other structure inside your own process. Just write a single byte to
the pipe when a condition flag becomes true for the first time, after
setting the flag (both inside the signal handler). Clear the flag and
handle the pending conditions whenever you read the byte in the select()
loop. A flag can be implemented as a pair of counters if you can't do
atomic sets and clears.
enjoy,
-- Jamie
Paul Menage wrote:
> >or use sigsetjmp/siglongjmp
>
> Yes, that would probably solve the situation in question, provided that
> siglongjmp() is portably safe.
siglongjmp() or longjmp() should be safe on all unix systems, though not
others.
You have to be careful: if you receive a SIGCHLD _while_ handling a
SIGINT (or another signal), you must not siglongjmp() out of the inner
handler.
The simplest way get this right is to make sure that the other signal
handlers mask out SIGCHLD while they are running. You will need to use
sigaction() for this, and that isn't as portable as siglongjmp().
> (A comment on LKML in the past suggested that it's not safe on
> cygwin, for example.)
I wrote that comment. At least two versions of cygwin crash when you
call siglongjmp() from a signal handler inside select(). Unfortunately
I did not distill a reliable test for this, to check later versions of
cygwin.
However, given that that _should_ work, I don't think you can rely
write() working under these conditions in cygwin either. So don't
choose a pipe just for this reason.
> >Both would be portable and not special to child handling.
>
> One advantage of the pollable /proc/<pid>, (when combined with
> do_notify_parent() waking tsk->exit_chldwait) is that any process can
> check for the exit of any other process (not just direct children) in a
> select()/poll() call.
Well, pollable /proc/<pid> is not portable at all. If you are going to
use something linux specific, you may as well use pselect() which is
portable to a number of modern unix systems.
enjoy,
-- Jamie
On Sat, 13 Oct 2001 17:36:19 +0200, Jamie Lokier
<[email protected]> wrote:
>Mattias Engdegård wrote:
>> I don't think it's contrived --- writing not a byte, but the pid and
>> return status of the dead child to a pipe is an old but useful trick.
>> It gives a natural serialisation of child deaths, and also eliminates
>> the common race where a child dies before its parent has recorded its
>> pid in a data structure. See it as a safe way of converting an
>> asynchronous signal to a queued event.
>>
>> Using pipes to wake up blocking select()s is a useful thing in general,
>> and often a lot cleaner than using signals when dealing with threads.
>
>This mistake is exactly the reason that Netscape 4.x freezes from time
>to time on Linux.
>
>It tries to write too many things to a pipe, making the assumption that
>this will never happen.
At one point a few years ago, someone implemented a loadable library
which "fixed" the Netscape problem by knowing what the bug was, doing
things the right way, and presenting the interface Netscape expected.
john