Hello,
Using the 8139too driver, 0.9.15c, we have noticed that we get a defunct
thread for each device we have; if the driver is built into the kernel.
If the driver is built as a module, no defunct threads appear.
This has happened with any 2.4 kernel we've used, up to and including
2.4.3.
Below is the output from a custom board (but the problem also shows up
with a standard PCI card with RTL-8139B) with three RealTek RTL8139
chipsets on it.
[root@stewart-nw34 /root]# ps uaxwwww|grep eth
root 14 0.0 0.0 0 0 ? Z 13:39 0:00 [eth0 <defunct>]
root 15 0.0 0.0 0 0 ? Z 13:39 0:00 [eth1 <defunct>]
root 16 0.0 0.0 0 0 ? Z 13:39 0:00 [eth2 <defunct>]
root 240 0.0 0.0 0 0 ? SW 13:39 0:00 [eth0]
root 572 0.0 0.0 0 0 pts/1 SW 13:49 0:00 [eth1]
root 538 0.0 0.4 1216 460 pts/0 S 13:41 0:00 grep eth
8139too Fast Ethernet driver 0.9.15c loaded
PCI: Enabling device 00:05.0 (0000 -> 0003)
PCI: Assigned IRQ 6 for device 00:05.0
PCI: Setting latency timer of device 00:05.0 to 64
eth0: RealTek RTL8139 Fast Ethernet at 0xc7800000, 00:10:57:01:00:19,
IRQ 6
eth0: Identified 8139 chip type 'RTL-8139C'
PCI: Enabling device 00:09.0 (0000 -> 0003)
PCI: Assigned IRQ 6 for device 00:09.0
PCI: Setting latency timer of device 00:09.0 to 64
eth1: RealTek RTL8139 Fast Ethernet at 0xc7802100, 00:10:57:02:00:19,
IRQ 6
eth1: Identified 8139 chip type 'RTL-8139C'
PCI: Enabling device 00:0a.0 (0000 -> 0003)
PCI: Assigned IRQ 6 for device 00:0a.0
PCI: Setting latency timer of device 00:0a.0 to 64
eth2: RealTek RTL8139 Fast Ethernet at 0xc7804200, 00:10:57:03:00:19,
IRQ 6
eth2: Identified 8139 chip type 'RTL-8139C'
I'm not certain if this is supposed to be expected behaviour or not, if it
is we'll tell QA to ignore it.
Thanks,
-Rms
Rod Stewart wrote:
>
> Hello,
>
> Using the 8139too driver, 0.9.15c, we have noticed that we get a defunct
> thread for each device we have; if the driver is built into the kernel.
> If the driver is built as a module, no defunct threads appear.
What is the parent PID for the defunct tasks? zero?
<slaps head> swapper doesn't know how to reap children, and
AFAIK there's no way for a kernel thread to fully clean itself
up. This is always done by the parent.
ho-hum. Jeff, I think the best fix here is to bite the
bullet and write kernel_daemon(), which will delegate
thread creation to keventd, which is the only thing
we have which reaps zombies. Any better ideas?
> <slaps head> swapper doesn't know how to reap children, and
> AFAIK there's no way for a kernel thread to fully clean itself
> up. This is always done by the parent.
Make daemonize() move threads with parent 0 to parent 1
On Thu, 12 Apr 2001, Andrew Morton wrote:
> Rod Stewart wrote:
> >
> > Hello,
> >
> > Using the 8139too driver, 0.9.15c, we have noticed that we get a defunct
> > thread for each device we have; if the driver is built into the kernel.
> > If the driver is built as a module, no defunct threads appear.
>
> What is the parent PID for the defunct tasks? zero?
According to ps, 1
[root@stewart-nw34 networking]# ps alexw
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
044 0 14 1 9 0 0 0 do_exi Z ? 0:00 [eth0 <defunct>]
044 0 15 1 9 0 0 0 do_exi Z ? 0:00 [eth1 <defunct>]
044 0 16 1 9 0 0 0 do_exi Z ? 0:00 [eth2 <defunct>]
040 0 240 1 9 0 0 0 rtl813 SW ? 0:00 [eth0]
-Rms
Alan Cox wrote:
>
> > <slaps head> swapper doesn't know how to reap children, and
> > AFAIK there's no way for a kernel thread to fully clean itself
> > up. This is always done by the parent.
>
> Make daemonize() move threads with parent 0 to parent 1
Reparenting would require diving inside this lot:
/*
* pointers to (original) parent process, youngest child, younger sibling,
* older sibling, respectively. (p->father can be replaced with
* p->p_pptr->pid)
*/
struct task_struct *p_opptr, *p_pptr, *p_cptr, *p_ysptr, *p_osptr;
struct list_head thread_group;
plus maybe rewriting pgrps, sessions, gids, etc. Challenging.
Plus it would mean that the kernel requires, for its
correct operation, that process "1" is a child reaper.
Is this a good thing?
Rod Stewart wrote:
>
> On Thu, 12 Apr 2001, Andrew Morton wrote:
> > Rod Stewart wrote:
> > >
> > > Hello,
> > >
> > > Using the 8139too driver, 0.9.15c, we have noticed that we get a defunct
> > > thread for each device we have; if the driver is built into the kernel.
> > > If the driver is built as a module, no defunct threads appear.
> >
> > What is the parent PID for the defunct tasks? zero?
>
> According to ps, 1
Ah. Of course. All (or most) kernel initialisation is
done by PID 1. Search for "kernel_thread" in init/main.c
So it seems that in your setup, process 1 is not reaping
children, which is why this hasn't been reported before.
Is there something unusual about your setup?
> Plus it would mean that the kernel requires, for its
> correct operation, that process "1" is a child reaper.
> Is this a good thing?
That is already required. The rest of the reparenting functionality is also
in kernel/exit.c already
On Thu, 12 Apr 2001, Andrew Morton wrote:
> Rod Stewart wrote:
> >
> > On Thu, 12 Apr 2001, Andrew Morton wrote:
> > > Rod Stewart wrote:
> > > >
> > > > Hello,
> > > >
> > > > Using the 8139too driver, 0.9.15c, we have noticed that we get a defunct
> > > > thread for each device we have; if the driver is built into the kernel.
> > > > If the driver is built as a module, no defunct threads appear.
> > >
> > > What is the parent PID for the defunct tasks? zero?
> >
> > According to ps, 1
>
> Ah. Of course. All (or most) kernel initialisation is
> done by PID 1. Search for "kernel_thread" in init/main.c
>
> So it seems that in your setup, process 1 is not reaping
> children, which is why this hasn't been reported before.
> Is there something unusual about your setup?
One box is standard PIII with RH 7.0, the other is a custom Crusoe TM5400
board. But from further investigation it appears to be a kernel config
option. As I've got a 2.4.0 kernel which has very little compiled in and
not showing the problem and another kernel which has many more networking
options built in and showing the problem. I've seen this problem
since 2.4.0.test11.
I'll send a note once I find the config option which is causing this,
probably tomorrow.
Thanks,
-Rms
Rod Stewart wrote:
>
> On Thu, 12 Apr 2001, Andrew Morton wrote:
> > Is there something unusual about your setup?
>
> One box is standard PIII with RH 7.0, the other is a custom Crusoe TM5400
> board. But from further investigation it appears to be a kernel config
> option. As I've got a 2.4.0 kernel which has very little compiled in and
> not showing the problem and another kernel which has many more networking
> options built in and showing the problem. I've seen this problem
> since 2.4.0.test11.
>
Sorry. I meant: what is process 1 on this machine? Is it not
the normal init? If not, then according to Alan, the fault
lies with your userspace. Kernel requires that PID 1 reap
children.
On Thu, 12 Apr 2001, Andrew Morton wrote:
> Rod Stewart wrote:
> >
> > On Thu, 12 Apr 2001, Andrew Morton wrote:
> > > Is there something unusual about your setup?
> >
> > One box is standard PIII with RH 7.0, the other is a custom Crusoe TM5400
> > board. But from further investigation it appears to be a kernel config
> > option. As I've got a 2.4.0 kernel which has very little compiled in and
> > not showing the problem and another kernel which has many more networking
> > options built in and showing the problem. I've seen this problem
> > since 2.4.0.test11.
> >
>
> Sorry. I meant: what is process 1 on this machine? Is it not
> the normal init? If not, then according to Alan, the fault
> lies with your userspace. Kernel requires that PID 1 reap
> children.
Yes, it is the normal init on both boxes.
-Rms
[email protected] said:
> ho-hum. Jeff, I think the best fix here is to bite the bullet and
> write kernel_daemon(), which will delegate thread creation to keventd,
> which is the only thing we have which reaps zombies. Any better
> ideas?
Yes. Let init do it, as God intended. Why reap threads in the kernel when
they could just reparent themselves as children of pid 1?
--
dwmw2
On Thu, 12 Apr 2001, Andrew Morton wrote:
> Rod Stewart wrote:
> >
> > On Thu, 12 Apr 2001, Andrew Morton wrote:
> > > Rod Stewart wrote:
> > > >
> > > > Hello,
> > > >
> > > > Using the 8139too driver, 0.9.15c, we have noticed that we get a defunct
> > > > thread for each device we have; if the driver is built into the kernel.
> > > > If the driver is built as a module, no defunct threads appear.
> > >
> > > What is the parent PID for the defunct tasks? zero?
> >
> > According to ps, 1
>
> Ah. Of course. All (or most) kernel initialisation is
> done by PID 1. Search for "kernel_thread" in init/main.c
>
> So it seems that in your setup, process 1 is not reaping
> children, which is why this hasn't been reported before.
> Is there something unusual about your setup?
I found the difference which causes this. If I build my kernel with
IP_PNP (IP: kernel level autoconfiguration) support I get a defunt thread
for each 8139too device. If I don't build with IP_PNP support I don't get
any, defunct ethernet threads.
This make any sense? Here is the relevant diff from a working config to a
bad one:
[root@stewart-nw34 conf]# diff -u config-p5-good config-p6-bad
--- config-p5-good Fri Apr 13 16:07:10 2001
+++ config-p6-bad Fri Apr 13 16:12:21 2001
@@ -173,7 +173,9 @@
# CONFIG_IP_ROUTE_TOS is not set
# CONFIG_IP_ROUTE_VERBOSE is not set
# CONFIG_IP_ROUTE_LARGE_TABLES is not set
-# CONFIG_IP_PNP is not set
+CONFIG_IP_PNP=y
+# CONFIG_IP_PNP_BOOTP is not set
+# CONFIG_IP_PNP_RARP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE=m
# CONFIG_NET_IPGRE_BROADCAST is not set
Thanks,
-Rms
>> Ah. Of course. All (or most) kernel initialisation is
>> done by PID 1. Search for "kernel_thread" in init/main.c
>>
>> So it seems that in your setup, process 1 is not reaping
>> children, which is why this hasn't been reported before.
>> Is there something unusual about your setup?
> I found the difference which causes this. If I build my kernel with
> IP_PNP (IP: kernel level autoconfiguration) support I get a defunt
> thread for each 8139too device. If I don't build with IP_PNP
> support I don't get any, defunct ethernet threads.
Does init(8) reap children that died before it was spawned? I assume
that the defunct tasks were there _before_ init was spawned.
Perhaps init() [in linux/init/main.c] should reap all defunct tasks
before the execve("/sbin/init").
I've attached an untested patch, could you try it?
--
Manfred
On Sat, 14 Apr 2001, Manfred Spraul wrote:
> >> Ah. Of course. All (or most) kernel initialisation is
> >> done by PID 1. Search for "kernel_thread" in init/main.c
> >>
> >> So it seems that in your setup, process 1 is not reaping
> >> children, which is why this hasn't been reported before.
> >> Is there something unusual about your setup?
>
> > I found the difference which causes this. If I build my kernel with
> > IP_PNP (IP: kernel level autoconfiguration) support I get a defunt
> > thread for each 8139too device. If I don't build with IP_PNP
> > support I don't get any, defunct ethernet threads.
>
> Does init(8) reap children that died before it was spawned? I assume
> that the defunct tasks were there _before_ init was spawned.
>
> Perhaps init() [in linux/init/main.c] should reap all defunct tasks
> before the execve("/sbin/init").
>
> I've attached an untested patch, could you try it?
Yes, that fixes my problem. No more defunct eth? processes when IP_PNP is
compiled in. With the fix you said to the patch; replacing curtask with
current.
Thanks,
-Rms
// $Header$
// Kernel Version:
// VERSION = 2
// PATCHLEVEL = 4
// SUBLEVEL = 3
// EXTRAVERSION = -ac3
--- 2.4/init/main.c Sat Apr 7 22:02:27 2001
+++ build-2.4/init/main.c Sat Apr 14 19:18:34 2001
@@ -883,6 +883,13 @@
(void) dup(0);
(void) dup(0);
+
+ while (waitpid(-1, (unsigned int *)0, __WALL|WNOHANG) > 0)
+ ;
+ spin_lock_irq(¤t->sigmask_lock);
+ flush_signals(current);
+ recalc_sigpending(current);
+ spin_unlock_irq(¤t->sigmask_lock);
/*
* We try each of these until one succeeds.
--- 2.4/kernel/context.c Fri Feb 2 15:20:37 2001
+++ build-2.4/kernel/context.c Sat Apr 14 19:09:10 2001
@@ -101,8 +101,10 @@
if (signal_pending(curtask)) {
while (waitpid(-1, (unsigned int *)0, __WALL|WNOHANG) > 0)
;
+ spin_lock_irq(&curtask->sigmask_lock);
flush_signals(curtask);
recalc_sigpending(curtask);
+ spin_unlock_irq(&curtask->sigmask_lock);
}
}
}
> Rod's init version (from RH 7.0) doesn't reap children that died before
> it was started. Is that an init bug or should the kernel reap them
> before the execve?
I would say thats an init bug
> The attached patch reaps all zombies before the execve("/sbin/init").
That has an implicit race, a zombie can always appear as we are execing init.
I think init wants fixing
From: "Alan Cox" <[email protected]>
>
> That has an implicit race, a zombie can always appear as we are
execing init.
> I think init wants fixing
>
Rod, could you boot again with the unpatched kernel and send a sigchild
to init?
#kill -CHLD 1
If I understand the init code correctly the sigchild handler reaps all
zombies, but probably the signal got lost because the children died
before the parent was created ;-)
--
Manfred
Hi,
On Sat, Apr 14, 2001 at 07:53:28PM +0100, Alan Cox wrote:
> > Rod's init version (from RH 7.0) doesn't reap children that died before
> > it was started. Is that an init bug or should the kernel reap them
> > before the execve?
> I would say thats an init bug
It doesn't seem to be that simple.
Redhat's init does child reaping in its SIGCHLD handler using the
following:
while((pid = waitpid(-1, &st, WNOHANG)) != 0) {
if (errno == ECHILD) break;
/* do some stuff, nothing which could break out of the loop */
}
This should reap all leftover childs from kernel startup when init
receives SIGCHLD for the first time, but somehow the kernel seems to
skip them while searching for a dead process in sys_wait4(). I can't
do any further testing because I don't have a 8139 NIC, but I can't
find a problem in init's child reaping code.
Please tell me if I'm missing something, but I think this is really a
kernel issue, not a bug in init.
Andreas
--
I've finally learned what "upward compatible" means. It means we get to
keep all our old mistakes.
-- Dennie van Tassel
On Sat, 14 Apr 2001, Manfred Spraul wrote:
> From: "Alan Cox" <[email protected]>
> >
> > That has an implicit race, a zombie can always appear as we are
> execing init.
> > I think init wants fixing
> >
> Rod, could you boot again with the unpatched kernel and send a sigchild
> to init?
>
> #kill -CHLD 1
>
> If I understand the init code correctly the sigchild handler reaps all
> zombies, but probably the signal got lost because the children died
> before the parent was created ;-)
That doesn't 'fix' it. The thing I find funny is that it only appears
when IP_PNP is compiled in. It is as if the driver ends up in some weird
state when IP_PNP is used. According to ps, from my limited
understanding, the thread is stuck in do_exit
[root@stewart-nw34 /root]# ps elaxww|grep eth
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
044 0 7 1 9 0 0 0 do_exi Z ? 0:00 [eth0 <defunct>]
044 0 8 1 9 0 0 0 do_exi Z ? 0:00 [eth1 <defunct>]
044 0 9 1 9 0 0 0 do_exi Z ? 0:00 [eth2 <defunct>]
040 0 229 1 9 0 0 0 rtl813 SW ? 0:00 [eth1]
Thanks for helping with this,
-Rms
diff -ur 2.4/arch/alpha/kernel/entry.S build-2.4/arch/alpha/kernel/entry.S
--- 2.4/arch/alpha/kernel/entry.S Sun Sep 3 20:36:45 2000
+++ build-2.4/arch/alpha/kernel/entry.S Sun Apr 15 14:58:01 2001
@@ -242,12 +242,12 @@
subq $30,4*8,$30
stq $10,16($30)
stq $9,8($30)
- lda $0,CLONE_VM
+ lda $0,CLONE_VM|SIGCHLD
stq $26,0($30)
.prologue 1
mov $16,$9 /* save fn */
mov $17,$10 /* save arg */
- or $18,$0,$16 /* shuffle flags to front; add CLONE_VM. */
+ or $18,$0,$16 /* shuffle flags to front; add CLONE_VM|SIGCHLD. */
bsr $26,kernel_clone
bne $20,1f /* $20 is non-zero in child */
ldq $26,0($30)
diff -ur 2.4/arch/arm/kernel/process.c build-2.4/arch/arm/kernel/process.c
--- 2.4/arch/arm/kernel/process.c Thu Feb 22 22:28:51 2001
+++ build-2.4/arch/arm/kernel/process.c Sun Apr 15 14:51:08 2001
@@ -368,6 +368,8 @@
{
pid_t __ret;
+ flags |= SIGCHLD;
+
__asm__ __volatile__(
"orr r0, %1, %2 @ kernel_thread sys_clone
mov r1, #0
diff -ur 2.4/arch/cris/kernel/process.c build-2.4/arch/cris/kernel/process.c
--- 2.4/arch/cris/kernel/process.c Sat Apr 7 22:01:49 2001
+++ build-2.4/arch/cris/kernel/process.c Sun Apr 15 14:51:16 2001
@@ -127,6 +127,8 @@
int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
{
register long __a __asm__ ("r10");
+
+ flags |= SIGCHLD;
__asm__ __volatile__
("movu.w %1,r9\n\t" /* r9 contains syscall number, to sys_clone */
diff -ur 2.4/arch/i386/kernel/process.c build-2.4/arch/i386/kernel/process.c
--- 2.4/arch/i386/kernel/process.c Thu Feb 22 22:28:52 2001
+++ build-2.4/arch/i386/kernel/process.c Sun Apr 15 14:40:43 2001
@@ -440,6 +440,8 @@
{
long retval, d0;
+ flags |= SIGCHLD;
+
__asm__ __volatile__(
"movl %%esp,%%esi\n\t"
"int $0x80\n\t" /* Linux/i386 system call */
diff -ur 2.4/arch/ia64/kernel/process.c build-2.4/arch/ia64/kernel/process.c
--- 2.4/arch/ia64/kernel/process.c Thu Jan 4 21:50:17 2001
+++ build-2.4/arch/ia64/kernel/process.c Sun Apr 15 14:51:44 2001
@@ -500,7 +500,7 @@
struct task_struct *parent = current;
int result, tid;
- tid = clone(flags | CLONE_VM, 0);
+ tid = clone(flags | CLONE_VM | SIGCHLD, 0);
if (parent != current) {
result = (*fn)(arg);
_exit(result);
diff -ur 2.4/arch/m68k/kernel/process.c build-2.4/arch/m68k/kernel/process.c
--- 2.4/arch/m68k/kernel/process.c Thu Feb 22 22:28:54 2001
+++ build-2.4/arch/m68k/kernel/process.c Sun Apr 15 14:51:58 2001
@@ -135,7 +135,7 @@
{
register long retval __asm__ ("d0");
- register long clone_arg __asm__ ("d1") = flags | CLONE_VM;
+ register long clone_arg __asm__ ("d1") = flags | CLONE_VM | SIGCHLD;
__asm__ __volatile__
("clrl %%d2\n\t"
diff -ur 2.4/arch/mips/kernel/process.c build-2.4/arch/mips/kernel/process.c
--- 2.4/arch/mips/kernel/process.c Sat Apr 7 22:01:56 2001
+++ build-2.4/arch/mips/kernel/process.c Sun Apr 15 14:52:12 2001
@@ -161,6 +161,8 @@
{
long retval;
+ flags |= SIGCHLD;
+
__asm__ __volatile__(
".set\tnoreorder\n\t"
"move\t$6,$sp\n\t"
diff -ur 2.4/arch/mips64/kernel/process.c build-2.4/arch/mips64/kernel/process.c
--- 2.4/arch/mips64/kernel/process.c Thu Feb 22 22:28:55 2001
+++ build-2.4/arch/mips64/kernel/process.c Sun Apr 15 14:52:21 2001
@@ -154,6 +154,8 @@
{
int retval;
+ flags |= SIGCHLD;
+
__asm__ __volatile__(
"move\t$6, $sp\n\t"
"move\t$4, %5\n\t"
diff -ur 2.4/arch/parisc/kernel/entry.S build-2.4/arch/parisc/kernel/entry.S
--- 2.4/arch/parisc/kernel/entry.S Sat Apr 7 22:01:58 2001
+++ build-2.4/arch/parisc/kernel/entry.S Sun Apr 15 14:56:58 2001
@@ -497,7 +497,7 @@
#endif
STREG %r26, PT_GR26(%r1) /* Store function & argument for child */
STREG %r25, PT_GR25(%r1)
- ldo CLONE_VM(%r0), %r26 /* Force CLONE_VM since only init_mm */
+ ldo SIGCHLD|CLONE_VM(%r0), %r26 /* Force CLONE_VM since only init_mm */
or %r26, %r24, %r26 /* will have kernel mappings. */
copy %r0, %r25
bl do_fork, %r2
diff -ur 2.4/arch/ppc/kernel/misc.S build-2.4/arch/ppc/kernel/misc.S
--- 2.4/arch/ppc/kernel/misc.S Sat Apr 7 22:01:59 2001
+++ build-2.4/arch/ppc/kernel/misc.S Sun Apr 15 14:56:17 2001
@@ -1080,7 +1080,7 @@
*/
_GLOBAL(kernel_thread)
mr r6,r3 /* function */
- ori r3,r5,CLONE_VM /* flags */
+ ori r3,r5,CLONE_VM|SIGCHLD /* flags */
li r0,__NR_clone
sc
cmpi 0,r3,0 /* parent or child? */
diff -ur 2.4/arch/s390/kernel/process.c build-2.4/arch/s390/kernel/process.c
--- 2.4/arch/s390/kernel/process.c Thu Feb 22 22:28:57 2001
+++ build-2.4/arch/s390/kernel/process.c Sun Apr 15 14:52:58 2001
@@ -242,7 +242,7 @@
int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
{
- int clone_arg = flags | CLONE_VM;
+ int clone_arg = flags | CLONE_VM | SIGCHLD;
int retval;
__asm__ __volatile__(
diff -ur 2.4/arch/s390x/kernel/process.c build-2.4/arch/s390x/kernel/process.c
--- 2.4/arch/s390x/kernel/process.c Thu Feb 22 22:28:57 2001
+++ build-2.4/arch/s390x/kernel/process.c Sun Apr 15 14:53:06 2001
@@ -242,7 +242,7 @@
int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
{
- int clone_arg = flags | CLONE_VM;
+ int clone_arg = flags | CLONE_VM | SIGCHLD;
int retval;
__asm__ __volatile__(
diff -ur 2.4/arch/sh/kernel/process.c build-2.4/arch/sh/kernel/process.c
--- 2.4/arch/sh/kernel/process.c Thu Feb 22 22:28:57 2001
+++ build-2.4/arch/sh/kernel/process.c Sun Apr 15 14:53:16 2001
@@ -148,7 +148,7 @@
{ /* Don't use this in BL=1(cli). Or else, CPU resets! */
register unsigned long __sc0 __asm__ ("r0");
register unsigned long __sc3 __asm__ ("r3") = __NR_clone;
- register unsigned long __sc4 __asm__ ("r4") = (long) flags | CLONE_VM;
+ register unsigned long __sc4 __asm__ ("r4") = (long) flags | CLONE_VM | SIGCHLD;
register unsigned long __sc5 __asm__ ("r5") = 0;
register unsigned long __sc8 __asm__ ("r8") = (long) arg;
register unsigned long __sc9 __asm__ ("r9") = (long) fn;
diff -ur 2.4/arch/sparc/kernel/process.c build-2.4/arch/sparc/kernel/process.c
--- 2.4/arch/sparc/kernel/process.c Thu Feb 22 22:28:57 2001
+++ build-2.4/arch/sparc/kernel/process.c Sun Apr 15 14:53:39 2001
@@ -678,6 +678,8 @@
{
long retval;
+ flags |= SIGCHLD;
+
__asm__ __volatile("mov %4, %%g2\n\t" /* Set aside fn ptr... */
"mov %5, %%g3\n\t" /* and arg. */
"mov %1, %%g1\n\t"
diff -ur 2.4/arch/sparc64/kernel/process.c build-2.4/arch/sparc64/kernel/process.c
--- 2.4/arch/sparc64/kernel/process.c Sat Mar 31 21:47:37 2001
+++ build-2.4/arch/sparc64/kernel/process.c Sun Apr 15 14:53:49 2001
@@ -647,6 +647,8 @@
{
long retval;
+ flags |= SIGCHLD;
+
/* If the parent runs before fn(arg) is called by the child,
* the input registers of this function can be clobbered.
* So we stash 'fn' and 'arg' into global registers which
On Sun, 15 Apr 2001, Manfred Spraul wrote:
> Alan, which fix would you prefer:
> * init could use wait3 and set __WALL.
> * all kernel thread users could set SIGCHLD. Some already do that
> (__call_usermodehelper).
> * the kernel_thread implementations could force the exit signal to
> SIGCHLD.
>
> I'd prefer the last version.
> The attached patch is tested with i386. The alpha, parisc and ppc
> assember changes are guessed.
This patch fixed my problem.
Thanks,
-Rms
Manfred Spraul wrote:
>
> I found the problem:
>
> * init uses waitpid(-1,,), thus the __WALL flag is not set
> * without __WALL, only processes with exit_signal == SIGCHLD are reaped
> * it's impossible for user space processes to die with another
> exit_signal, forget_original_parent changes the exit_signal back to
> SIGCHLD ("We dont want people slaying init"), and init itself doesn't
> use clone.
> * kernel threads can die with an arbitrary exit_signal.
yep
> Alan, which fix would you prefer:
> * init could use wait3 and set __WALL.
> * all kernel thread users could set SIGCHLD. Some already do that
> (__call_usermodehelper).
> * the kernel_thread implementations could force the exit signal to
> SIGCHLD.
* Add SIGCHLD to all the users of kernel_thread(), in the cases
where the thread can ever exit.
* Add
if (current->exit_signal == 0)
current->exit_signal = SIGCHLD;
to daemonize().
None of these will work. The problems with globally setting exit_signal
to SIGCHLD are that
a) If the parent does waitpid(pid, status, __WCLONE), the
waitpid will fail. request_module() does this. I don't
know _why_ it does this. Maybe it's bogus. There is no
explanation.
b) When the kernel thread exits, it will send a SIGCHLD to
its parent. But the parent is not necessarily init! It
could be a userspace process (in this case, ifconfig).
The kernel has no business spraying signals out to userspace
tasks just because they happened to open a network interface.
And for this reason we can't just go in and change 8139too.c
to use SIGCHLD.
So it seems that we must reparent the thread to init, and
make sure that it delivers SIGCHLD to init when it exits.
The below patch does this, within daemonize(). But this precise
area was the source of serial screwups when I was doing the
call_usermodehelper() stuff, and I bet this approach will
still have problems.
The exit_files() will take care of releasing things like current->tty
and pwd. But we still do not know what scheduling priority and policy
we inherited from the userspace parent, nor do we know what signal mask
we have, nor do we know what uid we're running as. Resource limits.
Capabilties. CPU mask. Etcetera, etcetera. All this stuff comes back
to bite. Been there, got the scars :)
This is why I believe that kernel daemons should be launched by
keventd. They belong to the *kernel*, not to userspace parents.
--- linux-2.4.4-pre3/kernel/sched.c Sun Apr 15 15:34:25 2001
+++ linux-akpm/kernel/sched.c Sun Apr 15 21:59:26 2001
@@ -1260,32 +1260,53 @@
/*
* Put all the gunge required to become a kernel thread without
* attached user resources in one place where it belongs.
+ *
+ * Kernel 2.4.4-pre3, [email protected]: reparent the caller
+ * to init and set the exit signal to SIGCHLD so the thread
+ * will be properly reaped if it exits.
*/
void daemonize(void)
{
struct fs_struct *fs;
-
+ struct task_struct *this_task = current;
/*
* If we were started as result of loading a module, close all of the
* user space pages. We don't need them, and if we didn't close them
* they would be locked into memory.
*/
- exit_mm(current);
+ exit_mm(this_task);
- current->session = 1;
- current->pgrp = 1;
+ this_task->session = 1;
+ this_task->pgrp = 1;
/* Become as one with the init task */
- exit_fs(current); /* current->fs->count--; */
+ exit_fs(this_task); /* this_task->fs->count--; */
fs = init_task.fs;
- current->fs = fs;
+ this_task->fs = fs;
atomic_inc(&fs->count);
- exit_files(current);
- current->files = init_task.files;
- atomic_inc(¤t->files->count);
+ exit_files(this_task);
+ this_task->files = init_task.files;
+ atomic_inc(&this_task->files->count);
+
+ write_lock_irq(&tasklist_lock);
+
+ /* Reparent to init */
+ REMOVE_LINKS(this_task);
+ this_task->p_opptr = child_reaper;
+ this_task->p_pptr = child_reaper;
+ SET_LINKS(this_task);
+
+ /* Set the exit signal to SIGCHLD so we signal init on exit */
+ if (this_task->exit_signal ! 0) {
+ printk(KERN_ERR "task `%s' exit_signal %d in daemonize()\n",
+ this_task->comm, this_task->exit_signal);
+ }
+ this_task->exit_signal = SIGCHLD;
+
+ write_unlock_irq(&tasklist_lock);
}
void __init init_idle(void)
Andrew Morton <[email protected]> writes:
[...]
> None of these will work. The problems with globally setting
> exit_signal to SIGCHLD are that
>
> a) If the parent does waitpid(pid, status, __WCLONE), the
> waitpid will fail. request_module() does this. I don't
> know _why_ it does this. Maybe it's bogus. There is no
> explanation.
waitpid doesn't work on cloned children unless you put in __WCLONE or
__WALL, so this was necessary to catch the child at all. If you set to
use SIGCHLD this will no longer be needed (if I understand correctly).
[...]
> So it seems that we must reparent the thread to init, and
> make sure that it delivers SIGCHLD to init when it exits.
Sounds good. Why isn't SIGCHLD a stronger default anyway.
[...]
> + /* Set the exit signal to SIGCHLD so we signal init on exit */
> + if (this_task->exit_signal ! 0) {
Tyop.
> + printk(KERN_ERR "task `%s' exit_signal %d in daemonize()\n",
> + this_task->comm, this_task->exit_signal);
> + }
> + this_task->exit_signal = SIGCHLD;
> +
> + write_unlock_irq(&tasklist_lock);
> }
>
> void __init init_idle(void)
>
--
http://www.penguinpowered.com/~vii
John Fremlin wrote:
>
>
> > So it seems that we must reparent the thread to init, and
> > make sure that it delivers SIGCHLD to init when it exits.
>
> Sounds good. Why isn't SIGCHLD a stronger default anyway.
mm? The caller gets to choose...
> [...]
>
> > + /* Set the exit signal to SIGCHLD so we signal init on exit */
> > + if (this_task->exit_signal ! 0) {
>
> Tyop.
aargh. Thanks. So that's what `cvs commit' does :)
The patch is OK with this converted into `!='. But I'll
refresh and retest for tomorrow. Still not very happy with
this approach though...