2004-10-14 05:27:08

by Albert Cahalan

Subject: unkillable process

It's really bad when a task group leader exits.
The process becomes unkillable.

This is with the 2.6.8-rc1 kernel. I haven't seen
any mention of this getting fixed since then.
Here's the top of the /proc/*/status file:

Name: a.out
State: Z (zombie)
SleepAVG: 59%
Tgid: 9662
Pid: 9662
PPid: 1
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 0
Groups: 500 1000
Threads: 9

Here's the code:

///////////////////////////////////////////////////////////////
#define _GNU_SOURCE // current glibc only declares clone() in <sched.h> with this
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <sched.h>

#ifndef CLONE_THREAD
#define CLONE_THREAD 0x00010000
#endif
#ifndef CLONE_DETACHED
#define CLONE_DETACHED 0x00400000
#endif
#ifndef CLONE_STOPPED
#define CLONE_STOPPED 0x02000000
#endif

#define FLAGS (CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_VM|CLONE_THREAD|CLONE_DETACHED)

static pid_t one;

static void die(int signo){
  (void)signo;
  _exit(0);
}

static void hang(void){
  for(;;) pause();
}

static int clone_fn(void *vp){
  (void)vp;
  hang();
  return 0; // keep gcc happy
}

static long clone_stack_data[2048];
#ifdef __hppa__
static long *clone_stack = &clone_stack_data[0];
#else
static long *clone_stack = &clone_stack_data[2048];
#endif

int main(int argc, char *argv[]){
  pid_t minime;
  int i = 8;
  (void)argc;
  (void)argv;

  one = getpid();
  signal(SIGHUP,die);
  if(fork()) hang(); // parent later killed as readiness signal

  while(i--){
    // better be stopped... they share a stack
    minime = clone(clone_fn, clone_stack, FLAGS | CLONE_STOPPED, NULL);
    if(minime==-1){
      perror("no clone");
      kill(one,SIGKILL);
      _exit(8);
    }
  }

  kill(one,SIGHUP); // let the shell know we're ready

  _exit(0); // make task group leader a zombie
  return 0; // keep gcc happy
}
/////////////////////////////////////////////////////////////////////
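
For anyone who wants to check for this condition without reading the
status file by eye, here is a minimal sketch that pulls the State and
Threads fields out of /proc/<pid>/status. It assumes a Linux /proc with
the field layout shown in the dump earlier in this message; the function
names read_task_status and is_zombie_leader_with_threads are made up for
illustration and are not part of the test program.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: read the one-letter State code and the Threads
 * count from /proc/<pid>/status.  Returns 0 on success, -1 if the file
 * cannot be opened or either field is missing. */
static int read_task_status(pid_t pid, char *state, int *threads){
  char path[64], line[256];
  int have_state = 0, have_threads = 0;
  FILE *fp;

  snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
  fp = fopen(path, "r");
  if(!fp) return -1;

  while(fgets(line, sizeof line, fp)){
    /* The blank in each format skips the tab that follows the colon. */
    if(sscanf(line, "State: %c", state) == 1) have_state = 1;
    else if(sscanf(line, "Threads: %d", threads) == 1) have_threads = 1;
  }
  fclose(fp);
  return (have_state && have_threads) ? 0 : -1;
}

/* Hypothetical helper: 1 if the task is a zombie that still counts more
 * than one thread in its group, 0 if not, -1 if the status is unreadable. */
static int is_zombie_leader_with_threads(pid_t pid){
  char state;
  int threads;

  if(read_task_status(pid, &state, &threads) != 0) return -1;
  return state == 'Z' && threads > 1;
}
```

Given the status dump shown earlier (State Z, Threads 9),
is_zombie_leader_with_threads(9662) would be expected to return 1 while
the stuck process is still around.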



2004-10-14 07:42:25

by Andrew Morton

Subject: Re: unkillable process

Albert Cahalan <[email protected]> wrote:
>
> It's really bad when a task group leader exits.
> The process becomes unkillable.
>
> This is with the 2.6.8-rc1 kernel.

That's a pretty old kernel.

> ...
> Here's the code:

I can't get it to misbehave with current -linus. Can you upgrade and retest?

2004-10-14 12:26:12

by Johan Kullstam

Subject: Re: unkillable process

Albert Cahalan <[email protected]> writes:

> It's really bad when a task group leader exits.
> The process becomes unkillable.

I have been having zombie problems since 2.6.9-rc1. I run a boinc
climateprediction program (related to seti@home) which leaves defunct
"cp" processes about. Killing the climatepredictor (called
hadsm3um_4.03_i686-pc-linux-gnu) which spawns them causes these zombie
cp things to get reaped.

> This is with the 2.6.8-rc1 kernel. I haven't seen
> any mention of this getting fixed since then.
> Here's the top of the /proc/*/status file:

I tried it with 2.6.9-rc3 just now and it doesn't make zombies for
me.

climateprediction still makes defunct cp.

(I fired up 2.6.9-rc4 but it somehow wouldn't load the driver for my
ethernet 3c59x. That's another issue, but I have no idea if the
problem has been fixed there since I am stopped by another problem.)

I skimmed over the changelogs but I have not found anything looking like a
change in this area. I am not sure what the right keyword(s) to
search for on this topic would be. I didn't grovel through them yet,
but perhaps someone on the list knows what is going on.

--
Johan KULLSTAM

2004-10-14 15:12:56

by Alex Riesen

Subject: Re: unkillable process

On 14 Oct 2004 08:26:08 -0400, Johan Kullstam <[email protected]> wrote:
> Albert Cahalan <[email protected]> writes:
>
> > It's really bad when a task group leader exits.
> > The process becomes unkillable.
>
> I have been having zombie problems since 2.6.9-rc1. I run a boinc
> climateprediction program (related to seti@home) which leaves defunct
> "cp" processes about. Killing the climatepredictor (called
> hadsm3um_4.03_i686-pc-linux-gnu) which spawns them causes these zombie
> cp things to get reaped.

I believe this is not related. It looks like a bug in the program:
it misses SIGCHLD and never calls waitpid for its children.