Well, the unkillable process continues on. Does nobody else have any ideas on
how to kill an unkillable process in the R state that's sucking up all my unused
CPU cycles?
If not I'm going to have to reboot this thing...
Chris
--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]
On Thu, 11 Oct 2001, Christopher Friesen wrote:
> Well, the unkillable process continues on. Does nobody else have any ideas on
> how to kill an unkillable process in the R state that's sucking up all my unused
> CPU cycles?
>
> If not I'm going to have to reboot this thing...
Short-term hack: renice it to 20 so it doesn't interfere with the normal
workload. Also try sending it a SIGSTOP, although I doubt that will work
here. I think strace will fail the same way gdb does, but try that too...
James.
--
"Our attitude with TCP/IP is, `Hey, we'll do it, but don't make a big
system, because we can't fix it if it breaks -- nobody can.'"
"TCP/IP is OK if you've got a little informal club, and it doesn't make
any difference if it takes a while to fix it."
-- Ken Olson, in Digital News, 1988
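James's short-term hack corresponds to two system calls, setpriority(2) and kill(2). Here is a minimal C sketch. It assumes the target PID is already known. `throttle_process` and `demo_on_child` are hypothetical names used only for illustration, and the demo runs against a forked, well-behaved child rather than the real runaway process. As James expects, the SIGSTOP half may do nothing to a task that never leaves the kernel.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: drop the process to the lowest priority
 * ("renice 20 <pid>"; current Linux kernels clamp nice values above 19
 * to 19), then send it SIGSTOP ("kill -STOP <pid>").
 * Returns 0 on success, -1 on the first failure. */
int throttle_process(pid_t pid)
{
    if (setpriority(PRIO_PROCESS, (id_t)pid, 19) == -1) {
        fprintf(stderr, "setpriority(%d): %s\n", (int)pid, strerror(errno));
        return -1;
    }
    /* SIGSTOP cannot be caught or ignored, but it is only acted on when the
     * task next handles pending signals, normally on its way back to user
     * space. A task looping inside the kernel may never get there. */
    if (kill(pid, SIGSTOP) == -1) {
        fprintf(stderr, "kill(%d, SIGSTOP): %s\n", (int)pid, strerror(errno));
        return -1;
    }
    return 0;
}

/* Hypothetical usage sketch: fork an idle child, throttle it, and confirm
 * with waitpid(WUNTRACED) that it really stopped.
 * Returns 1 if the child stopped, 0 otherwise. */
int demo_on_child(void)
{
    pid_t child = fork();
    if (child == -1)
        return 0;
    if (child == 0)
        for (;;)
            pause();                 /* child: sleep until killed */

    int status, stopped = 0;
    if (throttle_process(child) == 0 && waitpid(child, &status, WUNTRACED) == child)
        stopped = WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;

    kill(child, SIGKILL);            /* SIGKILL also ends a stopped task */
    waitpid(child, &status, 0);      /* reap it */
    return stopped;
}
```

On a healthy process, the waitpid() check confirms the stop. On the process from this thread, a missing stop would itself hint that the task is stuck in the kernel.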
Christopher Friesen writes:
> Well, the unkillable process continues on. Does nobody else have any ideas on
> how to kill an unkillable process in the R state that's sucking up all my unused
> CPU cycles?
I would suspect that it is actually looping inside the kernel, which
would mean that there indeed was no way to kill it. You could try
alt-scrolllock on the console and see if you get a register dump of
it, or maybe one of the alt-sysrq magic keys might give you some
information. But I suspect that rebooting is ultimately going to be
your only solution.
Paul.
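Paul's suspicion that the task is looping inside the kernel can also be checked without a debugger. Fields 14 and 15 of /proc/<pid>/stat (utime and stime) report how many clock ticks the task has spent in user mode and in kernel mode. The sketch below assumes Linux procfs with that field layout. `parse_cpu_ticks`, `read_cpu_ticks` and `sample_cpu_split` are hypothetical helper names. If nearly all of the growth between two samples is in stime, the CPU is probably being burned inside the kernel rather than in the program's own code.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse utime and stime (fields 14 and 15, in clock ticks) out of one
 * /proc/<pid>/stat line. The command name in field 2 may contain spaces
 * or ')', so parsing starts after the LAST ')'. Returns 0 on success. */
int parse_cpu_ticks(const char *line, unsigned long *utime, unsigned long *stime)
{
    const char *p = strrchr(line, ')');
    if (!p)
        return -1;
    int n = sscanf(p + 1,
                   " %*c"                      /* 3: state */
                   " %*s %*s %*s %*s %*s"      /* 4-8: ppid pgrp session tty_nr tpgid */
                   " %*s %*s %*s %*s %*s"      /* 9-13: flags minflt cminflt majflt cmajflt */
                   " %lu %lu",                 /* 14-15: utime stime */
                   utime, stime);
    return n == 2 ? 0 : -1;
}

/* Read one process's utime/stime from procfs. Returns 0 on success. */
static int read_cpu_ticks(pid_t pid, unsigned long *utime, unsigned long *stime)
{
    char path[64], line[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? parse_cpu_ticks(line, utime, stime) : -1;
}

/* Sample twice, `seconds` apart, and report the user-mode and kernel-mode
 * ticks consumed in between. Returns 0 on success. */
int sample_cpu_split(pid_t pid, unsigned seconds,
                     unsigned long *d_user, unsigned long *d_sys)
{
    unsigned long u0, s0, u1, s1;
    if (read_cpu_ticks(pid, &u0, &s0) != 0)
        return -1;
    sleep(seconds);
    if (read_cpu_ticks(pid, &u1, &s1) != 0)
        return -1;
    *d_user = u1 - u0;
    *d_sys = s1 - s0;
    return 0;
}
```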
Christopher Friesen wrote:
>
> Well, the unkillable process continues on. Does nobody else have any ideas on
> how to kill an unkillable process in the R state that's sucking up all my unused
> CPU cycles?
>
> If not I'm going to have to reboot this thing...
>
Well, I'd expect it to be in "D" state, waiting for some disk I/O to
finish...
But being in "R" state with the behavior you describe looks like a bug.
If you care about the wasted CPU time: what about kill -STOP <pid>?
Can you describe your filesystem layout?
I'm thinking of a symlink recursion bug or something wrong in /dev/shm
or the like... (no flame, just guessing :)
What are the parameters of "find"?
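The R versus D distinction above appears in the third field of /proc/<pid>/stat, which is the same letter ps shows at the start of its STAT column. Here is a minimal sketch that reads it. It assumes Linux procfs, and `parse_stat_state` and `proc_state` are hypothetical names:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extract the one-letter state (field 3) from a /proc/<pid>/stat line.
 * The command name in field 2 may itself contain ')', so scan from the
 * LAST ')'. Returns '?' if the line cannot be parsed. */
char parse_stat_state(const char *line)
{
    const char *p = strrchr(line, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* Read the current state of a process. Common values: 'R' running or
 * runnable, 'S' interruptible sleep, 'D' uninterruptible sleep (usually
 * waiting on I/O), 'T' stopped, 'Z' zombie. Returns '?' on error. */
char proc_state(pid_t pid)
{
    char path[64], line[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? parse_stat_state(line) : '?';
}
```

A task that shows 'R' on every read while making no visible progress matches the situation described in this thread.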
PW> Well, I'd expect it to be in "D" state, waiting for some disk I/O to
PW> finish...
If a process is stuck in D state, it's a kernel bug. I
don't think it's ever legitimate to wait forever for something
that can never happen. However, some such bugs occur only rarely
(e.g. a swap-in failure due to a hard disk malfunction)
and thus are unlikely to be fixed.
PW> But being in "R" state with the behavior you describe looks like a bug.
PW> If you care about the wasted CPU time: what about kill -STOP <pid>?
An unkillable hang in R state is possible too (an infinite loop in the
kernel preventing return from a syscall).
In short, as I understand it, any syscall has to return sooner
or later for the process to be killable. Anything preventing
that is a kernel bug.
However, I'm not a UNIX guru, so I may be wrong.
I'd really like to be enlightened if I am.
--
Best regards, vda
mailto:[email protected]
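vda's point, that a signal only takes effect once the syscall returns, can be seen from user space with an interruptible sleep. Below is a sketch that assumes a POSIX system; `interrupted_read_errno` and `on_alarm` are hypothetical names. The read() blocks on a pipe that nobody writes to. Because SIGALRM is installed without SA_RESTART, the kernel aborts the read() with EINTR so the handler can run on the way back to user space. A task spinning inside the kernel never reaches that return path, which is consistent with kill -9 having no visible effect on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: its only job is to exist so SIGALRM does not kill us. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Block in read() on a pipe nobody writes to and let SIGALRM interrupt it.
 * Returns errno from the interrupted read() (EINTR is expected), 0 if read()
 * somehow returned data or EOF, or -1 if setup failed. */
int interrupted_read_errno(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    /* no SA_RESTART, so read() fails with EINTR */
    if (sigaction(SIGALRM, &sa, &old) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    alarm(1);                           /* SIGALRM in roughly one second */
    char c;
    ssize_t n = read(fds[0], &c, 1);    /* interruptible sleep: the write end stays open */
    int err = (n == -1) ? errno : 0;

    alarm(0);
    sigaction(SIGALRM, &old, NULL);     /* restore the previous disposition */
    close(fds[0]);
    close(fds[1]);
    return err;
}
```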
Fri, 2001-10-12 at 08:16, Paul Mackerras wrote:
> Christopher Friesen writes:
> > Well, the unkillable process continues on. Does nobody else have any ideas on
> > how to kill an unkillable process in the R state that's sucking up all my unused
> > CPU cycles?
> I would suspect that it is actually looping inside the kernel, which
> would mean that there indeed was no way to kill it. You could try
> alt-scrolllock on the console and see if you get a register dump of
> it, or maybe one of the alt-sysrq magic keys might give you some
> information. But I suspect that rebooting is ultimately going to be
> your only solution.
You might find out whether it's looping inside the kernel by running strace -p
<pid>; if it's stuck in a syscall, I *believe* strace will tell you.
You wouldn't by any chance be developing a kernel module??
--
_________________________________________________________________________
Terje Eggestad [email protected]
Scali Scalable Linux Systems http://www.scali.com
Olaf Helsets Vei 6 tel: +47 22 62 89 61 (OFFICE)
P.O.Box 70 Bogerud +47 975 31 574 (MOBILE)
N-0621 Oslo fax: +47 22 62 89 51
NORWAY
_________________________________________________________________________
"Friesen, Christopher [CAR:3R60:EXCH]" wrote:
Well, I've rebooted the thing.
It appears that whatever it was looping on was in the kernel. I suspect it
has something to do with NFS; this is a 2.2.17 kernel, and we've run into some
issues with it and NFS on other systems.
Thanks for the help, guys. Unfortunately, I rebooted before getting the messages
about checking in /proc; it would have been interesting to see what it was
doing.
Chris