2007-09-09 09:31:58

by Yishai Hadas

Subject: Health monitor of a multi-threaded process

Hi List,

I'm looking for a mechanism in a multi-threaded process to monitor the
health of its running threads, either by a dedicated monitor thread or by
any other mechanism.

It includes the following aspects:

1) Threads are running and not stuck on any lock.
2) Threads are running and have not died accidentally.
3) Threads are not consuming "too much" CPU/Memory.
4) Threads are not in any infinite loop.


My questions:

1. Which kernel APIs can help here?
2. Which kernel APIs can supply CPU/Memory information on a given thread
ID?
3. Is there any non-intrusive mechanism to achieve those goals?



I found that in kernel 2.6:

Reading the contents of /proc/<process_id>/task/<thread_id>/ can supply
some information per running thread in a given process.

Is this a recommended approach?
What about kernel 2.4, where this information doesn't exist per thread?
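
For illustration, here is a minimal sketch of the /proc approach described
above, assuming a 2.6-or-later kernel that exposes /proc/<pid>/task/<tid>/stat.
The function name read_thread_stats and the dictionary keys are made up for
this example; the field positions follow the proc(5) layout of the stat file.

```python
import os

def read_thread_stats(pid):
    """Return {tid: info} for every thread of `pid` (hypothetical helper).

    Assumes /proc/<pid>/task/<tid>/stat exists (kernel 2.6 or later).
    """
    stats = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                data = f.read()
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue
        # Field 2 (comm) is wrapped in parentheses and may contain spaces,
        # so split on the last ')' rather than on whitespace alone.
        name = data[data.index("(") + 1 : data.rindex(")")]
        rest = data[data.rindex(")") + 2 :].split()
        # rest[0] is field 3 (state); field N is therefore rest[N - 3].
        stats[int(tid)] = {
            "name": name,
            "state": rest[0],               # R, S, D, Z, ...
            "utime_ticks": int(rest[11]),   # field 14: user-mode CPU time
            "stime_ticks": int(rest[12]),   # field 15: kernel-mode CPU time
        }
    return stats

if __name__ == "__main__":
    for tid, info in sorted(read_thread_stats(os.getpid()).items()):
        print(tid, info)
```

Sampling this twice and comparing the tick counters gives a rough per-thread
CPU rate. Memory is harder to attribute, since all threads of a process share
one address space.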


My questions relate to kernel 2.4 and also to newer (e.g. 2.6) kernel
versions.

Any help will be appreciated.
Thanks,

Yishai.


2007-09-10 15:35:37

by Chris Snook

Subject: Re: Health monitor of a multi-threaded process

Yishai Hadas wrote:
> Hi List,
>
> I'm looking for any mechanism in a multi-threaded process to monitor the
> health of its running threads - or by a specific monitor thread or by
> any other mechanism.
>
> It includes the following aspects:
>
> 1) Threads are running and not stuck on any lock.

If you're using POSIX locking, you'll never find yourself busy-waiting for very
long. Use ps or top.

> 2) Threads are running and have not died accidentally.

Use ps or top.

> 3) Threads are not consuming "too much" CPU/Memory.

Use ps or top. You'll have to decide how much is "too much".

> 4) Threads are not in any infinite loop.

This requires solving the Halting Problem. If your management is demanding this
feature, I suggest informing them that it is mathematically impossible.

Just use top or ps. Don't reinvent the wheel. We've got a really good wheel.
If you don't like top or ps as is, read the ps man page to see all the fancy
formatting it can do, and parse it with a simple script in your favorite
scripting language.
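
As a rough sketch of that suggestion, the script below asks ps for one line
per thread and parses it. It assumes the procps version of ps (the -L and
-o tid=,stat=,pcpu=,comm= options); the function names and the 90% threshold
are placeholders invented for this example, not anything ps defines.

```python
import subprocess

def parse_ps_threads(text):
    """Parse headerless 'tid stat pcpu comm' lines into dictionaries."""
    threads = []
    for line in text.splitlines():
        # comm comes last and may contain spaces, so split at most 3 times.
        parts = line.strip().split(None, 3)
        if len(parts) < 4:
            continue
        tid, stat, pcpu, comm = parts
        threads.append(
            {"tid": int(tid), "stat": stat, "pcpu": float(pcpu), "comm": comm}
        )
    return threads

def list_threads(pid):
    """Run ps for one process, one output line per thread (procps assumed)."""
    out = subprocess.run(
        ["ps", "-L", "-p", str(pid), "-o", "tid=,stat=,pcpu=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps_threads(out)

def busy_threads(threads, cpu_limit=90.0):
    """Return threads above an arbitrary 'too much CPU' threshold."""
    return [t for t in threads if t["pcpu"] > cpu_limit]

if __name__ == "__main__":
    import os
    for t in list_threads(os.getpid()):
        print(t)
```

A thread that has died simply stops appearing in the output, so comparing
the set of thread IDs between two runs covers point 2 as well. Note that
the pcpu figure from ps is averaged over the lifetime of the thread, so it
reacts slowly to a sudden spike.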

-- Chris

2007-09-10 20:50:33

by David Schwartz

Subject: RE: Health monitor of a multi-threaded process


> > 4) Threads are not in any infinite loop.

> This requires solving the Halting Problem. If your management is
> demanding this feature, I suggest informing them that it is
> mathematically impossible.

Christ, these academics! They take real world problems that engineers
actually *solve* every day and then "prove" they're impossible.

Actually, it's trivial to solve this. Just wait. If the thread terminates,
it wasn't in an infinite loop.

If you've got the budget, I believe SGI has a machine that can do an
infinite loop in less than five seconds. That will save a lot of waiting.

DS