2003-09-12 02:59:30

by Xuân Baldauf

Subject: "busy" load counters

Currently, tools like "top" show stats like

Cpu(s): 92.1% user, 6.9% system, 0.0% nice, 1.0% idle

Unfortunately, these stats are not sufficient to determine whether the
system is "busy". Knowing whether the system is "busy" is very useful
when an interactive application (e.g. a shell or some shell command)
does not respond.
Maybe it just hangs (waits for input), or maybe it does serious work
(e.g. uses the CPU or accesses the disk). Disk access is not visible in
"top". Depending on the machine, disk accesses may cause a slight or
significant rise in the "system" portion of those stats, but this is not
reliable.

I'd like a new stat "busy", which is simply one minus the fraction of
time during which the system is idle and does _not_ have outstanding IO
requests. Users can judge from this stat whether their application is
waiting for input or just needs some time. This way, they know better
what to do when they get impatient, and they know it faster. (Yes, they
can find out by looking up all processes of their application, running
strace on them and checking whether the actions observed involve just
waiting and polling or maybe IO. But this is very tedious.)

What do you think about this? Would kernel hackers oppose such a
"feature" for any reason?

Xuân.



2003-09-13 07:12:55

by Albert Cahalan

Subject: Re: "busy" load counters

Xuân Baldauf writes:

> Currently, tools like "top" show stats like
>
> Cpu(s): 92.1% user, 6.9% system, 0.0% nice, 1.0% idle
>
> Unfortunately, these stats are not sufficient to determine whether the
> system is "busy". Knowing whether the system is "busy" is very useful
> when an interactive application (e.g. a shell or some shell command)
> does not respond.
> Maybe it just hangs (waits for input), or maybe it does serious work
> (e.g. uses the CPU or accesses the disk). Disk access is not visible in
> "top". Depending on the machine, disk accesses may cause a slight or
> significant rise in the "system" portion of those stats, but this is not
> reliable.

The feature is available, but you'll need to upgrade
to procps-3.1.12 and linux-2.6.0-test4 at least.

http://www.kernel.org/pub/linux/kernel/v2.6/
http://procps.sf.net/

Once you've done that, both "top" and "vmstat" will
supply the info you want. There are 7 basic %CPU stats
right now:

  us   regular user apps
  sy   system (general kernel stuff)
  ni   nice user apps (low-priority tasks)
  id   idle
  wa   waiting for IO to complete
  hi   hard interrupt (IRQ) handlers
  si   soft interrupt (network stack, mostly?) handlers

The "top" program shows all of those. The "vmstat"
program mixes "ni" into "us", and mixes "hi" and "si"
into "sy". An example for each:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      0   6896   2668 108896    0    0     0     1   34    14 10  3 87  0

top - 02:56:17 up 12 days, 13:43, 25 users, load average: 0.37, 0.25, 0.22
Tasks: 129 total, 4 running, 124 sleeping, 1 stopped, 0 zombie
Cpu(s): 8.6% us, 5.6% sy, 0.0% ni, 85.8% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 513924k total, 507068k used, 6856k free, 2664k buffers
Swap: 0k total, 0k used, 0k free, 108844k cached


2003-09-13 08:36:40

by Eric Dumazet

Subject: Re: "busy" load counters


From: "Albert Cahalan" <[email protected]>
> The feature is available, but you'll need to upgrade
> to procps-3.1.12 and linux-2.6.0-test4 at least.
>
> http://www.kernel.org/pub/linux/kernel/v2.6/
> http://procps.sf.net/
>

With procps-3.1.12 and linux-2.6.0-test5, top and ps report 0.00 time for
multi-threaded programs.

It seems only the 'main' thread is now visible in /proc, and its CPU time
doesn't include the CPU time of the other threads...

Example:

root      1238  0.0  0.0   2692   1576 pts/1    S    08:13   0:00 /bin/bash
root      1293  0.0  0.0   2692   1576 pts/1    S    08:14   0:00 /bin/bash
root      1298  0.0 19.5 716016 658460 pts/1    S    08:14   0:00 ./server
    ! THIS process certainly has wrong TIME
root      2465  0.0  0.0   2672   1564 pts/2    S    10:19   0:00 -bash


2003-09-13 15:51:50

by Albert Cahalan

Subject: Re: "busy" load counters

On Sat, 2003-09-13 at 04:36, dada1 wrote:
> From: "Albert Cahalan" <[email protected]>
> > The feature is available, but you'll need to upgrade
> > to procps-3.1.12 and linux-2.6.0-test4 at least.
> >
> > http://www.kernel.org/pub/linux/kernel/v2.6/
> > http://procps.sf.net/
> >
>
> With procps-3.1.12 and linux-2.6.0-test5, top and ps report 0.00 time for
> multi-threaded programs.
>
> It seems only the 'main' thread is now visible in /proc, and its CPU time
> doesn't include the CPU time of the other threads...

This is correct. For now, the kernel does not report the
existence of new-style threads. I intend to deal with this
problem during the coming week.

I could use a bit of help with research. If you have access
to a non-Linux system, please let me know how the native ps
and top programs handle threads. I do know that many non-Linux
implementations will group threads together in ps output.

Ways to display all threads include:

  ps -m   (Tru64, AIX)
  ps m    (Tru64, AIX)
  ps -T   (IRIX)
  ps -L   (Solaris, UnixWare)
  ps H    (FreeBSD)
  ps k    (OpenBSD)
  ps s    (NetBSD)

Examples:
AIX: ps -eo pid,thcount,tid,comm
Solaris: ps -eLf
Tru64: ps -emO THREAD

Please use [email protected] for this data.
Here's a program you can use for testing; you may need
to compile it as "cc foo.c -lpthread".

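////////////////////////////////////////////
#include <unistd.h>
#include <pthread.h>

// Thread body: sleep forever so the thread stays visible to ps and top.
void *hanger(void *vp){
    (void)vp;
    for(;;) pause();
}

int main(int argc, char *argv[]){
    pthread_t thread;
    (void)argc;
    (void)argv;
    // Start one extra thread; give up if it cannot be created.
    if(pthread_create(&thread, NULL, hanger, NULL) != 0) return 1;
    hanger(NULL);  // the main thread hangs as well
    return 0; // keep gcc happy
}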
/////////////////////////////////////////////