2002-11-05 21:33:43

by Albert D. Cahalan

Subject: Re: ps performance sucks


First of all, sorry to break the threading. I didn't get
a Cc: and the web archives drop most email headers. I'm
going to respond to everyone in a big blob w/o attributions.

> Clearly ps could do with a cleanup. There is no reason to
> read environ if it wasn't asked for. Deciding which files
> are needed based on the command line options would be a

Done. You should be using procps-3.0.5 now. If you're not,
an upgrade is called for. http://procps.sf.net/

(tough luck if you're using some other ps)

Nothing that parses the crap in /proc will ever be fast though.
There's a patch for Linux 2.4.0 that some people might like:

http://www.uwsg.iu.edu/hypermail/linux/kernel/0104.2/1720.html

> Strace it - IIRC it does 5 opens per PID. Vomit.

Nope, it does 2. Perhaps you're not running procps 3 yet?
http://procps.sf.net/

Of course if you do something like "ps ev" you need all 5.
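
A rough sketch of the "decide which files are needed from the command line options" idea, for illustration only. The field-to-file table is a simplified assumption (it is not the actual procps table), and files_needed() is a hypothetical helper name. The five per-process files are taken here to be stat, statm, status, cmdline and environ.

```python
# Hypothetical mapping from ps output fields to the /proc/<pid>/ file
# that would supply each one. Simplified; not the real procps table.
FIELD_SOURCE = {
    "pid": "stat",
    "ppid": "stat",
    "stat": "stat",
    "tty": "stat",
    "time": "stat",
    "comm": "stat",
    "uid": "status",
    "rss": "statm",
    "args": "cmdline",
    "environ": "environ",
}


def files_needed(fields):
    """Return the sorted list of per-process files required for `fields`."""
    return sorted({FIELD_SOURCE[name] for name in fields})
```

With this table, asking for pid, tty, time and comm touches only stat, while asking for every field pulls in all five files.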

> I'm thinking that ps, top and company are good reasons to
> make an exception of one value per file in proc. Clearly
> open+read+close of 3-5 "files" each extracting data from
> task_struct isn't more efficient than one "file" that
> generates the needed data one field per line.

There are several ways to attack this.

First of all, implement an open_read_close() syscall. Duh.
I expect Hans Reiser would be delighted too. Maybe return
a file descriptor if the file was too big or it blocked.
Maybe provide some basic stat data atomically with the call.
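
For illustration, a user-space stand-in for what such a call would return. This still costs three syscalls or more; the point of the proposal is that the kernel would do the whole sequence in one entry. The function name and the `limit` parameter are hypothetical.

```python
import os


def open_read_close(path, limit=65536):
    """Return up to `limit` bytes of `path`, then close it.

    User-space emulation only: the proposed syscall would perform the
    open, the read and the close in a single kernel entry.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        remaining = limit
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:  # end of file
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

A caller such as ps would use it as open_read_close("/proc/1/stat") and never hold a descriptor across calls.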

For per-task proc files, one file per kernel lock seems sane.
I haven't looked at how many that would be, and of course it
varies by kernel. So maybe it ends up not being exact; that's OK.

> I think it's pretty trivial to make /proc/<pid>/psinfo, which
> dumps the garbage from all five files in one place. Which makes
> it 5 times better, but it still sucks.

Well, not all the garbage! It'd be nice to have the popular
stuff in a file similar to /proc/*/stat. That would be what ps
needs to support these options: -f -l -F l u v j -j -ly -lc
plus "top". (not counting the process name or args though)

> You could take a more radical approach. Since the goal of such
> a psinfo file would be to accelerate access to information
> that's already available elsewhere, you can do away with many
> of the niceties of procfs, e.g.
>
> - no need to be human-readable (e.g. binary or hex dump may
> make sense in this case)

As long as you expand everything to the biggest data type that
could ever be used, binary is wonderful. Make the ABI be 64-bit
for almost everything, with proper alignment of course. Somebody
slap the person who put a 32-bit ino_t in the latest stat syscall.

> First write says "pid,comm". Internally, this gets translated
> to 0x8c+0x04, 0x2ee+0x10 (offset+length). Next read returns
> "pid 4,comm 16" (include the name, so you can indicate fields
> the kernel doesn't recognize). Then, kmalloc 20*tasks bytes,
> lock, copy the fields from struct task_struct, unlock, let the
> stuff be read by user space, kfree. Adjacent fields can be
> optimized to single byte strings at setup time.

If you're going to do that, then specify stuff via the filename:
/proc/12345/hack/80basic,20pids,20uids,40argv,4tty,4stat

Not that I care for dealing with the above!

>> sgid country
>> * real killer: you think Albert would fail to produce equally
>> crappy code and equally crappy behaviour? Yeah, right.
>
> Well I think Rik and I can handle it in our tree :)

You guys can't even get BSD process selection right.

If necessary I could fix a few spots needed for setgid usage.
I'd rather not need to do so, because then yet another chunk
of non-kernel code is making security decisions.

> * device is not network-transparent - even in principle

ROTFL. What a fantasy. You damn well know /proc isn't either.
If you can hack /proc to be exportable, you can damn well do
the same for a device file. You won't be using NFS for this.
I think Mosix already has a shared /proc anyway; an ioctl() is
a simple matter of writing a little ugly code.

> And i'd still keep environ separate. I'm inclined to think
> ps should never have presented it in the first place.
> This is the direction i (for what it's worth) favor.

Yeah, well that's BSD compatibility for you. Printing the
environment might actually be useful if you could pick just
the fields you wanted: ps -eo pid,stat,.DISPLAY,comm

Useful? Like that notation?
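
A minimal sketch of how the ".NAME" notation could be resolved, assuming a leading dot means "look this up in the process environment" and "-" marks a missing value. Both helper names are hypothetical. The only fact relied on is that /proc/<pid>/environ holds NUL-separated NAME=value entries.

```python
def parse_environ(raw):
    """Split the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            name, _, value = entry.partition(b"=")
            env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def select_columns(spec, proc_fields, env):
    """Resolve a 'pid,stat,.DISPLAY,comm' style spec to a list of values.

    Names starting with "." come from the environment; everything else
    comes from `proc_fields`. Missing values are shown as "-".
    """
    row = []
    for name in spec.split(","):
        if name.startswith("."):
            row.append(env.get(name[1:], "-"))
        else:
            row.append(proc_fields.get(name, "-"))
    return row
```

The environ file would only need to be opened when the spec contains at least one dotted name.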

> Well if we want to be gross and efficient, we could just compile
> a kmem-diving dynamic library with every kernel compile and stick
> it in /boot or somewhere. Mildly less extreme is a flat index file
> for the data you need a la System.map. Then just open /dev/kmem
> and grab what you want. Walking the tasklist with no locking would
> be an interesting challenge, but probably not insurmountable.
> That's how things like ps always used to work IIRC.

Yep, that's gross and efficient for sure. The dynamic library idea
fixes a major problem; BSD "top" is always breaking due to kernel
differences on Solaris and FreeBSD.


2002-11-05 22:41:42

by Robert Love

Subject: Re: ps performance sucks

On Tue, 2002-11-05 at 17:46, Rik van Riel wrote:

> On Tue, 5 Nov 2002, Albert D. Cahalan wrote:
>
> > (tough luck if you're using some other ps)
>
> Why do your procps mails always contain more references to
> procps 2 than to your own version ?
>
> What is your obsession with procps 2 ?

Because he forked procps and cannot get over it.

Robert Love

2002-11-05 22:40:24

by Rik van Riel

Subject: Re: ps performance sucks

On Tue, 5 Nov 2002, Albert D. Cahalan wrote:

> (tough luck if you're using some other ps)

Why do your procps mails always contain more references to
procps 2 than to your own version ?

What is your obsession with procps 2 ?

Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
Current spamtrap: [email protected]

2002-11-05 23:30:32

by Albert D. Cahalan

Subject: Re: ps performance sucks

Rik van Riel writes:
> On Tue, 5 Nov 2002, Albert D. Cahalan wrote:

>> (tough luck if you're using some other ps)
>
> Why do your procps mails always contain more references to
> procps 2 than to your own version ?
>
> What is your obsession with procps 2 ?

I'm rather sick of being blamed for problems that are not
seen in procps 3. Somebody posts about procps needing to
read 5 files per process, then somebody else makes a rude
comment about me... never minding that the procps 3 code
doesn't have the behavior that was being complained about.

I also have to make the differences clear. Really, I hate
doing that. I've learned a harsh lesson though; failure to
advertise leads to forks. It also leads to people using
obsolete code. Some poor soul even started hacking on top,
not realizing that it was already rewritten and is improving
quickly.

Do realize that you _started_ with buggy old code. I really
wish you'd just let it die. There wasn't any need to start
hacking on that buggy old code; I take patches, even from you.


2002-11-05 23:31:17

by Werner Almesberger

Subject: Re: ps performance sucks

Albert D. Cahalan wrote:
> If you're going to do that, then specify stuff via the filename:
> /proc/12345/hack/80basic,20pids,20uids,40argv,4tty,4stat

Well, you'd get the numbers (sizes) from the kernel, as a
response. Of course, you could define the interface such that
the query (after all, that's what it is) contains the full
field name plus size information, and the kernel just says
"EINVAL" if it doesn't like it, but then you lose some
flexibility. Might not be a big deal, though.

Yeah, perhaps it's actually better to avoid being overly
clever. How frequently are ps and friends hit by the removal
of fields or size changes anyway ?

Oh, BTW, it would be more like /proc/hack/<query>, so you do
all PIDs in one sweep.

> Not that I care for dealing with the above!

Well, that's what programs are for :-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2002-11-06 00:04:08

by Albert D. Cahalan

Subject: Re: ps performance sucks

Werner Almesberger writes:
> Albert D. Cahalan wrote:

>> If you're going to do that, then specify stuff via the filename:
>> /proc/12345/hack/80basic,20pids,20uids,40argv,4tty,4stat
>
> Well, you'd get the numbers (sizes) from the kernel, as a
> response. Of course, you could define the interface such that
> the query (after all, that's what it is) contains the full
> field name plus size information, and the kernel just says
> "EINVAL" if it doesn't like it, but then you lose some
> flexibility. Might not be a big deal, though.

I was thinking "80basic" would ask for the first 0x80 words
of basic info. If there's less, zero-fill. If there's more,
truncate the struct. Then "20pids" asks for the first 0x20
words of pid info (pid, ppid, sess, pgid...) and so on.

It's saying "give me 0x80 words of struct basic, followed
by 0x20 words of struct pids..." so that there isn't too
much version trouble.
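
To make the zero-fill/truncate rule concrete, a small sketch. Assumptions: a "word" is 64 bits, and the count is the leading run of 0-9 characters read as hex (a letter such as "b" would be ambiguous with the start of the name). parse_query() and fit() are hypothetical names.

```python
WORD_BYTES = 8  # assumption: one "word" is 64 bits


def parse_query(spec):
    """Parse '80basic,20pids' into [('basic', 0x80), ('pids', 0x20)]."""
    parts = []
    for item in spec.split(","):
        n = 0
        while n < len(item) and item[n] in "0123456789":
            n += 1
        if n == 0 or n == len(item):
            raise ValueError("bad query item: %r" % item)
        # The digits are a hex word count, the rest is the struct name.
        parts.append((item[n:], int(item[:n], 16)))
    return parts


def fit(data, words):
    """Truncate or zero-fill `data` to exactly `words` words."""
    size = words * WORD_BYTES
    return data[:size].ljust(size, b"\0")
```

A reader built against an older, shorter struct gets a truncated view; a reader built against a newer, longer one sees zeros in the fields the kernel doesn't have.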

Note: not expressing either approval or condemnation for
the general idea or for any specific implementation

> Yeah, perhaps it's actually better to avoid being overly
> clever. How frequently are ps and friends hit by the removal
> of fields or size changes anyway ?

Removal is a killer. It hit back in the Linux 1.3.xx days
when /proc/meminfo briefly had the current format. It hit
again just recently, when data was removed from /proc/stat
without even a transition period.

Size changes usually don't hurt, because most people are
satisfied with the old limits. If there is to be a binary
kernel interface, it damn well better use 64-bit values
for most everything.

Name changes are nasty, and are the reason I hate the
status file. Is it "SigCgt" or "SigCat" in that file?
The answer depends on your kernel version...

> Oh, BTW, it would be more like /proc/hack/<query>, so you do
> all PIDs in one sweep.

That's nice, until you exceed the amount of memory available.
Right now, a "ps" without sorting can work even if there isn't
enough physical memory or address space for ps to hold info about
every process. Using a snapshot interface would cause ps to fail
under some heavy load conditions that it currently survives.
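
The one-process-at-a-time behaviour can be sketched as a generator, so memory use stays bounded no matter how many processes exist. iter_stat() and its `proc_root` parameter are hypothetical; the parameter exists only so the sketch can be pointed at a test directory.

```python
import os


def iter_stat(proc_root="/proc"):
    """Yield (pid, raw stat bytes) for one process at a time."""
    with os.scandir(proc_root) as entries:
        for entry in entries:
            # Only all-digit directory names are processes.
            if not (entry.name.isascii() and entry.name.isdigit()):
                continue
            path = os.path.join(proc_root, entry.name, "stat")
            try:
                with open(path, "rb") as f:
                    yield int(entry.name), f.read()
            except OSError:
                continue  # process exited between readdir and open
```

An unsorted ps can print each record as it arrives and drop it, which is what a snapshot interface would give up.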

Hey, if reiserfs can have a database query syscall... >:-)
open("/proc/SELECT PID,TTY,TIME,CMD FROM PS WHERE RUID=42",O_RDONLY)
Somebody check if Al Viro needs a defibrillator. On second thought...


2002-11-06 01:23:51

by Werner Almesberger

Subject: Re: ps performance sucks

Albert D. Cahalan wrote:
> I was thinking "80basic" would ask for the first 0x80 words
> of basic info. If there's less, zero-fill. If there's more,
> truncate the struct. Then "20pids" asks for the first 0x20
> words of pid info (pid, ppid, sess, pgid...) and so on.

Argl, this has "silent failure" written all over it. No, I think
single-field granularity wouldn't incur excessive overhead: at
run time, you can trivially handle adjacent fields with a single
copy, and I don't think there are really that many practically
useful fields that setup cost (CPU or memory) would be terrible.
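
A sketch of the "adjacent fields, single copy" setup step, for illustration. It assumes the output order follows struct offset and only merges ranges that touch exactly; the offsets in any real task_struct vary by kernel. coalesce() and gather() are hypothetical names.

```python
def coalesce(fields):
    """Merge (offset, length) pairs that touch into single copy ranges."""
    merged = []
    for offset, length in sorted(fields):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This field starts where the previous range ends: extend it.
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((offset, length))
    return merged


def gather(record, ranges):
    """Copy the given ranges out of one per-task record, in order."""
    return b"".join(record[off:off + n] for off, n in ranges)
```

The merge runs once when the query is set up; the per-task work is then one slice per merged range.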

[ Various change horrors ]

Hmm yes, about as bad as I remember it from my psmisc days :-(

> That's nice, until you exceed the amount of memory available.

That would be the least of my concerns. If you really run out
of memory, you can always fall back to an iterative process.

> Hey, if reiserfs can have a database query syscall... >:-)
> open("/proc/SELECT PID,TTY,TIME,CMD FROM PS WHERE RUID=42",O_RDONLY)

Cute ;-) But it might be faster just to dump the whole data,
and let user space worry about picking the right entries.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2002-11-07 17:13:31

by Bill Davidsen

Subject: Re: ps performance sucks

On Tue, 5 Nov 2002, Albert D. Cahalan wrote:

> > Strace it - IIRC it does 5 opens per PID. Vomit.
>
> Nope, it does 2. Perhaps you're not running procps 3 yet?
> http://procps.sf.net/
>
> Of course if you do something like "ps ev" you need all 5.

Well, since you're doing all this stuff to push your version, how about
an option to do a fast ps for most processes and only do the hard work for
processes owned by a given user. Or not owned, so everything not root
would be shown in detail, as an example. What about showing or not
threads, or showing minimal detail (fast) for threads.

There is a lot of room for options if you want to see everything but
only detail for some.

I wish competing procps could be merged, I feel as though it's something
not requiring the time of top kernel developers. If you are willing to add
features suggested by others and they are willing to push a feature list
to you maybe that could happen.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-11-07 20:37:37

by Albert D. Cahalan

Subject: Re: ps performance sucks

Bill Davidsen writes:
> On Tue, 5 Nov 2002, Albert D. Cahalan wrote:

>>> Strace it - IIRC it does 5 opens per PID. Vomit.
>>
>> Nope, it does 2. Perhaps you're not running procps 3 yet?
>> http://procps.sf.net/
>>
>> Of course if you do something like "ps ev" you need all 5.
>
> Well, since you're doing all this stuff to push your version, how about
> an option to do a fast ps for most processes and only do the hard work for
> processes owned by a given user. Or not owned, so everything not root
> would be shown in detail, as an example. What about showing or not
> threads, or showing minimal detail (fast) for threads.
>
> There is a lot of room for options if you want to see everything but
> only detail for some.

Would people use it? I risk burying users in options.
The closest things to this that I've considered are:

1. select every process that I can signal (including by TTY)
2. expand the selection with all ancestor processes up to init
3. expand the selection with all descendant processes
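
Steps 2 and 3 can be sketched as below, assuming the pid-to-ppid map has already been collected and that init's parent is reported as 0. Step 1 (what I can signal) is left to the caller. expand_selection() is a hypothetical name.

```python
def expand_selection(selected, parent_of):
    """Return `selected` plus all of its ancestors and descendants.

    `parent_of` maps each pid to its ppid; a ppid of 0 ends the walk.
    """
    children = {}
    for pid, ppid in parent_of.items():
        children.setdefault(ppid, []).append(pid)

    result = set(selected)
    # Step 2: walk up from each selected process toward init.
    for pid in selected:
        ppid = parent_of.get(pid, 0)
        while ppid and ppid not in result:
            result.add(ppid)
            ppid = parent_of.get(ppid, 0)

    # Step 3: walk down from the originally selected processes only.
    seen = set(selected)
    stack = list(selected)
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return result | seen
```

Siblings of a selected process are not pulled in, since only the selected processes themselves are expanded downward.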

As for threads, support will come when the kernel makes it work
sanely. Right now I could make ps crudely guess what is a thread
and what is not, but that is slow and it suffers from both false
positives and false negatives. I'd be in business if the kernel
would do the following:

1. group related (same memory context) tasks in the /proc output
2. supply a "more tasks follow" flag
3. supply a way to identify a task's primary memory context

Note that #3 has to be immune to UML, Wine, and Bochs playing
tricks with segment registers and alternate memory contexts.

> I wish competing procps could be merged, I feel as though it's something
> not requiring the time of top kernel developers. If you are willing to add
> features suggested by others and they are willing to push a feature list
> to you maybe that could happen.

I have difficulty understanding why somebody would want to start
hacking on code that hasn't been maintained for ages. I'm certainly
not about to throw away years worth of bug fixes. I suspect there was
a failure to realize how much Craig and I had done over the years.
Then Jim Warner (new top author) and I (ps, skill, snice, half of
libproc, and now much of free and vmstat) were blown off for reasons
I can't figure. In spite of this, I would gladly consider patches
from Rik van Riel and Robert M. Love, neither of whom had even
touched the procps source code until just recently. I try to keep
this civil while making it clear who has the continuously maintained
source tree with original authors still actively participating.

Oh well.

Are you a vmstat user? Suggestions are needed; it's getting a rewrite.
I may even change the default format, assuming people don't all
have scripts that parse the output. How do you like this?

procs ------------memory----------- ---swap-- ----io--- --system-- ----cpu----
 r  b swpd free buff cache act !act   si   so   bi   bo   in    cs us sy id wa
 0  0 304k  14m 2.5m   27m 16m  23m    0    0    0    0   33     4  0  0 90  9
 0  0 304k  14m 2.5m   27m 16m  23m    0    0    0    0  114    12  1  0 88 11
 0  0 304k  14m 2.5m   27m 16m  23m    0    0    0    0  104     6  0  1 91  8

Let me know if any of that is junk, or if there is something you'd add.
Adding stuff means removing stuff, since blowing past 80 columns isn't
OK for most users. For ideas, see: /proc/vmstat, /proc/meminfo, /proc/stat

In case you happen to know where they are, I'm looking for these:

pages reclaimed
minor faults
COW faults
zero-page faults
anticipated short-term memory shortfall
pages freed
pages scanned by page-replacement algorithm
clock cycles by page replacement algorithm
number of system calls
number of forks (fork, vfork, & clone) and execs

This would be easy if every OS used the same terminology,
had the same stats, and had proper documentation.

2002-11-08 21:00:19

by Bill Davidsen

Subject: Re: ps performance sucks

Script started on Fri Nov 8 15:49:56 2002
newscon02:earthquake$ vmstat2 -tkfM 10 5
MemTotal: 1551840 kB
SwapTotal: 2048248 kB
  time load free buffs swap pgin pgou dk0 dk1 dk2 dk3 ipkt opkt   int  ctx usr sys idl i_netK o_netK
15.839 8.25  5.1  1412 49.0 3861 8588  10 243   0   0 6092 6264  8216 3408  35  37  28 5722.2 5785.1
15.842 9.26  7.2  1409 49.0 4272 3764  11 171   0   0 7707 8124  9772 3585  43  43  14 7369.6 7748.6
15.844 8.21  5.5  1412 49.0 3942 7688  11 232   0   0 6351 6281  8554 3289  30  42  28 6412.7 5618.9
15.847 7.77  6.7  1410 49.0 4932 5029  15 202   0   0 7813 7886 10072 3469  42  44  14 7639.9 7342.0
15.850 7.99  7.0  1410 49.0 4367 5976   7 211   0   0 6907 6953  9045 3467  37  38  25 6721.5 6456.4
newscon02:earthquake$ exit

Script done on Fri Nov 8 15:56:08 2002