Mark Kettenis <[email protected]> writes:
> However, the "zombie problem" is caused by the way ptrace() interacts
> with clone()/exit()/wait(), which I consider to be a kernel bug.
[insightful analysis omitted]
I think you've hit the nail on the head, and I'm a bit frustrated that I never
noticed this problem even though I've spent quite a bit of time poring over
the code that makes ptrace work.
My limited mental abilities notwithstanding, I think this is one more reason
to ditch ptrace for a better method of process tracing/control. It's served
up to this point, but ptrace has a fundamental flaw, which is that it tries to
do a lot of interprocess signalling and interlocking in an in-band way, doing
process reparenting to try to take advantage of existing code. In the end
this seems to be resulting in an inscrutable, flaky mess.
What would a better process tracing facility be like? One key feature is
utter transparency. That is, it should be impossible for traced processes or
other processes that interact with them to be aware of whether or not tracing
is going on. This means that there should be no difference between the way a
process behaves under tracing versus how it would behave if it weren't being
traced, which is a key to faithful tracing/debugging and avoiding the
Heisenbug effect. (There does need to be some interface via which information
about tracing itself can be observed, but it should be hidden from the target
processes.)
It would also be nice to have something accessible via devices in the proc
filesystem. Maybe something like Solaris' "proc" debugging interface would be
a starting point:
http://docs.sun.com:80/ab2/coll.40.6/REFMAN4/@Ab2PageView/42351?DwebQuery=proc&Ab2Lang=C&Ab2Enc=iso-8859-1
--Mike
--
[O]ne of the features of the Internet [...] is that small groups of people can
greatly disturb large organizations. --Charles C. Mann