Hi,
I found that processes sometimes disappear on a heavily used system
of mine without any logging. So I've written a patch against 2.6.18.2
which logs whenever a process is sent a fatal signal.
Signed-off-by: Folkert van Heusden <[email protected]>
--- linux-2.6.18.2/kernel/signal.c 2006-11-04 02:33:58.000000000 +0100
+++ linux-2.6.18.2.new/kernel/signal.c 2006-11-17 15:59:13.000000000 +0100
@@ -706,6 +706,15 @@
 	struct sigqueue * q = NULL;
 	int ret = 0;
 
+	if (sig == SIGQUIT || sig == SIGILL || sig == SIGTRAP ||
+	    sig == SIGABRT || sig == SIGBUS || sig == SIGFPE ||
+	    sig == SIGSEGV || sig == SIGXCPU || sig == SIGXFSZ ||
+	    sig == SIGSYS || sig == SIGSTKFLT)
+	{
+		printk(KERN_WARNING "Sig %d send to %d owned by %d.%d (%s)\n",
+			sig, t -> pid, t -> uid, t -> gid, t -> comm);
+	}
+
 	/*
 	 * fast-pathed signals for kernel-internal things like SIGSTOP
 	 * or SIGKILL.
Folkert van Heusden
http://www.vanheusden.com/multitail - multitail is tail on steroids. multiple
windows, filtering, coloring, anything you can think of
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com
On 2006-11-18, Folkert van Heusden wrote:
> Hi,
>
> I found that sometimes processes disappear on some heavily used system
> of mine without any logging. So I've written a patch against 2.6.18.2
> which emits logging when a process emits a fatal signal.
Why not patch the default signal handlers in glibc, so that fatal
messages go not only to stderr but also to syslog or /dev/kmsg?
Hi,
> > I found that sometimes processes disappear on some heavily used system
> > of mine without any logging. So I've written a patch against 2.6.18.2
> > which emits logging when a process emits a fatal signal.
> Why not to patch default signal handlers in glibc, to have not only
> stderr, but syslog, or /dev/kmsg copy of fatal messages?
Afaik when a process gets shot because of a segfault, the libraries
it used are shot as well, so to say. Iirc some of the more fatal signals
are handled directly by the kernel.
Folkert van Heusden
--
http://www.biglumber.com <- site where one can exchange PGP key signatures
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com
> > > I found that sometimes processes disappear on some heavily used system
> > > of mine without any logging. So I've written a patch against 2.6.18.2
> > > which emits logging when a process emits a fatal signal.
> > Why not to patch default signal handlers in glibc, to have not only
> > stderr, but syslog, or /dev/kmsg copy of fatal messages?
> Afaik when a proces gets shot because of a segfault, also the libraries
> it used are shot so to say. iirc some of the more fatal signals are
> handled directly by the kernel.
Also: what about statically built programs?
Folkert van Heusden
--
Feeling generous? -> http://www.vanheusden.com/wishlist.php
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com
On Sat, Nov 18, 2006 at 03:04:13AM +0100, Folkert van Heusden wrote:
> > > > I found that sometimes processes disappear on some heavily used system
> > > > of mine without any logging. So I've written a patch against 2.6.18.2
> > > > which emits logging when a process emits a fatal signal.
> > > Why not to patch default signal handlers in glibc, to have not only
> > > stderr, but syslog, or /dev/kmsg copy of fatal messages?
> > Afaik when a proces gets shot because of a segfault, also the libraries
> > it used are shot so to say. iirc some of the more fatal signals are
> > handled directly by the kernel.
Kernel sends signals, no doubt.
Then who do you think prints those "Killed" or "Segmentation fault"
messages on *stderr*?
[Hint: libc's default signal handler (man 2 signal).]
> Also: what about statically build programs?
"-lc" embeds libc in static binary, no?
IMHO it's not an lkml issue. There are many here who would tell you that
userspace problems are userspace problems.
____
In article <[email protected]>,
Oleg Verych <[email protected]> wrote:
>Kernel sends signals, no doubt.
>
>Then, who you think prints that "Killed" or "Segmentation fault"
>messages in *stderr*?
>[Hint: libc's default signal handler (man 2 signal).]
There is no such thing as a "libc default signal handler".
[Hint: waitpid (man 2 waitpid).]
Mike.
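To make the waitpid hint concrete, here is a minimal sketch, assuming a
POSIX system, of how a parent (a shell, for instance) learns that its
child was killed by a signal and prints the familiar message itself. The
helper name report_child_death is made up for this illustration; it is
not how any particular shell is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that sends itself `sig`, then
 * collect its status the way a shell would.  Returns the terminating
 * signal number, 0 if the child exited normally, -1 on error.
 */
static int report_child_death(int sig)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		raise(sig);	/* default action terminates the child */
		_exit(0);	/* reached only if the signal was not fatal */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFSIGNALED(status)) {
		/* It is the parent, not the dying child, that prints this. */
		fprintf(stderr, "%s (pid %d)\n",
			strsignal(WTERMSIG(status)), (int)pid);
		return WTERMSIG(status);
	}
	return 0;
}
```

Calling report_child_death(SIGKILL) works just as well as SIGSEGV: the
child never gets a chance to run any code of its own, yet the parent
still sees the terminating signal in the wait status.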
On Sat, 18 Nov 2006 02:09:46 +0100, Folkert van Heusden wrote:
>I found that sometimes processes disappear on some heavily used system
>of mine without any logging. So I've written a patch against 2.6.18.2
>which emits logging when a process emits a fatal signal.
>
>Signed-off-by: Folkert van Heusden <[email protected]>
>
>--- linux-2.6.18.2/kernel/signal.c 2006-11-04 02:33:58.000000000 +0100
>+++ linux-2.6.18.2.new/kernel/signal.c 2006-11-17 15:59:13.000000000 +0100
>@@ -706,6 +706,15 @@
> struct sigqueue * q = NULL;
> int ret = 0;
>
>+ if (sig == SIGQUIT || sig == SIGILL || sig == SIGTRAP ||
>+ sig == SIGABRT || sig == SIGBUS || sig == SIGFPE ||
>+ sig == SIGSEGV || sig == SIGXCPU || sig == SIGXFSZ ||
>+ sig == SIGSYS || sig == SIGSTKFLT)
>+ {
>+ printk(KERN_WARNING "Sig %d send to %d owned by %d.%d (%s)\n",
>+ sig, t -> pid, t -> uid, t -> gid, t -> comm);
>+ }
>+
> /*
> * fast-pathed signals for kernel-internal things like SIGSTOP
> * or SIGKILL.
NAK.
1. It lets any user DoS the system.
2. Your definition of "fatal" signals is wrong. Several of these
signals can be caught, and user-space sometimes does that for
good reason. FPE and SEGV are definitely often caught, BUS and
ILL may be caught, and I suspect TRAP to be common in debugging
sessions.
3. It puts policy in the kernel. Policy decisions belong in user-space.
In this case, the policy decision is whether this type of logging
is desired for a given process or not.
4. If this is about detecting the loss of specific processes
(network services say), then the problem can be solved in
user-space by using a separate monitor process, or by
controlling the processes via ptrace.
At a minimum, the logging needs to be conditionalised on a
per-process setting, and it should also be delayed until the
point where a signal causes a process to be killed.
/Mikael
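Point 2 can be illustrated with a small sketch, assuming a POSIX
system: a process installs a handler for one of the "fatal" signals,
receives it, and keeps running. The names note_signal and
survives_signal are invented for this example. A handler for a real
hardware fault could not simply return like this one does; the signal
is raised artificially here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t last_caught;

/* Handler: only records which signal arrived. */
static void note_signal(int sig)
{
	last_caught = sig;
}

/*
 * Hypothetical helper: install a handler for `sig`, send the signal to
 * ourselves and report whether we are still running afterwards.
 * Returns 1 if the signal was caught, 0 if not, -1 if no handler
 * could be installed (e.g. for SIGKILL).
 */
static int survives_signal(int sig)
{
	struct sigaction sa, old;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = note_signal;
	sigemptyset(&sa.sa_mask);
	if (sigaction(sig, &sa, &old) < 0)
		return -1;

	last_caught = 0;
	raise(sig);		/* the handler runs before raise() returns */
	sigaction(sig, &old, NULL);	/* restore the previous disposition */

	return last_caught == sig;
}
```

Since the patch logs at the point where the signal is sent, a process
like this one would presumably be logged on every SIGFPE or SIGSEGV even
though it never dies.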
> 4. If this is about detecting the loss of specific processes
> (network services say), then the problem can be solved in
> user-space by using a separate monitor process, or by
> controlling the processes via ptrace.
No, not only for specific processes. It helps you detect problems in
processes you didn't know had bugs, as well as flaky hardware (sig 11).
Folkert van Heusden
--
Ever wonder what is out there? Any alien races? Then please support
the seti@home project: setiathome.ssl.berkeley.edu
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com
Nice to meet you, Miquel!
On 2006-11-18, Miquel van Smoorenburg wrote:
> In article <[email protected]>,
> Oleg Verych <[email protected]> wrote:
>>Kernel sends signals, no doubt.
>>
>>Then, who you think prints that "Killed" or "Segmentation fault"
>>messages in *stderr*?
>>[Hint: libc's default signal handler (man 2 signal).]
>
> There is no such thing as a "libc default signal handler".
By that I mean SIG_DFL, even if that means signal masks or *actual*
functions installed by a shell/debugger/tracer/lib/whatever.
Maybe there isn't one to actually patch (if someone really wants to
patch something ;). One may be added, just like in libSegFault.so.
There are many userspace solutions; this problem isn't the kernel's.
> [Hint: waitpid (man 2 waitpid).]
Thanks.
--
-o--=O`C info emacs : not found /. .\ (is there any reason to live?)
#oo'L O info make : not found o ( R.I.P )
<___=E M man gcc : not found .-- ( Debian Operating System )
On Nov 18 2006 02:38, Oleg Verych wrote:
>Kernel sends signals, no doubt.
>
>Then, who you think prints that "Killed" or "Segmentation fault"
>messages in *stderr*?
>[Hint: libc's default signal handler (man 2 signal).]
Please enlighten us on how you plan to catch the uncatchable SIGKILL.
-`J'
--
>> 4. If this is about detecting the loss of specific processes
>> (network services say), then the problem can be solved in
>> user-space by using a separate monitor process, or by
>> controlling the processes via ptrace.
>
>No not only for specific processes. It helps you detect problems with
>processes you dind't know they have bugs and flakey hardware (sig 11).
Write an LSM module that hooks ->task_kill. It's probably the most
beautiful and non-intrusive solution in the set of possible solutions.
-`J'
--
On Sat, Nov 18, 2006 at 08:30:02PM +0100, Jan Engelhardt wrote:
>
> On Nov 18 2006 02:38, Oleg Verych wrote:
> >Kernel sends signals, no doubt.
> >
> >Then, who you think prints that "Killed" or "Segmentation fault"
> >messages in *stderr*?
> >[Hint: libc's default signal handler (man 2 signal).]
>
>
> Please enlighten us on how you plan to catch the uncatchable SIGKILL.
This is a question of getting information. Collecting it is possible
with `waitpid()' from the parent process, as Miquel noted.
That man page above gave me the impression that SIG_DFL cannot be changed
for the KILL and STOP signals, which leads to "The signals SIGKILL and
SIGSTOP cannot be caught or ignored." The implementation of such a no-action
can differ. If the kernel just stops processing the task on STOP, or breaks
it off on KILL, without giving it a chance to flush any pending data, that
is OK if this is an assembler program with just a data segment and no
infrastructure at all. But I think (didn't read anything) it is bad if
there's a libc with standard stream I/O buffers and no callback is possible.
On Nov 18 2006 21:51, Oleg Verych wrote:
>On Sat, Nov 18, 2006 at 08:30:02PM +0100, Jan Engelhardt wrote:
>> >Then, who you think prints that "Killed" or "Segmentation fault"
>> >messages in *stderr*?
>> >[Hint: libc's default signal handler (man 2 signal).]
>>
>> Please enlighten us on how you plan to catch the uncatchable SIGKILL.
>
>Here's question of getting information. Collecting information is
>possible by `waitpid()' from parent process as Miquel noted.
Yes, that is true. However that would involve adding support for This
Situation to the parent process. Which is where it becomes tricky. Patch
/sbin/init, in case the daemon runs like everything else. Or patch
xinetd, in case it is run from within that. Or, ...
The 'problem' with the waitpid solution is that you would need to
patch every possible parent that may become the owner of The Sigkilled
Target.
-`J'
--
On Sun, Nov 19, 2006 at 12:24:14AM +0100, Jan Engelhardt wrote:
>
> On Nov 18 2006 21:51, Oleg Verych wrote:
> >On Sat, Nov 18, 2006 at 08:30:02PM +0100, Jan Engelhardt wrote:
> >> >Then, who you think prints that "Killed" or "Segmentation fault"
> >> >messages in *stderr*?
> >> >[Hint: libc's default signal handler (man 2 signal).]
> >>
> >> Please enlighten us on how you plan to catch the uncatchable SIGKILL.
> >
> >Here's question of getting information. Collecting information is
> >possible by `waitpid()' from parent process as Miquel noted.
>
> Yes, that is true. However that would involve adding support for This
> Situation to the parent process. Which is where it becomes tricky. Patch
> /sbin/init, in case the daemon runs like everything else. Or patch
> xinetd, in case it is run from within that. Or, ...
> The 'problem' with the waitpid solution is that you would need to
> patch every possible parent that may become the owner of The Sigkilled
> Target.
I think this is purely a userspace admin issue: one wrapper shell script,
for non-programmers.
I'm not sure about init, as you said. For example, in Debian daemons are
run by the start-stop-daemon function in the LSB package. And the proposed
LSB standard <http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/>
describes portable start_daemon, killproc and pidofproc functions.
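The wrapper idea, written in C rather than shell, might look roughly
like the sketch below. It assumes a POSIX system with syslog(3)
available; the function name run_and_log is invented here, and the
wrapper only sees the death of its direct child, so it does not help
once a daemon has detached and been reparented.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <syslog.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: run argv[0] as a child and log to syslog if a
 * signal kills it.  Returns the terminating signal, 0 if the child
 * exited on its own, -1 on error.
 */
static int run_and_log(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFSIGNALED(status))
		return 0;

	/* Same information the kernel patch prints, gathered in userspace. */
	syslog(LOG_WARNING, "%s (pid %d) killed by signal %d (%s)",
	       argv[0], (int)pid, WTERMSIG(status),
	       strsignal(WTERMSIG(status)));
	return WTERMSIG(status);
}
```

A main() that passes argv + 1 to run_and_log would turn this into a
drop-in prefix for any command line.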
____
On Nov 19 2006 07:13, Oleg Verych wrote:
>On Sun, Nov 19, 2006 at 12:24:14AM +0100, Jan Engelhardt wrote:
>> On Nov 18 2006 21:51, Oleg Verych wrote:
>> >On Sat, Nov 18, 2006 at 08:30:02PM +0100, Jan Engelhardt wrote:
>> >> >Then, who you think prints that "Killed" or "Segmentation fault"
>> >> >messages in *stderr*?
>> >> >[Hint: libc's default signal handler (man 2 signal).]
>> >>
>> >> Please enlighten us on how you plan to catch the uncatchable SIGKILL.
>> >
>> >Here's question of getting information. Collecting information is
>> >possible by `waitpid()' from parent process as Miquel noted.
>>
>> Yes, that is true. However that would involve adding support for This
>> Situation to the parent process. Which is where it becomes tricky. Patch
>> /sbin/init, in case the daemon runs like everything else. Or patch
>> xinetd, in case it is run from within that. Or, ...
>> The 'problem' with the waitpid solution is that you would need to
>> patch every possible parent that may become the owner of The Sigkilled
>> Target.
>
>I think this is pure userspace admin issue, one wrapper shell script
>for not programmers.
>
>I'm not sure about init, you've told. For example, in Debian daemons are
>run by start-stop-daemon function in LSB package. And in proposed LSB
>standard <http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/>
>portable start_daemon, killproc, pidofproc funcions are described.
But usually the start wrapper will succeed, and the daemon will
eventually get reparented to init. At least this is the case in
LSB-compliant distros like opensuse and fedora.
-`J'
--
Folkert van Heusden <[email protected]> writes on Sat, 18 Nov 2006 03:02:00 +0100
> Hi,
>
> > > I found that sometimes processes disappear on some heavily used system
> > > of mine without any logging. So I've written a patch against 2.6.18.2
> > > which emits logging when a process emits a fatal signal.
> > Why not to patch default signal handlers in glibc, to have not only
> > stderr, but syslog, or /dev/kmsg copy of fatal messages?
>
> Afaik when a proces gets shot because of a segfault, also the libraries
> it used are shot so to say. iirc some of the more fatal signals are
> handled directly by the kernel.
>
>
> Folkert van Heusden
This is a user-space issue, not a kernel one (carrying this forward, we
could say the "kernel should complain when programs have bugs").
Newer glibc has catchsegv (I haven't found any documentation, but it's
interesting) --
: leisner@gateway 10:28:16;catchsegv sleep 1d
*** Segmentation fault
Register dump:
EAX: fffffffc EBX: bfc302e4 ECX: 00000000 EDX: b7f0c690
ESI: bfc302e4 EDI: 00003800 EBP: bfc302f8 ESP: bfc302b8
EIP: 00c5a402 EFLAGS: 00200246
CS: 0073 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
Trap: 00000000 Error: 00000000 OldMask: 00000000
ESP/signal: bfc302b8 CR2: 00000000
FPUCW: ffff037f FPUSW: ffff4000 TAG: ffffffff
IPOFF: 080492e9 CSSEL: 0073 DATAOFF: bfc302bc DATASEL: 007b
ST(0) 0000 0000000000000000 ST(1) 0000 0000000000000000
ST(2) 0000 0000000000000000 ST(3) 0000 a8c0000000000000
ST(4) 0000 0000000000000000 ST(5) 0000 0000000000000000
ST(6) 0000 0000000000000000 ST(7) 0000 0000000000000000
Backtrace:
/lib/libSegFault.so[0x2e21ff]
??:0(??)[0xc5a420]
sleep[0x8048ddf]
/lib/libc.so.6(__libc_start_main+0xdc)[0xc8b7e4]
sleep[0x8048af1]
Memory map:
002e0000-002e3000 r-xp 00000000 08:15 91703 /lib/libSegFault.so
002e3000-002e4000 r-xp 00002000 08:15 91703 /lib/libSegFault.so
002e4000-002e5000 rwxp 00003000 08:15 91703 /lib/libSegFault.so
0036f000-0037a000 r-xp 00000000 08:15 244694 /lib/libgcc_s-4.1.0-20060304.so.1
0037a000-0037b000 rwxp 0000a000 08:15 244694 /lib/libgcc_s-4.1.0-20060304.so.1
003e9000-00402000 r-xp 00000000 08:15 183695 /lib/ld-2.4.so
00402000-00403000 r-xp 00018000 08:15 183695 /lib/ld-2.4.so
00403000-00404000 rwxp 00019000 08:15 183695 /lib/ld-2.4.so
00c5a000-00c5b000 r-xp 00c5a000 00:00 0 [vdso]
00c76000-00da2000 r-xp 00000000 08:15 244689 /lib/libc-2.4.so
00da2000-00da5000 r-xp 0012b000 08:15 244689 /lib/libc-2.4.so
00da5000-00da6000 rwxp 0012e000 08:15 244689 /lib/libc-2.4.so
00da6000-00da9000 rwxp 00da6000 00:00 0
08048000-0804b000 r-xp 00000000 08:18 1354883 /usr/local/gnu/coreutils-6.5/bin/sleep
0804b000-0804c000 rw-p 00003000 08:18 1354883 /usr/local/gnu/coreutils-6.5/bin/sleep
09c1b000-09c40000 rw-p 09c1b000 00:00 0 [heap]
b7f0c000-b7f0d000 rw-p b7f0c000 00:00 0
b7f31000-b7f32000 rw-p b7f31000 00:00 0
bfc1d000-bfc32000 rw-p bfc1d000 00:00 0 [stack]
Processes don't "disappear" -- the parent can track this...
Also, do you have core dumps turned on?
marty
On Mon, Nov 20, 2006 at 10:48:59PM -0500, Marty Leisner wrote:
[]
> > This is a user-space issue, not kernel (carrying this forward, we can
> say the "kernel should complain when programs have bugs").
>
> Newer glibc has catchsegv (haven't found any documentation, but its
> interesting) --
It's just LD_PRELOAD (man ld.so) and libSegFault.so, which installs
(among others) SIGSEGV handler, see "glibc/sysdeps/generic/segfault.c".
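A much reduced sketch of the same approach, assuming glibc (for
<execinfo.h>) and GCC's constructor attribute. This is not the
libSegFault source; the names segv_report and install_segv_report are
made up, and it prints only a backtrace, no register dump or memory map.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* SIGSEGV handler: print a banner and a backtrace, then die as usual. */
static void segv_report(int sig, siginfo_t *info, void *ctx)
{
	static const char msg[] = "*** Segmentation fault\nBacktrace:\n";
	void *frames[32];
	ssize_t ignored;
	int n;

	(void)info;
	(void)ctx;
	ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
	(void)ignored;
	/* backtrace() is not strictly async-signal-safe; fine for a sketch. */
	n = backtrace(frames, 32);
	backtrace_symbols_fd(frames, n, STDERR_FILENO);
	/*
	 * SA_RESETHAND has already restored SIG_DFL, so re-raising lets the
	 * process die from SIGSEGV and the parent sees the real cause.
	 */
	raise(sig);
}

/* Install the handler; returns 0 on success, -1 on error. */
static int install_segv_report(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_report;
	sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}

/* Runs when the object is loaded, e.g. via LD_PRELOAD. */
__attribute__((constructor))
static void segv_report_init(void)
{
	install_segv_report();
}
```

Built as a shared object (for example with gcc -shared -fPIC, into a
hypothetical libsegvreport.so) and named in LD_PRELOAD, it would install
the handler in any dynamically linked program without recompiling it.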
____