Linux logs almost everything, why not exceptions such as SIGSEGV in userspace which
may be very informative?
Regards
Michael
--
Powered by linux-2.6. Compiled with gcc-2.95-3 - mature and rock solid
2.4/2.6 kernel testing: ACPI PCI interrupt routing, PCI IRQ sharing, swsusp
2.6 kernel testing: PCMCIA yenta_socket, Suspend to RAM with ACPI S1-S3
More info on swsusp: http://sourceforge.net/projects/swsusp/
On Sun, 17 Aug 2003 04:10:30 +0800, Michael Frank <[email protected]> said:
> Linux logs almost everything, why not exceptions such as SIGSEGV in userspace which
> may be very informative?
Consider this code:
char *foo = 0;
sigset(SIGSEGV,SIG_IGNORE);
for(;;) { *foo = '\5'; }
Your logfiles just got DoS'ed....
(Your syslog will just print 'last message repeated 11934 times'? OK, put two
different signals in the loop.. ;)
And yes, I've worked on systems that will log SEGV... and the logs get ugly.
[email protected] wrote:
> On Sun, 17 Aug 2003 04:10:30 +0800, Michael Frank <[email protected]> said:
> > Linux logs almost everything, why not exceptions such as SIGSEGV
> > in userspace which may be very informative?
>
> [SIG_IGN]
Presumably only SIGSEGVs which kill a process would be logged...
Some programs actually _use_ SIGSEGV in a useful way, to manage memory.
Same for SIGBUS and other signals. It would be wrong to log them.
-- Jamie
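To illustrate the point about programs that use SIGSEGV on purpose, here is a minimal sketch, not taken from any program mentioned in this thread. It maps a page with no access rights, lets the first store fault, and has the handler unprotect the page so the store is retried. The names touch_guarded_page, on_segv and guarded_page are made up for the example. Calling mprotect() from a signal handler is an assumption that works on Linux in practice but is not promised by POSIX.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded_page;                  /* page we protect on purpose */
static size_t page_size;
static volatile sig_atomic_t faults_handled;

/* On a fault inside the guarded page, make the page writable and return.
 * The faulting instruction is then restarted and succeeds. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    char *addr = (char *)info->si_addr;

    (void)sig;
    (void)ctx;
    if (addr >= guarded_page && addr < guarded_page + page_size) {
        mprotect(guarded_page, page_size, PROT_READ | PROT_WRITE);
        faults_handled++;
        return;
    }
    /* Not our page: restore the default action so the retried access
     * kills the process as usual. */
    signal(SIGSEGV, SIG_DFL);
}

/* Returns how many faults the handler absorbed (expected: 1), or -1. */
int touch_guarded_page(void)
{
    struct sigaction sa, old;
    volatile char *p;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    guarded_page = mmap(NULL, page_size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED)
        return -1;

    faults_handled = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    p = guarded_page;
    p[0] = 'x';   /* faults once; the handler unprotects, the store is retried */
    p[1] = 'y';   /* no fault: the page is writable now */

    sigaction(SIGSEGV, &old, NULL);
    munmap(guarded_page, page_size);
    return (int)faults_handled;
}
```

A kernel that logged every delivered SIGSEGV would write one line per guarded page touched by such a program, even though nothing went wrong.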
[email protected] wrote:
> Consider this code:
>
> char *foo = 0;
> sigset(SIGSEGV,SIG_IGNORE);
> for(;;) { *foo = '\5'; }
>
> Your logfiles just got DoS'ed....
Why not then just log uncaught exceptions?
"David D. Hagood" <[email protected]> writes:
> [email protected] wrote:
>
> > Consider this code:
> > char *foo = 0;
> > sigset(SIGSEGV,SIG_IGNORE);
> > for(;;) { *foo = '\5'; }
> > Your logfiles just got DoS'ed....
>
>
> Why not then just log uncaught exceptions?
You can still DoS by forking repeatedly and having the child die with
SEGV...
-Doug
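For reference, this is roughly what "a child dying with SEGV" looks like from the parent's side. The sketch below is only an illustration: the function name is hypothetical, and the core-dump limit is set to zero purely to keep the example from littering the disk.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that dies from an uncaught SIGSEGV. Returns the signal
 * number the parent observes, 0 if the child exited normally, -1 on error. */
int fatal_signal_of_crashing_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit no_core = {0, 0};
        char *volatile bad = NULL;      /* volatile: keep the store in place */

        setrlimit(RLIMIT_CORE, &no_core);
        signal(SIGSEGV, SIG_DFL);
        *bad = '\5';                    /* invalid store, default action kills us */
        _exit(0);                       /* only reached if the store did not fault */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Run in a loop, this produces one "killed by signal" event per fork, which is the flood described above if every such death is logged without a rate limit.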
* Doug McNaught ([email protected]) wrote:
>
> You can still DoS by forking repeatedly and having the child die with
> SEGV...
>
True - but there are a million and one ways of similarly DoSing a box.
An option to log programs terminated by a bad uncaught signal would seem
useful to me: for one thing as a sign of someone trying out buffer-overrun
type attacks on a system, and also just as a diagnostic to catch apps that
go pop when you aren't expecting them to.
(I seem to remember ARM Linux actually does log user exceptions and very
useful it is too).
Dave
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
On Sun, 2003-08-17 at 00:06, David D. Hagood wrote:
> > Your logfiles just got DoS'ed....
>
>
> Why not then just log uncaught exceptions?
man acct
On Sun, Aug 17, 2003 at 12:41:04AM +0100, Dr. David Alan Gilbert wrote:
> (I seem to remember ARM Linux actually does log user exceptions and very
> useful it is too).
Of course, this is configurable via a debugging option...
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
Doug McNaught wrote:
>
> You can still DoS by forking repeatedly and having the child die with
> SEGV...
>
We had a problem with synchronization between CPU and memory. It showed
up as random crashes of applications with SIGSEGV and (rarely) SIGILL.
Still, proving that the bug was neither in the Linux kernel nor in our
software cost us three weeks, only to find out that Motorola had forgotten
to publish one erratum for their CPU.
An option to log this kind of signal would probably be useful, because
just blindly killing applications is not correct either.
I will vote for 'if unhandled -> log it' ;-)
Alan Cox wrote:
> On Sun, 2003-08-17 at 00:06, David D. Hagood wrote:
>
>>>Your logfiles just got DoS'ed....
>>
>>
>>Why not then just log uncaught exceptions?
>
>
> man acct
>
Sorry, probably I'm missing something. man acct(2) says:
DESCRIPTION
"When called with the name of an existing file as argument,
accounting is turned on, records for each terminating process are
appended to filename as it terminates. An argument of NULL causes
accounting to be turned off".
I do not see how it relates to abends.
It logs _everything_, which is not that useful. Having some kind of
filter for what to log would be just great. Or alternatively the ability
to pass a file descriptor instead of a file name.
And this mysterious NOTES:
"No accounting is produced for programs running when a crash
occurs. In particular, nonterminating processes are never accounted for".
Sounds like acct() does the reverse, and no crashes are logged?
Or is it about a crash of Linux itself?
On Sun, 2003-08-17 at 13:52, Ihar 'Philips' Filipau wrote:
> "When called with the name of an existing file as argument,
> accounting is turned on, records for each terminating process are
> appended to filename as it terminates. An argument of NULL causes
> accounting to be turned off".
>
> I do not see how it relates to abends.
> It logs _everything_, which is not that useful. Having some kind of
> filter what to log - would be just great. Or alternatively ability to
> pass file descriptor - not file name.
It generates a small record for each exit; it's trivial to parse the exit
codes for exits caused by an exception.
> Sounds like acct() does reverse? No crashes are logged.
> Or it is about Linux crash?
Linux crash
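A rough sketch of what "parsing the records" could look like from userspace. It assumes the accounting file holds fixed-size records laid out like struct acct from <sys/acct.h> and that the AXSIG flag marks a process killed by a signal. Kernels may write a different record version (for example acct_v3), so treat the layout as an assumption and check it against the running kernel. Both function names are hypothetical.

```c
#include <stdio.h>
#include <sys/acct.h>

/* Nonzero if the accounting record says the process was killed by a signal. */
int record_killed_by_signal(const struct acct *rec)
{
    return (rec->ac_flag & AXSIG) != 0;
}

/* Scans an accounting file and counts records flagged AXSIG.
 * Returns -1 if the file cannot be opened. */
long count_signal_deaths(const char *path)
{
    struct acct rec;
    long deaths = 0;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    while (fread(&rec, sizeof rec, 1, f) == 1) {
        if (record_killed_by_signal(&rec))
            deaths++;
    }
    fclose(f);
    return deaths;
}
```

This only filters after the fact. Every exit is still recorded, which is the objection raised earlier in the thread.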
Alan Cox wrote:
>>
>> I do not see how it relates to abends.
>> It logs _everything_, which is not that useful. Having some kind of
>>filter what to log - would be just great. Or alternatively ability to
>>pass file descriptor - not file name.
>
>
> It generates a small record for each exit; it's trivial to parse the exit
> codes for exits caused by an exception.
>
Silly question. Related.
Is it possible to make the kernel print an oops when SIGSEGV/SIGILL is
raised while execution was in kernel space?
I'm not sure about the current status, but this /feature/ was advertised
for Linux kernels: when there is a crash in kernel space, e.g. in a system
call, it is the calling user-space application that crashes, with no notice
about the fact that it was actually a crash inside the Linux kernel.
Am I right or am I wrong?
On Sat, Aug 16, 2003 at 06:06:34PM -0500, David D. Hagood wrote:
> [email protected] wrote:
>
> >Consider this code:
> >
> > char *foo = 0;
> > sigset(SIGSEGV,SIG_IGNORE);
> > for(;;) { *foo = '\5'; }
> >
> >Your logfiles just got DoS'ed....
...
Consider this code:
for (;;) syslog(LOG_INFO, "root, hurt me please!");
My point being, that if a user wishes to spam the syslog he can.
Please read the syslogd man page - see under "SECURITY THREATS".
Especially option 5 in that section:
----------------
5. Use step 4 and if the problem persists and is not secondary to a rogue
program/daemon get a 3.5 ft (approx. 1 meter) length of sucker rod* and
have a chat with the user in question.
Sucker rod def. -- 3/4, 7/8 or 1in. hardened steel rod, male threaded
on each end. Primary use in the oil industry in Western North Dakota
and other locations to pump 'suck' oil from oil wells. Secondary uses
are for the construction of cattle feed lots and for dealing with the
occasional recalcitrant or belligerent individual.
----------------
--
................................................................
: [email protected] : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:
On 2003-08-16, Michael Frank <mhf () linuxmail ! org> wrote:
> Linux logs almost everything, why not exceptions such as SIGSEGV in
> userspace which may be very informative?
If you really want this, patches to do so have been in hap-linux (2.2.x)
for a while, and they were picked up by grsecurity (2.4). We both log
SIGSEGV, SIGBUS, SIGABRT, SIGILL currently; you could add others if you
desired. The logging is rate-limited to reduce log-flood opportunities
(though as others have mentioned it's quite easy to flood logs through
other means).
--
Hank Leininger <[email protected]>
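The rate-limiting idea can be sketched in a few lines. This is not the hap-linux or grsecurity code, only an illustration of an "at most N messages per interval" window with a counter for what was suppressed. The struct and function names are invented for the example, and the caller passes the current time explicitly so the logic is easy to test.

```c
#include <stdbool.h>
#include <time.h>

/* Allows at most `burst` log messages per `interval` seconds. */
struct log_ratelimit {
    time_t interval;          /* length of one window, in seconds */
    int burst;                /* messages allowed per window */
    time_t window_start;      /* start of the current window */
    int emitted;              /* messages let through in this window */
    unsigned long suppressed; /* messages dropped so far */
};

/* Returns true if a message may be logged at time `now`. */
bool log_ratelimit_allow(struct log_ratelimit *rl, time_t now)
{
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;   /* open a fresh window */
        rl->emitted = 0;
    }
    if (rl->emitted < rl->burst) {
        rl->emitted++;
        return true;
    }
    rl->suppressed++;
    return false;
}
```

With interval 5 and burst 2, a process faulting in a tight loop produces two log lines every five seconds instead of thousands, and the suppressed counter can be reported once when the next window opens.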
Thank you all for your valuable input,
I was chasing some data corruption testing swsusp.
This simple patch met my immediate needs (against 2.4.22-rc1)
diff -uN kernel/signal.c.orig kernel/signal.c
--- kernel/signal.c.orig 2003-08-16 22:08:57.000000000 +0800
+++ kernel/signal.c 2003-08-17 06:21:49.000000000 +0800
@@ -536,6 +536,11 @@
int ret;
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+ if (sig == 11 || sig == 13)
+ printk("Signal: %d\n",sig);
+#endif
+
#if DEBUG_SIG
printk("SIG queue (%s:%d): %d ", t->comm, t->pid, sig);
#endif
> ----------------
> 5. Use step 4 and if the problem persists and is not secondary to a
> rogue program/daemon get a 3.5 ft (approx. 1 meter) length of sucker rod*
> and have a chat with the user in question.
As to security concerns, I feel this is the appropriate approach ;)
Regards
Michael
> [email protected] wrote:
> > Consider this code:
> >
> > char *foo = 0;
> > sigset(SIGSEGV,SIG_IGNORE);
> > for(;;) { *foo = '\5'; }
> >
> > Your logfiles just got DoS'ed....
> Why not then just log uncaught exceptions?
Because deliberately creating an uncaught exception is a perfectly sane,
reasonable thing to do with well-defined semantics. Applications should feel
free to do such reasonable things without getting complaints from the system
administrator that their log is being flooded with garbage.
There is no mechanism that is guaranteed to terminate a process other than
sending yourself an exception that is not caught. So in cases where you must
guarantee that your process terminates, it is perfectly reasonable to send
yourself a SIGILL.
FreeBSD logs any number of normal things that sane, reasonable processes do
and it's very annoying. A very annoying example is FreeBSD's desire to log
calls to 'wait' functions with 'SIGCHLD' ignored. How else can portable
programs say, "I want you to automatically reap my zombies if you can, but
otherwise, I'll reap them if needed by calling waitpid(WNOHANG) every once
in a while".
DS
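For readers unfamiliar with the pattern, the periodic non-blocking reap mentioned above is short. A minimal sketch follows, with a hypothetical function name.

```c
#include <sys/types.h>
#include <sys/wait.h>

/* Reaps every child that has already exited, without blocking.
 * Returns how many children were reaped. */
int reap_exited_children(void)
{
    int reaped = 0;
    int status;

    /* waitpid() returns 0 when children exist but none has exited yet,
     * and -1 when there are no children at all. Both end the loop. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

A program can call this from its main loop every so often. If the system already reaps children automatically because SIGCHLD is ignored, the call simply finds nothing to do.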
David Schwartz wrote:
>>>
>>> char *foo = 0;
>>> sigset(SIGSEGV,SIG_IGNORE);
>>> for(;;) { *foo = '\5'; }
>>>
>>>Your logfiles just got DoS'ed....
>>Why not then just log uncaught exceptions?
>
> Because deliberately creating an uncaught exception is a perfectly sane,
> reasonable thing to do with well-defined semantics. Applications should feel
> free to do such reasonable things without getting complaints from the system
> administrator that their log is being flooded with garbage.
>
> There is no mechanism that is guaranteed to terminate a process other than
> sending yourself an exception that is not caught. So in cases where you must
> guarantee that your process terminates, it is perfectly reasonable to send
> yourself a SIGILL.
>
You probably have missed some postings on this thread.
This one:
----------------------------------------------
Jakob Oestergaard wrote:
> On Sat, Aug 16, 2003 at 06:06:34PM -0500, David D. Hagood wrote:
>
>>[email protected] wrote:
>>
>>
>>>Consider this code:
>>>
>>> char *foo = 0;
>>> sigset(SIGSEGV,SIG_IGNORE);
>>> for(;;) { *foo = '\5'; }
>>>
>>>Your logfiles just got DoS'ed....
>
> ...
>
> Consider this code:
> for (;;) syslog(LOG_INFO, "root, hurt me please!");
>
> My point being, that if a user wishes to spam the syslog he can.
>
> Please read the syslogd man page - see under "SECURITY THREATS".
> Especially option 5 in that section:
>
> ----------------
> 5. Use step 4 and if the problem persists and is not secondary to a rogue
> program/daemon get a 3.5 ft (approx. 1 meter) length of sucker rod* and
> have a chat with the user in question.
>
> Sucker rod def. -- 3/4, 7/8 or 1in. hardened steel rod, male threaded
> on each end. Primary use in the oil industry in Western North Dakota
> and other locations to pump 'suck' oil from oil wells. Secondary uses
> are for the construction of cattle feed lots and for dealing with the
> occasional recalcitrant or belligerent individual.
> ----------------
>
----------------------------------------------
So you can flood syslog in any way, and syslog(3) I believe is much
faster than the SIGSEGV+kernel solution in this respect ;-)))
> FreeBSD logs any number of normal things that sane, reasonable processes do
> and it's very annoying. A very annoying example is FreeBSD's desire to log
> calls to 'wait' functions with 'SIGCHLD' ignored. How else can portable
> programs say, "I want you to automatically reap my zombies if you can, but
> otherwise, I'll reap them if needed by calling waitpid(WNOHANG) every once
> in a while".
>
If an application cannot be responsible for its children, it is just
bad programming practice. Fix the applications.
Reaping zombies 'just in case there are any' sounds really bad.
On Sun, Aug 17, 2003 at 04:10:30AM +0800, Michael Frank wrote:
> Linux logs almost everything, why not exceptions such as SIGSEGV in
> userspace which may be very informative?
Such exceptions are part of the normal operation of certain kinds of
programs, such as ones using (nowadays unusual) certain garbage
collection algorithms. I actually installed such a beast (Lisp system)
in no small part so it would exercise "invalid" memory accesses and
test various bits of VM code related to such. For other VM people
interested in it, there's an sbcl debian package that recompiles a
moderately sized chunk of Lisp code and hence runs the system at
install-time, and so exercises the SIGSEGV path rather heavily on
32-bit systems and/or systems with <= 2GB of RAM. No particular
intervention apart from (re)installing it is required to pound the
SIGSEGV path like a wild monkey, so it's actually a very convenient
touch test for such things.
-- wli
> You probably have missed some postings on this thread.
> This one:
[snip]
If you think those posts are relevant, you misunderstand my point. Those
posts are from the view "if someone wants to DoS the log files, they already
can", my argument is more, "if someone doesn't want to DoS the log files,
but you make this patch, how can they avoid it?"
> If application cannot be responsible for its children - it is just
> bad programming practice. Fix applications.
> Reaping zombies 'just in case if any' sounds really bad.
If we were to write code to detect bad programming practices and syslog
them, it would be nearly impossible to prevent syslog DoSes. Looping on
'waitpid(WNOHANG)' periodically is a perfectly sane way to reap zombies,
especially in cases where there are issues with signal handling and in
multithreaded programs.
If an application does something that a programmer could sensibly decide to
do and that solves problems that can't always be solved in another way, it
should not result in a syslog entry in the default configuration (unless
it's something the system administrator needs to keep track of for
security/audit reasons).
An application should be free to terminate however it likes without
programmers getting calls from sysadmins that the application is DoSing the
syslog. If you want a special debug mode, that's fine.
DS
On Monday 18 August 2003 22:31, William Lee Irwin III wrote:
> On Sun, Aug 17, 2003 at 04:10:30AM +0800, Michael Frank wrote:
> > Linux logs almost everything, why not exceptions such as SIGSEGV in
> > userspace which may be very informative?
>
> Such exceptions are part of the normal operation of certain kinds of
> programs, such as ones using (nowadays unusual) certain garbage
> collection algorithms. I actually installed such a beast (Lisp system)
> in no small part so it would exercise "invalid" memory accesses and
> test various bits of VM code related to such. For other VM people
> interested in it, there's an sbcl debian package that recompiles a
> moderately sized chunk of Lisp code and hence runs the system at
> install-time, and so exercises the SIGSEGV path rather heavily on
> 32-bit systems and/or systems with <= 2GB of RAM. No particular
> intervention apart from (re)installing it is required to pound the
> SIGSEGV path like a wild monkey, so it's actually a very convenient
> touch test for such things.
I am thinking along the line of "Exceptions" rather than "normal events"
by specific applications.
I tend to see segfaults only when something is broken or when my lapse of
attention perhaps should be rewarded by said "sucker rod".
The current application to trap SIGSEGV when something is badly broken
can be found here:
http://marc.theaimsgroup.com/?l=swsusp-devel&m=106121712521861&w=2
Regards
Michael
On 2003-08-18, Michael Frank <mhf () linuxmail ! org> wrote:
> I tend to see segfaults only when something is broken or when my lapse
> of attention perhaps should be rewarded by said "sucker rod".
As others have said some apps use "interesting" signals normally. For
instance probably the most common is vmware. vmware sends itself SIGSEGV
all the time (at startup, at least) as part of its memory-management foo:
Aug 12 14:11:23 foo kernel: grsec: signal 11 sent to (vmware-ui:12180) \
UID(XXXX) EUID(XXXX), parent (vmware:17653) UID(XXXX) EUID(XXXX)
Aug 12 14:11:23 foo kernel: grsec: signal 11 sent to (vmware-mks:25238) \
UID(XXXX) EUID(XXXX), parent (vmware:17653) UID(XXXX) EUID(XXXX)
Aug 12 14:11:23 foo kernel: grsec: signal 11 sent to (vmware:17653) \
UID(XXXX) EUID(XXXX), parent (bash:2883) UID(XXXX) EUID(XXXX)
..So not *all* such cases are cause for alarm. However, if you run one of
the patches enabling logging of this, you quickly learn what's normal for
the apps you run, and can teach your log-auditing tools and/or your brain
to ignore them.
--
Hank Leininger <[email protected]>
On Mon, Aug 18, 2003 at 04:50:49PM -0400, Hank Leininger wrote:
> ..So not *all* such cases are cause for alarm. However, if you run one of
> the patches enabling logging of this, you quickly learn what's normal for
> the apps you run, and can teach your log-auditing tools and/or your brain
> to ignore them.
And why not just catch the ones sent from the kernel? That's the one that
is killing the program because it crashed, and that's the one the original
poster wants logged...
On Mon, 18 Aug 2003, Mike Fedyk wrote:
> On Mon, Aug 18, 2003 at 04:50:49PM -0400, Hank Leininger wrote:
> > ..So not *all* such cases are cause for alarm. However, if you run one of
> > the patches enabling logging of this, you quickly learn what's normal for
> > the apps you run, and can teach your log-auditing tools and/or your brain
> > to ignore them.
>
> And why not just catch the ones sent from the kernel? That's the one that
> is killing the program because it crashed,
Well, in my case at least, because if a network-listening daemon fell
over with sigsegv, sigill, etc I most definitely wanted to know about
it. But, you certainly could make a patch to do only that; it'd be
lower impact, less controversial but probably still not accepted into
mainline (just a guess).
> and that's the one the original poster wants logged...
Hm, I see Ihar Filipau bringing that up specifically, and it does seem
like something that ought to generate some logs. (But I thought they
should already generate oopses? Apparently not.) The OP seemed to be
concerned with any SIGSEGV and SIGILL signals, not just in-kernel ones?
Hank Leininger <[email protected]>
E407 AEF4 761E D39C D401 D4F4 22F8 EF11 861A A6F1
On Mon, Aug 18, 2003 at 05:18:09PM -0400, Hank Leininger wrote:
> Hm, I see Ihar Filipau bringing that up specifically, and it does seem
> like something that ought to generate some logs. (But I thought they
> should already generate oopses? Apparently not.) The OP seemed to be
> concerned with any SIGSEGV and SIGILL signals, not just in-kernel ones?
No, the crashes are in userspace apps, not the kernel. But when they
crash they get sent a signal from the kernel. That is what needs to be
logged, not the signals an app might send to itself.
On Mon, Aug 18, 2003 at 04:50:49PM -0400, Hank Leininger wrote:
>> ..So not *all* such cases are cause for alarm. However, if you run one of
>> the patches enabling logging of this, you quickly learn what's normal for
>> the apps you run, and can teach your log-auditing tools and/or your brain
>> to ignore them.
On Mon, Aug 18, 2003 at 02:02:38PM -0700, Mike Fedyk wrote:
> And why not just catch the ones sent from the kernel? That's the one that
> is killing the program because it crashed, and that's the one the original
> poster wants logged...
They're almost all sent by the kernel. Very few represent kill(1).
-- wli
> And why not just catch the ones sent from the kernel? That's the one that
> is killing the program because it crashed, and that's the one the
> original poster wants logged...
Because sometimes a program wants to terminate. And it is perfectly legal
for a programmer who needs to terminate his program as quickly as possible
to do this:
char *j=NULL;
signal(SIGSEGV, SIG_DFL);
*j++;
This is a perfectly sensible thing for a program to do with well-defined
semantics. If a program wants to create a child every minute like this and
kill it, that's perfectly fine. We should be able to do that in the default
configuration without a sysadmin complaining that we're DoSing his syslogs.
DS
On Mon, Aug 18, 2003 at 03:39:15PM -0700, David Schwartz wrote:
>
> > And why not just catch the ones sent from the kernel? That's the one that
> > is killing the program because it crashed, and that's the one the
> > original poster wants logged...
>
> Because sometimes a program wants to terminate. And it is perfectly legal
> for a programmer who needs to terminate his program as quickly as possible
> to do this:
>
> char *j=NULL;
> signal(SIGSEGV, SIG_DFL);
> *j++;
>
> This is a perfectly sensible thing for a program to do with well-defined
> semantics. If a program wants to create a child every minute like this and
> kill it, that's perfectly fine. We should be able to do that in the default
> configuration without a sysadmin complaining that we're DoSing his syslogs.
Are you saying that a signal requested from userspace uses the same code
path as the signal sent when a process has overstepped its bounds?
Surely some flag can be set so that we know the kernel is killing it because
it did something illegal...
> On Mon, Aug 18, 2003 at 03:39:15PM -0700, David Schwartz wrote:
> > > And why not just catch the ones sent from the kernel? That's the one
> > > that is killing the program because it crashed, and that's the one the
> > > original poster wants logged...
> > Because sometimes a program wants to terminate. And it is perfectly legal
> > for a programmer who needs to terminate his program as quickly as possible
> > to do this:
> > char *j=NULL;
> > signal(SIGSEGV, SIG_DFL);
> > *j++;
> > This is a perfectly sensible thing for a program to do with well-defined
> > semantics. If a program wants to create a child every minute like this and
> > kill it, that's perfectly fine. We should be able to do that in the default
> > configuration without a sysadmin complaining that we're DoSing his syslogs.
> Are you saying that a signal requested from userspace uses the same code
> path as the signal sent when a process has overstepped its bounds?
It depends what you mean by "requested".
> Surely some flag can be set so that we know the kernel is killing it
> because it did something illegal...
It depends what you mean by "illegal".
Dereferencing a NULL pointer deliberately to induce the kernel to kill your
process is indistinguishable from dereferencing a NULL pointer accidentally
and forcing the kernel to kill your process.
These "illegal" operations have well-defined semantics that programmers can
use and rely on. Logging every such operation changes their semantics and
breaks programs that currently work -- breaks in the sense that they will
now DoS logs and result in admin complaints.
The kernel cannot determine whether a SEGV or ILL was the result of a
deliberate attempt on the part of the programmer to create such a signal or
whether it's due to a programming error. Even an uncaught exception can be
used as a good way to terminate a process immediately (is there another
portable way to do that?).
DS
Followup to: <[email protected]>
By author: "David Schwartz" <[email protected]>
In newsgroup: linux.dev.kernel
>
> There is no mechanism that is guaranteed to terminate a process other than
> sending yourself an exception that is not caught. So in cases where you must
> guarantee that your process terminates, it is perfectly reasonable to send
> yourself a SIGILL.
>
exit(2)?
-hpa
--
<[email protected]> at work, <[email protected]> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
On 19 August 2003 01:39, David Schwartz wrote:
> > And why not just catch the ones sent from the kernel? That's the one that
> > is killing the program because it crashed, and that's the one the
> > original poster wants logged...
>
> Because sometimes a program wants to terminate. And it is perfectly legal
> for a programmer who needs to terminate his program as quickly as possible
> to do this:
>
> char *j=NULL;
> signal(SIGSEGV, SIG_DFL);
> *j++;
>
> This is a perfectly sensible thing for a program to do with well-defined
> semantics. If a program wants to create a child every minute like this and
> kill it, that's perfectly fine. We should be able to do that in the default
> configuration without a sysadmin complaining that we're DoSing his syslogs.
I disagree. _exit(2) is the most sensible way to terminate.
Logging kernel-induced SEGVs and ILLs is definitely a help when you hunt
daemons mysteriously crashing. This outweighs the DoS hazard.
--
vda
On Tue, Aug 19, 2003 at 09:54:17AM +0300, Denis Vlasenko wrote:
> On 19 August 2003 01:39, David Schwartz wrote:
...[snip]...
> > This is a perfectly sensible thing for a program to do with well-defined
> > semantics. If a program wants to create a child every minute like this and
> > kill it, that's perfectly fine. We should be able to do that in the default
> > configuration without a sysadmin complaining that we're DoSing his syslogs.
>
> I disagree. _exit(2) is the most sensible way to terminate.
>
> Logging kernel-induced SEGVs and ILLs are definitely a help when you hunt
> daemons mysteriously crashing. This outweighs DoS hazard.
Ok guys - we will never come to an agreement on what would be the
sensible thing to do.
For good reasons, too: the purposes and uses of the systems out there,
and the minds of the people administering them, will be as different as
anything.
This reminds me of the "core naming wars", the "vm overcommit wars", and
other "big" (in the minds of people) issues that were solved to
everyone's satisfaction with an entry in /proc.
May I suggest:
/proc/sys/kernel/log_signals
Semantics: Numbers can be written to log_signals - these are signal
numbers that will cause a log entry to be written, when the given signal
is delivered. The file can be read, in which case it will list the
signal numbers that cause log entries to be written.
Examples:
]$ cat /proc/sys/kernel/log_signals
4
7
]$ echo +15 > /proc/sys/kernel/log_signals
]$ cat /proc/sys/kernel/log_signals
4
7
15
]$ echo -4 > /proc/sys/kernel/log_signals
]$ cat /proc/sys/kernel/log_signals
7
15
]$
Possible extension:
]$ echo '*' > /proc/sys/kernel/log_signals
]$ cat /proc/sys/kernel/log_signals
... lists all signals ...
]$ echo '-*' > /proc/sys/kernel/log_signals
]$ cat /proc/sys/kernel/log_signals
]$
In my opinion it does not make sense to distinguish between signals
sent from process to process, and from kernel to process. Some garbage
collectors, for example, depend on the kernel sending the SIGSEGV and do
their own handling of that - while for many other processes that
situation indicates a problem. Better to handle that kind of thing in
user space log auditing tools.
An implementation of the above is left as an exercise for the reader :)
Comments?
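As a starting point for that exercise, here is a userspace sketch of just the write semantics proposed above ("+15", "-4", "*", "-*"), kept as a bitmask. It is not kernel code and not an existing interface. The function names, the limit of 64 signals, and treating a bare number like "+N" are all assumptions made for the example, and the caller is expected to strip the trailing newline that echo adds.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SIGNAL 64   /* assumed upper bound on signal numbers */

/* Applies one write such as "+15", "-4", "*" or "-*" to the mask.
 * Returns false and leaves the mask untouched on malformed input. */
bool log_signals_apply(uint64_t *mask, const char *cmd)
{
    bool clear = false;
    char *end;
    long sig;
    uint64_t bit;

    if (*cmd == '+') {
        cmd++;
    } else if (*cmd == '-') {
        clear = true;
        cmd++;
    }
    if (strcmp(cmd, "*") == 0) {
        *mask = clear ? 0 : UINT64_MAX;
        return true;
    }
    errno = 0;
    sig = strtol(cmd, &end, 10);
    if (errno != 0 || end == cmd || *end != '\0' ||
        sig < 1 || sig > MAX_SIGNAL)
        return false;

    bit = UINT64_C(1) << (sig - 1);
    if (clear)
        *mask &= ~bit;
    else
        *mask |= bit;
    return true;
}

/* True if delivery of `sig` should produce a log entry. */
bool log_signals_enabled(uint64_t mask, int sig)
{
    return sig >= 1 && sig <= MAX_SIGNAL && ((mask >> (sig - 1)) & 1) != 0;
}
```

Reading the file back would then amount to printing every signal number for which log_signals_enabled() returns true.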
On Monday 18 August 2003 21:43, H. Peter Anvin wrote:
> Followup to: <[email protected]>
> By author: "David Schwartz" <[email protected]>
> In newsgroup: linux.dev.kernel
>
> > There is no mechanism that is guaranteed to terminate a process other
> > than sending yourself an exception that is not caught. So in cases where
> > you must guarantee that your process terminates, it is perfectly
> > reasonable to send yourself a SIGILL.
>
> exit(2)?
>
> -hpa
Nope... A monitoring process must send the exit to a different thread... which
may be being directed to generate a core dump.
On Tue, 19 Aug 2003 09:54:17 +0300, Denis Vlasenko said:
> > char *j=NULL;
> > signal(SIGSEGV, SIG_DFL);
> > *j++;
> I disagree. _exit(2) is the most sensible way to terminate.
Not if you want it *dead*, *now*, with a core dump, and with minimal disruption
of program state. Sometimes (especially when trying to shoot a race condition)
you just can't run the program under gdb - and if it calls _exit() there's not much
wreckage left for gdb to look at....
> Logging kernel-induced SEGVs and ILLs are definitely a help when you hunt
> daemons mysteriously crashing. This outweighs DoS hazard.
Well, I can *see* the fact it exited with a signal in 'lastcomm' already. If that's all
the info you're providing, it's of no help.
Now, if you figure out how to read the module's -g data and give me a line number
it died at:
printk(KERN_DEBUG "Process %d (%s) died on signal %d at line %d of function %s", ....
but that would involve a lot of file I/O from kernelspace, soo.....
> On 19 August 2003 01:39, David Schwartz wrote:
> > > And why not just catch the ones sent from the kernel? That's the one
> > > that is killing the program because it crashed, and that's the one the
> > > original poster wants logged...
> > Because sometimes a program wants to terminate. And it is perfectly legal
> > for a programmer who needs to terminate his program as quickly as possible
> > to do this:
> >
> > char *j=NULL;
> > signal(SIGSEGV, SIG_DFL);
> > *j++;
> > This is a perfectly sensible thing for a program to do with well-defined
> > semantics. If a program wants to create a child every minute like this and
> > kill it, that's perfectly fine. We should be able to do that in the default
> > configuration without a sysadmin complaining that we're DoSing his syslogs.
> I disagree. _exit(2) is the most sensible way to terminate.
Read the documentation for _exit. You will see that it is useless in the
case of a portable program that needs to terminate as quickly as possible
and, in fact, isn't guaranteed to cause program termination at all:
The function _exit is like exit(), but does not call any
functions registered with the ANSI C atexit function, nor
any registered signal handlers. Whether it flushes stan-
dard I/O buffers and removes temporary files created with
tmpfile(3) is implementation-dependent. On the other
hand, _exit does close open file descriptors, and this may
cause an unknown delay, waiting for pending output to fin-
ish. If the delay is undesired, it may be useful to call
functions like tcflush() before calling _exit(). Whether
any pending I/O is cancelled, and which pending I/O may be
cancelled upon _exit(), is implementation-dependent.
One major problem with _exit() is that it touches various structures. If
the program's execution environment is no longer trusted, calling _exit()
can cause an endless loop. In multithreaded programs, _exit() may need to
acquire mutexes. This can take an indeterminate amount of time. Portable
programs cannot rely on _exit() in a case where they need to terminate as
soon as possible.
Now, if you have a better way for a portable program that needs to
terminate immediately to do so, that's fine, tell me what it is. Otherwise,
you are *forcing* people to DoS your syslog.
DS
> > There is no mechanism that is guaranteed to terminate a process other
> > than sending yourself an exception that is not caught. So in cases where
> > you must guarantee that your process terminates, it is perfectly
> > reasonable to send yourself a SIGILL.
> exit(2)?
And what if a registered 'atexit' function needs to acquire a mutex that is
held by a thread that's in an endless loop? What if a standard I/O stream
has buffered data for a local disk that failed? I'm looking for a mechanism
that is guaranteed to terminate a process immediately.
DS
David Schwartz wrote:
>>> There is no mechanism that is guaranteed to terminate a process other
>>> than sending yourself an exception that is not caught. So in cases where
>>> you must guarantee that your process terminates, it is perfectly
>>> reasonable to send yourself a SIGILL.
>
>
>>exit(2)?
>
>
> And what if a registered 'atexit' function needs to acquire a mutex that is
> held by a thread that's in an endless loop? What if a standard I/O stream
> has buffered data for a local disk that failed? I'm looking for a mechanism
> that is guaranteed to terminate a process immediately.
>
Correction...
_exit(2).
There is no exit(2); I was talking about _exit(2) and you're talking
about exit(3).
_exit(2) *is* guaranteed to terminate a process immediately.
-hpa
> Correction...
>
> _exit(2).
>
> There is no exit(2); I was talking about _exit(2) and you're talking
> about exit(3).
>
> _exit(2) *is* guaranteed to terminate a process immediately.
>
> -hpa
If only it were so! I have direct practical experience that under
LinuxThreads, at least, it doesn't. SUSv3 allows _exit to flush open streams
and remove temporary files. Sadly, _exit, on many systems, acquires locks or
accesses process structures that might be corrupt, whereas dereferencing a
NULL pointer does not.
I have portable code that has a 'terminate this process immediately'
function. It started out calling '_exit' until we found platforms where that
resulted in a hang (say the thread calling _exit holds a non-recursive mutex
that the _exit function tries to acquire). So we kept inching our way up to
more and more extreme termination methods.
DS