2003-05-19 19:15:51

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] Exception trace for i386


x86-64 had printks for user level faults for a long time. This
proved to be very useful to trace otherwise hidden faults, e.g.
on a normal kernel there is no way to see a segfault in a process
that runs in a write protected directory, even when core dumps
are enabled. Also it's useful as an early warning that something
is wrong with your system.

There was a request to port this to i386. Done with this patch.

It is turned off by default because some programs do deliberate
segfaults (e.g. lmbench), but can be enabled with a sysctl.
int 3s are not logged because they are used gdb and debugging
becomes too noisy otherwise.

As a intended side effect this also exposes the hidden exception_trace
variable that already exists in arch/x86_64 as sysctl.

Patch for 2.5.69-CVS

-Andi

Index: linux/arch/i386/kernel/traps.c
===================================================================
RCS file: /home/cvs/linux-2.5/arch/i386/kernel/traps.c,v
retrieving revision 1.54
diff -u -u -r1.54 traps.c
--- linux/arch/i386/kernel/traps.c 13 May 2003 02:56:54 -0000 1.54
+++ linux/arch/i386/kernel/traps.c 19 May 2003 18:19:33 -0000
@@ -53,6 +53,8 @@

#include "mach_traps.h"

+extern int exception_trace;
+
asmlinkage int system_call(void);
asmlinkage void lcall7(void);
asmlinkage void lcall27(void);
@@ -303,6 +305,11 @@

trap_signal: {
struct task_struct *tsk = current;
+
+ if (exception_trace && trapnr != 3)
+ printk(KERN_INFO "%s[%d] trap %s at eip:%lx esp:%lx err:%lx\n",
+ tsk->comm, tsk->pid, str, regs->eip, regs->esp, error_code);
+
tsk->thread.error_code = error_code;
tsk->thread.trap_no = trapnr;
if (info)
@@ -372,15 +379,20 @@

asmlinkage void do_general_protection(struct pt_regs * regs, long error_code)
{
+ struct task_struct *tsk = current;
if (regs->eflags & VM_MASK)
goto gp_in_vm86;

if (!(regs->xcs & 3))
goto gp_in_kernel;

- current->thread.error_code = error_code;
- current->thread.trap_no = 13;
- force_sig(SIGSEGV, current);
+ if (exception_trace)
+ printk(KERN_INFO "%s[%d] gpf at eip:%lx esp:%lx err:%lx\n",
+ tsk->comm, tsk->pid, regs->eip, regs->esp, error_code);
+
+ tsk->thread.error_code = error_code;
+ tsk->thread.trap_no = 13;
+ force_sig(SIGSEGV, tsk);
return;

gp_in_vm86:
Index: linux/arch/i386/mm/fault.c
===================================================================
RCS file: /home/cvs/linux-2.5/arch/i386/mm/fault.c,v
retrieving revision 1.27
diff -u -u -r1.27 fault.c
--- linux/arch/i386/mm/fault.c 8 May 2003 05:19:20 -0000 1.27
+++ linux/arch/i386/mm/fault.c 19 May 2003 18:19:33 -0000
@@ -57,6 +57,8 @@

asmlinkage void do_invalid_op(struct pt_regs *, unsigned long);

+int exception_trace;
+
/*
* This routine handles page faults. It determines the address,
* and the problem, and then passes it off to one of the appropriate
@@ -200,6 +202,13 @@

/* User mode accesses just cause a SIGSEGV */
if (error_code & 4) {
+
+ if (exception_trace)
+ printk(KERN_INFO
+ "%s[%d] segfault at eip:%lx esp:%lx adr:%lx err:%lx\n",
+ tsk->comm, tsk->pid, regs->eip, regs->esp, address,
+ error_code);
+
tsk->thread.cr2 = address;
tsk->thread.error_code = error_code;
tsk->thread.trap_no = 14;
Index: linux/include/linux/sysctl.h
===================================================================
RCS file: /home/cvs/linux-2.5/include/linux/sysctl.h,v
retrieving revision 1.43
diff -u -u -r1.43 sysctl.h
--- linux/include/linux/sysctl.h 12 May 2003 18:25:53 -0000 1.43
+++ linux/include/linux/sysctl.h 19 May 2003 18:19:38 -0000
@@ -130,6 +130,7 @@
KERN_PIDMAX=55, /* int: PID # limit */
KERN_CORE_PATTERN=56, /* string: pattern for core-file names */
KERN_PANIC_ON_OOPS=57, /* int: whether we will panic on an oops */
+ KERN_EXCEPTION_TRACE=58, /* int: log user traps in kernel log */
};


Index: linux/kernel/sysctl.c
===================================================================
RCS file: /home/cvs/linux-2.5/kernel/sysctl.c,v
retrieving revision 1.42
diff -u -u -r1.42 sysctl.c
--- linux/kernel/sysctl.c 12 May 2003 18:25:53 -0000 1.42
+++ linux/kernel/sysctl.c 19 May 2003 18:19:39 -0000
@@ -57,6 +57,7 @@
extern int cad_pid;
extern int pid_max;
extern int sysctl_lower_zone_protection;
+extern int exception_trace;

/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
static int maxolduid = 65535;
@@ -265,6 +266,10 @@
0600, NULL, &proc_dointvec},
{KERN_PANIC_ON_OOPS,"panic_on_oops",
&panic_on_oops,sizeof(int),0644,NULL,&proc_dointvec},
+#ifdef CONFIG_X86
+ {KERN_EXCEPTION_TRACE,"exception_trace",
+ &exception_trace,sizeof(int),0644,NULL,&proc_dointvec},
+#endif
{0}
};


2003-05-19 20:43:54

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Exception trace for i386

In article <20030519192814.GA975@averell>, Andi Kleen <[email protected]> wrote:
>
>x86-64 had printks for user level faults for a long time. This
>proved to be very useful to trace otherwise hidden faults, e.g.
>on a normal kernel there is no way to see a segfault in a process
>that runs in a write protected directory, even when core dumps
>are enabled. Also it's useful as an early warning that something
>is wrong with your system.
>
>There was a request to port this to i386. Done with this patch.

Please don't do it this way. For one thing, there are valid uses where
you want to enable tracing for just one process. For another, there are
actually cases where you may want to trace all page faults, even the
ones that don't cause signals - kind of like normal system calls. After
all, from a behavioural standpoint, that is what they are: implied
system calls.

So I think you want to make it per-process, and expose it as a ptrace
thing (imaging seeing all the page faults a process is taking with
"strace". Potentially quite useful for performance tuning).

I don't think it's ever really valid to expose it as a global option, as
some programs use page faults (even the signalling kind) to do their own
memory management, and making it a global option just makes it hard to
work with such programs.

Linus

2003-05-19 21:08:50

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Exception trace for i386

[email protected] (Linus Torvalds) writes:

> Please don't do it this way. For one thing, there are valid uses where
> you want to enable tracing for just one process. For another, there are
> actually cases where you may want to trace all page faults, even the
> ones that don't cause signals - kind of like normal system calls. After

x86-64 has this, (pagefault_trace), but it's usually hidden in a debug
CONFIG and not too useful in normal operation (because it generates too
much output), so I didn't port this. At one point I also did a sysctl
to echo a string into and strcmp the process ->comm against this,
but it was a bit ugly so I removed it again.

> all, from a behavioural standpoint, that is what they are: implied
> system calls.
>
> So I think you want to make it per-process, and expose it as a ptrace
> thing (imaging seeing all the page faults a process is taking with
> "strace". Potentially quite useful for performance tuning).

Probably, but that's not the intention of this patch. From my
experience page fault tracing is too much for printk because it
produces too much information. It needs some specialized logging
module that is optimized for high frequency logging like SGI ktrace.
It could be surely implemented, but I didn't plan to do this work
currently. If you wanted it I would suggest considering kprobes or dprobes
instead which allows to define logging macros for such things nicely already.

> I don't think it's ever really valid to expose it as a global option, as
> some programs use page faults (even the signalling kind) to do their own
> memory management, and making it a global option just makes it hard to
> work with such programs.

I especially like it being a global option. It has catched bugs on x86-64
that were never noticed before (e.g. subprocesses silently segfaulting
that nobody noticed doing the same on i386). Clearly it's an debugging
thing and you definitely want an option to turn it off. But having
the global option is useful.

Also programs that break with this are much less frequent than you
probably think.

-Andi

2003-05-19 23:09:47

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Exception trace for i386

On Llu, 2003-05-19 at 21:56, Linus Torvalds wrote:
> >proved to be very useful to trace otherwise hidden faults, e.g.
> >on a normal kernel there is no way to see a segfault in a process
> >that runs in a write protected directory, even when core dumps
> >are enabled. Also it's useful as an early warning that something
> >is wrong with your system.

You can change the segfault path for core dumps via /proc

> I don't think it's ever really valid to expose it as a global option, as
> some programs use page faults (even the signalling kind) to do their own
> memory management, and making it a global option just makes it hard to
> work with such programs.

The classic being the texas objectstore which I suspect would not be
happy if printk logged each fault 8)

2003-05-20 11:56:32

by Ingo Oeser

[permalink] [raw]
Subject: Re: [PATCH] Exception trace for i386

On Mon, May 19, 2003 at 11:21:42PM +0200, Andi Kleen wrote:
> I especially like it being a global option. It has catched bugs on x86-64
> that were never noticed before (e.g. subprocesses silently segfaulting
> that nobody noticed doing the same on i386). Clearly it's an debugging
> thing and you definitely want an option to turn it off. But having
> the global option is useful.

Would you consider doing the logging only, if the process has no
real handler for that? (so it's blocking, ignoring or not caring
about this signal)

This will reveal only the bugs and not disturb the applications,
which do sth. useful with segfaults.

Regards

Ingo Oeser

2003-05-20 11:58:53

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Exception trace for i386

On Tue, May 20, 2003 at 01:17:17PM +0200, Ingo Oeser wrote:
> On Mon, May 19, 2003 at 11:21:42PM +0200, Andi Kleen wrote:
> > I especially like it being a global option. It has catched bugs on x86-64
> > that were never noticed before (e.g. subprocesses silently segfaulting
> > that nobody noticed doing the same on i386). Clearly it's an debugging
> > thing and you definitely want an option to turn it off. But having
> > the global option is useful.
>
> Would you consider doing the logging only, if the process has no
> real handler for that? (so it's blocking, ignoring or not caring
> about this signal)

Yes, I had this idea too and it's already implemented for x86-64.

-Andi