LinuxLists.cc - Catching SIGSEGV with signal() in 2.6

2004-04-05 15:16:08

Subject: Catching SIGSEGV with signal() in 2.6

... doesn't seem to be possible anymore.

See
http://www.openoffice.org/issues/show_bug.cgi?id=27162

Is this change intentional, or a bug?

LLaP
bero

--
Ark Linux - Linux for the masses
http://www.arklinux.org/

Redistribution and processing of this message is subject to
http://www.arklinux.org/terms.php

2004-04-05 18:17:12

by Jamie Lokier

Subject: Re: Catching SIGSEGV with signal() in 2.6

Bernhard Rosenkraenzer wrote:
> ... doesn't seem to be possible anymore.
>
> See http://www.openoffice.org/issues/show_bug.cgi?id=27162
>
> Is this change intentional, or a bug?

On 2.6.3, x86, SIGSEGV is being caught just fine in my test program,
with the correct fault address, with or without SA_SIGINFO.

-- Jamie

2004-04-05 19:16:51

by Chris Friesen

Subject: Re: Catching SIGSEGV with signal() in 2.6

Jamie Lokier wrote:
> Bernhard Rosenkraenzer wrote:
>
>>... doesn't seem to be possible anymore.
>>
>>See http://www.openoffice.org/issues/show_bug.cgi?id=27162
>>
>>Is this change intentional, or a bug?
>
>
> On 2.6.3, x86, SIGSEGV is being caught just fine in my test program,
> with the correct fault address, with or without SA_SIGINFO.

SA_SIGINFO implies sigaction(). The original poster was talking about
signal().

That said, it seems to work with 2.6.4 on ppc32.

Chris

2004-04-05 20:40:37

by Jamie Lokier

Subject: Re: Catching SIGSEGV with signal() in 2.6

Chris Friesen wrote:
> SA_SIGINFO implies sigaction(). The original poster was talking about
> signal().
>
> That said, it seems to work with 2.6.4 on ppc32.

Just tried it with 2.6.3, x86 and signal(). Works fine.

-- Jamie

2004-04-05 20:58:59

by Richard B. Johnson

Subject: Re: Catching SIGSEGV with signal() in 2.6

On Mon, 5 Apr 2004, Jamie Lokier wrote:

> Chris Friesen wrote:
> > SA_SIGINFO implies sigaction(). The original poster was talking about
> > signal().
> >
> > That said, it seems to work with 2.6.4 on ppc32.
>
> Just tried it with 2.6.3, x86 and signal(). Works fine.
>
> -- Jamie

Are you using longjmp() to get out of the signal handler?
You may find that you can trap SIGSEGV, but you can't simply
return from the handler, because it will return to the
instruction that caused the trap!!!

#include <stdio.h>
#include <signal.h>

void handler(int sig) {
    fprintf(stderr, "Caught %d\n", sig);
}

int main() {
    char *foo = NULL;
    signal(SIGSEGV, handler);
    fprintf(stderr, "Send a signal....\n");
    kill(0, SIGSEGV);
    fprintf(stderr, "Okay! That worked!\n");
//  *foo = 0;
    return 0;
}

Just un-comment the null-pointer de-reference and watch!

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2004-04-05 21:14:12

by Jamie Lokier

Subject: Re: Catching SIGSEGV with signal() in 2.6

Richard B. Johnson wrote:
> Are you using longjmp() to get out of the signal handler?
> You may find that you can trap SIGSEGV, but you can't exit
> from it because it will return to the instruction that
> caused the trap!!!

Thanks for stating the obvious! :)

No, actually I'm changing memory protection with mprotect() inside the
handler, so when it returns the program can continue.

But that's not relevant to the OpenOffice problem. They have a
program which traps SIGSEGV with 2.4 and terminates suddenly with 2.6.
Obviously they aren't just returning else it wouldn't work with 2.4.

-- Jamie

2004-04-05 21:14:26

by Chris Friesen

Subject: Re: Catching SIGSEGV with signal() in 2.6

2004-04-05 21:17:09

by Bernhard Rosenkraenzer

Subject: Re: Catching SIGSEGV with signal() in 2.6

On Mon, 5 Apr 2004, Jamie Lokier wrote:

> > See http://www.openoffice.org/issues/show_bug.cgi?id=27162
> >
> > Is this change intentional, or a bug?
>
> On 2.6.3, x86, SIGSEGV is being caught just fine in my test program,
> with the correct fault address, with or without SA_SIGINFO.

Seems to be triggered only by some segfaults -- a simpler test app than
the one in the OpenOffice bug report works here too, the OpenOffice one
crashes.

I'll try to debug it some more when I have some time, but that could take
a while (busy ATM)

LLaP
bero


2004-04-06 00:41:00

by Kevin B. Hendricks

Subject: Re: Catching SIGSEGV with signal() in 2.6

Hi,

Just in case this helps, this is a simplified testcase of the OpenOffice.org
code in question that always worked under 2.4 kernels on multiple
architectures but fails on 2.6.X kernels on those same multiple platforms.

For some reason, the segfault generated by trying to write to address 0 can
not be properly caught anymore (or at least it appears that way to me).

Hope this helps.

Kevin

[kbhend@base1 solar]$ cat testcase.c
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <signal.h>
#include <setjmp.h>

typedef int (*TestFunc)( void* );
static jmp_buf check_env;
static int bSignal;

/* Handler: note the fault and jump back into check(). */
void SignalHdl( int sig )
{
    bSignal = 1;
    longjmp( check_env, sig );
}

/* Run func(p) with SIGSEGV/SIGBUS trapped; return -1 if it faulted. */
int check( TestFunc func, void* p )
{
    int result;
    bSignal = 0;
    if ( !setjmp( check_env ) )
    {
        signal( SIGSEGV, SignalHdl );
        signal( SIGBUS, SignalHdl );
        result = func( p );
        signal( SIGSEGV, SIG_DFL );
        signal( SIGBUS, SIG_DFL );
    }
    if ( bSignal )
        return -1;
    else
        return 0;
}

int GetAtAddress( void* p )
{
    return *((char*)p);
}

int SetAtAddress( void* p )
{
    return *((char*)p) = 0;
}

int CheckGetAccess( void* p )
{
    int b;
    b = -1 != check( (TestFunc)GetAtAddress, p );
    return b;
}

int CheckSetAccess( void* p )
{
    int b;
    b = -1 != check( (TestFunc)SetAtAddress, p );
    return b;
}

void InfoMemoryAccess( char* p )
{
    if ( CheckGetAccess( p ) )
        printf( "can read address %p\n", p );
    else
        printf( "can not read address %p\n", p );

    if ( CheckSetAccess( p ) )
        printf( "can write address %p\n", p );
    else
        printf( "can not write address %p\n", p );
}

int
main( int argc, char* argv[] )
{
    {
        char* p = NULL;
        InfoMemoryAccess( p );
        p = (char*)&p;
        InfoMemoryAccess( p );
    }
    exit( 0 );
}

2004-04-06 02:04:27

by Ulrich Drepper

Subject: Re: Catching SIGSEGV with signal() in 2.6

Kevin B. Hendricks wrote:

> For some reason, the segfault generated by trying to write to address 0 can
> not be properly caught anymore (or at least it appears that way to me).

If the code were correct, you'd see the expected behavior.

> void SignalHdl( int sig )
> {
> bSignal = 1;
> longjmp( check_env, sig );
> }

Since you jump out of a signal handler, you must use siglongjmp().

> int check( TestFunc func, void* p )
> {
> int result;
> bSignal = 0;
> if ( !setjmp( check_env ) )

And sigsetjmp(check_env, 1) here.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

2004-04-06 03:01:34

by Kevin B. Hendricks

Subject: Re: Catching SIGSEGV with signal() in 2.6

Hi,

> If the code would be correct you'd see the expected behavior.
> Since you jump out of a signal handler, you must use siglongjmp().
> And sigsetjmp(check_env, 1) here.

So the code has been wrong since the beginning and we were just "lucky" it
worked in all pre-2.6 kernels?

I have no doubt you are right, but forgive my ignorance here: please explain
why we must use siglongjmp when longjmping out of a signal handler, given that

1. before the next use of the handler we use signal again to properly set the
signal handler (and the set of masked signals).

and

2. the mask of blocked signals will include SIGSEGV upon entry to the signal
handler and therefore it will be masked after the normal longjmp, since a
normal longjmp will not change the set of masked signals.

What am I missing that makes sigsetjmp and siglongjmp a requirement, or is
this just part of some specification someplace?

Kevin

2004-04-06 04:08:48

by Ulrich Drepper

Subject: Re: Catching SIGSEGV with signal() in 2.6

Kevin B. Hendricks wrote:

> So the code has been wrong since the beginning and we were just "lucky" it
> worked in all pre-2.6 kernels?

The old code depended on undefined behavior.

> 1. before the next use of the handler we use signal again to properly set the
> signal handler (and the set of masked signals).

Where do you set the signal mask? That's the point. You don't. This
means jumping from the signal handler causes the signal to remain
blocked. And then

~~~~
If any of the SIGFPE, SIGILL, SIGSEGV, or SIGBUS signals are generated
while they are blocked, the result is undefined, unless the signal was
generated by the kill() function, the sigqueue() function, or the
raise() function.
~~~~

(see pthread_sigmask in POSIX) comes into play.

The second SIGSEGV signal is generated while the signal is blocked, and
since it is not generated by any of the functions mentioned in the text
above, anything can happen. The old kernel queued the signal; the new
kernel terminates the process, which is much better IMO. Try the
attached program to see why. Also note that the 2.4 behavior is
inconsistent: if no handler is installed, the process is terminated
regardless of the signal being masked.


Attachments:

minsig.c (200.00 B)

2004-04-06 12:02:39

by Kevin B. Hendricks

Subject: Re: Catching SIGSEGV with signal() in 2.6

Hi Ulrich,

> The old code depended on undefined behavior.

Thanks for explanation. It makes perfect sense. I appreciate it.

Our bad assumption was that using signal to install a signalhandler on a
specific signal unblocked that specific signal, but as you show it does not.

I will try to get a fix using sigsetjmp/siglongjmp or fork/wait into the
forthcoming OOo 1.1.2 tree so that no more "problems" are reported.

Kevin

2004-04-06 15:53:22

by Edgar Toernig

Subject: Re: Catching SIGSEGV with signal() in 2.6

Ulrich Drepper wrote:
>
> Kevin B. Hendricks wrote:
>
> > So the code has been wrong since the beginning and we were just "lucky" it
> > worked in all pre-2.6 kernels?
>
> The old code depended on undefined behavior.

Maybe it's simply *old* code, possibly written under libc5.
There, signal() used SA_RESETHAND which implies SA_NODEFER
which in turn did not block the signal and exiting from the
signal handler via longjmp was OK.

With the new signal() behaviour in glibc2 one may get results
undefined by POSIX but it still worked as before because the
sigprocmask was ignored for SIGSEGV under Linux <2.6.

It's the combination of new glibc2 and new kernel that makes
code like the mentioned one break.

It has nothing to do with POSIX - for POSIX all of this is
"undefined/implementation defined behaviour". I had chosen
to stay compatible...

Ciao, ET.

--
Not every program claims to be POSIX compliant (who reads
3600 pages of difficult to obtain specs?) - some are simply
Linux programs...