I have received a bug report related to the si_code field of siginfo for
SIGFPE. The FPE_ values are (unfortunately) an enumeration rather than
a bitmask, so we can't just OR them together. Unfortunately, when we get
multiple unmasked exceptions, at least on x86, we leave info.si_code set
to __SI_FAULT, which means it is returned to userspace as zero. This
violates POSIX, which states that an si_code <= 0 is a user-generated
signal.

Looking at the code in other architectures, it looks like most of them
prioritize the faults, but still end up with __SI_FAULT|0 if none of the
expected conditions are found (which may not be possible, of course.)
Prioritizing the faults seems like the reasonable thing to do in terms of
dealing with the multiple unmasked errors problem.
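
To make the prioritization idea concrete, here is a minimal userspace
sketch for the x87 case. It is not the code of any architecture; the
helper name fpe_si_code(), the priority order, the mapping of the
denormal exception to FPE_FLTUND, and the SI_KERNEL fallback are all
assumptions chosen for illustration. Only the x87 flag bit positions and
the FPE_ constants are taken as given.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

#ifndef SI_KERNEL
#define SI_KERNEL 0x80  /* Linux value, assumed if the libc does not define it */
#endif

/* x87 exception flag bits; the control word mask bits use the same positions. */
#define X87_IE 0x01  /* invalid operation */
#define X87_DE 0x02  /* denormal operand */
#define X87_ZE 0x04  /* divide by zero */
#define X87_OE 0x08  /* overflow */
#define X87_UE 0x10  /* underflow */
#define X87_PE 0x20  /* precision (inexact) */

/*
 * Hypothetical helper: pick exactly one FPE_ code when several unmasked
 * exceptions are pending. cwd is the x87 control word, swd the status word.
 */
static int fpe_si_code(unsigned short cwd, unsigned short swd)
{
    /* An exception is unmasked when its flag is set and its mask bit is clear. */
    unsigned int err = (unsigned int)swd & ~(unsigned int)cwd & 0x3fu;

    if (err & X87_IE)
        return FPE_FLTINV;
    if (err & X87_ZE)
        return FPE_FLTDIV;
    if (err & X87_OE)
        return FPE_FLTOVF;
    if (err & X87_UE)
        return FPE_FLTUND;
    if (err & X87_PE)
        return FPE_FLTRES;
    if (err & X87_DE)
        return FPE_FLTUND;  /* assumption: there is no dedicated FPE_ code */

    /* Nothing recognized: report a positive, kernel-generated code, not 0. */
    return SI_KERNEL;
}
```

With a control word of 0x037a (invalid and zero-divide unmasked) and a
status word of 0x05 (both flags set), this sketch reports FPE_FLTINV
rather than leaving the code at zero.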

I am wondering if it would make sense to notice the combination
__SI_FAULT|0, or __SI_FAULT with (short)si_code < 0, and force SI_KERNEL
into the user-space code field in the generic code. I am also wondering
if there is any possibility that there is code out there which relies on
the current, buggy behaviour.
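
A rough sketch of what such a generic fixup could look like, written as
standalone userspace C so it can be tried outside the kernel. The name
user_si_code() is hypothetical, and the SKETCH_ constants are local
stand-ins that assume the kernel keeps the signal class in the upper 16
bits of si_code with the fault class encoded as 3 << 16; this is not the
actual copy-to-user code. It also treats a low half of zero and a
negative low half the same way, which is one reading of the proposal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

#ifndef SI_KERNEL
#define SI_KERNEL 0x80  /* Linux value, assumed if the libc does not define it */
#endif

/* Local stand-ins for the kernel-internal class encoding (assumed layout). */
#define SKETCH_SI_MASK  0xffff0000u
#define SKETCH_SI_FAULT (3u << 16)

/*
 * Hypothetical helper: translate a kernel-internal si_code into the value
 * userspace would see. A fault-class code whose low 16 bits are <= 0 would
 * look user-generated under POSIX, so it is replaced by SI_KERNEL.
 */
static int user_si_code(unsigned int kernel_code)
{
    /* Sign-extend the low 16 bits, mirroring a (short) cast portably. */
    int low = (int)(kernel_code & 0xffffu);
    if (low & 0x8000)
        low -= 0x10000;

    if ((kernel_code & SKETCH_SI_MASK) == SKETCH_SI_FAULT && low <= 0)
        return SI_KERNEL;

    return low;
}
```

In this sketch a properly set fault code such as __SI_FAULT|FPE_FLTDIV
passes through unchanged, and codes from other classes are not touched,
so only the ambiguous fault case is rewritten.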

-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

H. Peter Anvin writes:
> I have received a bug report related to the si_code field of siginfo for
> SIGFPE. The FPE_ values are (unfortunately) an enumeration rather than
> a bitmask, so we can't just OR them together. Unfortunately when we get
> multiple unmasked exceptions at least on x86 we leave info.si_code set to
> __SI_FAULT, which means it is returned to userspace as zero. This
> violates POSIX, which states that an si_code <= 0 is a user-generated
> signal.
>
> Looking at the code in other architectures, it looks like most of them
> prioritize the faults, but still end up with __SI_FAULT|0 if none of the
> expected conditions are found (which may not be possible, of course.)
> Prioritizing the faults seems like the reasonable thing to do in terms of
> dealing with the multiple unmasked errors problem.
>
> I am wondering if it would make sense to notice the combination
> __SI_FAULT|0 or __SI_FAULT and (short)si_code < 0 and force SI_KERNEL
> into the user-space code field in the generic code. I am also wondering
> if there is any possibility that there is code out there which relies on
> the current, buggy behaviour.

The SIGFPE handlers I've written for the Erlang VM (several CPU/OS
combinations and FPU variations where applicable) do not rely on
si_code. If they need to do autopsy they look at the fault-time FPU
status word in the ucontext.

So at least Erlang won't break if you change SIGFPE's si_code :-)