I just bought a VIA C3 866 processor, and under very special
circumstances some programs (e.g. mplayer, xmms) randomly crash with
trace/breakpoint trap or segmentation fault. Otherwise the system
seems stable even under high load. Tested under various kernels
(generic i386 2.2.19, 2.4.19, and 2.4.19 compiled for the C3), with
different memory modules (some known to be good) and various video
cards and X servers, but the result is always the same.
Can this be a software fault or is the CPU faulty? Can anything other
than a CPU fault cause programs to receive SIGTRAP?
The system config is:
cpu: C3 866MHz
mb: asus cuv4x-c (via vt82c694x chipset)
The BIOS recognises the CPU as "VIA Cyrix III 866A", which is not
exactly right but almost.
Any advice is greatly appreciated!
Miklos
On Wed, Jan 15, 2003 at 10:29:01AM +0100, Miklos Szeredi wrote:
>
> I just bought a VIA C3 866 processor, and under very special
> circumstances some programs (e.g. mplayer, xmms) randomly crash with
> trace/breakpoint trap or segmentation fault. Otherwise the system
> seems stable even under high load.
Be sure that those programs aren't compiled for 686. The C3 lacks
cmov, so it'll segfault when it hits that opcode. You can confirm
this by running it under gdb, and disassembling where it segv's to.
This is still a common problem that's biting some people. The Debian
folks had a broken libssl for months up until recently.
Note to userspace developers: If you're compiling something as
a 686 binary, you *NEED* to check the feature flags (in an i386
compiled program) to see if the CPU has cmov before you load 686
optimised parts of your app. This is *NOT* a kernel problem,
it is *NOT* a CPU bug. The cmov extension is optional.
VIA chose to save silicon space by not implementing it.
GCC unfortunately always uses cmov when compiling for 686.
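A minimal sketch of such a check, assuming a Linux system where the
kernel exposes the CPU feature flags on the "flags" line of
/proc/cpuinfo (a program could instead execute CPUID itself). The
function names flags_contain and cpu_has_cmov are made up for
illustration and do not come from any library:

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-separated word in `flags`. */
int flags_contain(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    if (len == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts_word = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[len];
        int ends_word = end == '\0' || end == ' ' || end == '\t' || end == '\n';

        if (starts_word && ends_word)
            return 1;
        p += 1; /* keep searching after a partial match such as "xcmov" */
    }
    return 0;
}

/* Return 1 if the kernel reports cmov, 0 if it does not, -1 if unknown. */
int cpu_has_cmov(void)
{
    char line[4096];
    int result = -1;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Assumption: the feature list is on a line starting with "flags". */
        if (strncmp(line, "flags", 5) == 0) {
            const char *colon = strchr(line, ':');

            if (colon != NULL) {
                result = flags_contain(colon + 1, "cmov");
                break;
            }
        }
    }
    fclose(f);
    return result;
}
```

A program would call cpu_has_cmov() once at start-up and load its
686-optimised code only when the result is 1, falling back to the
i386 build on 0 or -1.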
Dave
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
Thanks, I'll check that out, though I'm a bit sceptical since the
crashes occur randomly not predictably, and that makes me feel it's
not because of an unimplemented instruction.
Also, what about the trace/breakpoint trap? Can that also be generated by
an illegal instruction?
Thanks,
Miklos
On Wed, Jan 15, 2003 at 01:38:58PM +0100, Miklos Szeredi wrote:
>
> Thanks, I'll check that out, though I'm a bit sceptical since the
> crashes occur randomly not predictably, and that makes me feel it's
> not because of an unimplemented instruction.
>
> Also what about trace/breakpoint trap? Can that be also generated by
> an illegal instruction?
Hmm. My theory would explain SIGILLs, but if you're seeing other signals
as well, it could be something else. Check that the power supply is rated
high enough, and check the cooling (though cooling is usually less of an
issue with C3s). A run with memtest86 may also be worth trying.
Dave
We're seeing the same thing on a mini-ITX based system.
init is segfaulting :(( . We've never seen this on our
other non-C3 systems running the same codebase. We've instrumented
the kernel to help catch the initial problem, hopefully it will
trigger soon.
Dave, will the cmov generate a segfault or an illegal instruction trap (SIGILL)?
thanks
larry
-----Original Message-----
From: Dave Jones [mailto:[email protected]]
Sent: Wednesday, January 15, 2003 7:23 AM
To: Miklos Szeredi
Cc: [email protected]
Subject: Re: VIA C3 and random SIGTRAP or segfault
Larry Sendlosky wrote:
> We're seeing the same thing on a mini-ITX based system.
> init is segfaulting :(( . We've never seen this on our
> other non-C3 systems running the same codebase. We've instrumented
> the kernel to help catch the initial problem, hopefully it will
> trigger soon.
>
> Dave, will the cmov generate a segfault or illegal instr trap (SIGILL?) ?
segfault is what I saw. Something seems to be corrupted (by a cmov
SIGILL?) and from then the app will crash in the same
(arbitrary) place until the machine is restarted. Some apps
are more susceptible than others. Note a Samuel II would work fine?
Hmm, just checking an ssh binary and associated libs that I know
crashed every so often (only in interactive mode, not with ssh -c),
I noticed that libnsl.so.1 (network services lib (part of glibc))
had cmov instructions. Other things noticed to crash were bash,
vi, php, snmpd. So I guess libnsl could be the root of our probs?
Note we built the whole system from SRPMs with the appropriate
flags for C3, but obviously these were ignored for libnsl anyway!
Also possibly related is that most problematic binaries
(php/snmpd/ssh) were linked to libcrypto.so.2 which may be relevant?
To find out whether a binary has CMOV instructions:

    objdump --disassemble binary | grep cmov

Pádraig.
> segfault is what I saw. Something seems to be corrupted (by a cmov
> SIGILL?) and from then the app will crash in the same
> (arbitrary) place until the machine is restarted. Some apps
> are more susceptible than others. Note a Samuel II would work fine?
Do you mean that after a cmov is encountered other applications will
also randomly crash? That would explain what I've been seeing.
Miklos
On Wed, Jan 15 2003, Miklos Szeredi wrote:
>
> > segfault is what I saw. Something seems to be corrupted (by a cmov
> > SIGILL?) and from then the app will crash in the same
> > (arbitrary) place until the machine is restarted. Some apps
> > are more susceptible than others. Note a Samuel II would work fine?
>
> Do you mean that after a cmov is encountered other applications will
> also randomly crash? That would explain what I've been seeing.
No, it will SIGILL immediately.
--
Jens Axboe
[email protected] wrote:
>>segfault is what I saw. Something seems to be corrupted (by a cmov
>>SIGILL?) and from then the app will crash in the same
>>(arbitrary) place until the machine is restarted. Some apps
>>are more susceptible than others. Note a Samuel II would work fine?
>
> Do you mean that after a cmov is encountered other applications will
> also randomly crash? That would explain what I've been seeing.
Well, I never got SIGILL as would be expected. I got SEGFAULTs,
and I'm only speculating that a CMOV was encountered.
But yes, that does seem to be what's happening: the
CMOV corrupts something global to many apps, and they
SEGFAULT "every now and then".
You could quickly check your system with something like:
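    # List executables in /bin whose disassembly contains cmov.
    # (-perm /111 is the current GNU find spelling of the older -perm +111.)
    find /bin -perm /111 -type f |
    while read -r bin; do
        objdump --disassemble "$bin" 2>/dev/null |
        grep -q cmov && echo "$bin has cmov"
    done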
Pádraig.
> Well I never got SIGILL as would be expected. I got SEGFAULTs
> and I'm only speculating that a CMOV was encountered.
> But yes that does seem to be what's happening, the
> CMOV corrupts something global to many apps, and
> "every now and then" SEGFAULT.
That is exactly the behavior I'm seeing. When xmms is run by one user
under GNOME it crashes after some random amount of time. For other users,
or under KDE, xmms _never_ crashes.
> You could quickly check your system with something like:
>
> find /bin -perm +111 -type f |
> while read bin; do
> objdump --disassemble $bin 2>/dev/null |
> grep -q cmov && echo "$bin has cmov"
> done
Thanks I will check for cmovs.
Miklos
Dave Jones wrote:
> On Wed, Jan 15, 2003 at 10:29:01AM +0100, Miklos Szeredi wrote:
> >
> > I just bought a VIA C3 866 processor, and under very special
> > circumstances some programs (e.g. mplayer, xmms) randomly crash with
> > trace/breakpoint trap or segmentation fault. Otherwise the system
> > seems stable even under high load.
>
> Be sure that those programs aren't compiled for 686. The C3 lacks
> cmov, so it'll segfault when it hits that opcode. You can confirm
> this by running it under gdb, and disassembling where it segv's to.
> This is still a common problem that's biting some people. The Debian
> folks had a broken libssl for months up until recently.
>
> Note to userspace developers: If you're compiling something as
> a 686 binary, you *NEED* to check the feature flags (in an i386
> compiled program) to see if the CPU has cmov before you load 686
> optimised parts of your app. This is *NOT* a kernel problem,
> it is *NOT* a CPU bug. The cmov extension is optional.
> VIA chose to save silicon space by not implementing it.
> GCC unfortunately always uses cmov when compiling for 686.
Why not use a CMOV in an i686-specific crt0.c?
Then programs compiled for i686 but run on i586 will SIGILL
deterministically at program start-up. It seems to me that
the major problem with SIGILL is that it occurs depending
upon the program execution flow, and thus appears nondeterministic
to the user.
This doesn't solve the problem of an i386 executable calling
an i686 library, but solving that problem deterministically
requires a lot of baggage:
 - compiler to produce an object file header stating CPU
   features used.
 - run time linker to take union of all CPU features in
   object file headers and check against CPU features
   returned by CPUID.
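
The two steps above could look roughly like this. It is only a
sketch: the FEAT_* bit values and both function names are
hypothetical, and a real toolchain would have to define where each
object file records its feature mask.

```c
#include <stddef.h>

/* Hypothetical feature bits; a real scheme would likely mirror CPUID's layout. */
enum {
    FEAT_CMOV = 1u << 0,
    FEAT_MMX  = 1u << 1,
    FEAT_SSE  = 1u << 2
};

/* Union of the features required by every object in the link set. */
unsigned required_features(const unsigned *object_features, size_t count)
{
    unsigned required = 0;

    for (size_t i = 0; i < count; i++)
        required |= object_features[i];
    return required;
}

/* Bits the objects need but the CPU lacks; 0 means it is safe to run. */
unsigned missing_features(unsigned required, unsigned cpu_features)
{
    return required & ~cpu_features;
}
```

A loader following this scheme would refuse to start the program
whenever missing_features() returns a non-zero mask, instead of
letting it hit SIGILL later.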
Even this isn't perfect: consider multi-processor machines
with differing CPU feature sets, or applications which attempt
to implement their own run-time checking:

    get_cpu_features(&feature);
    if (feature.cmov && feature.somethingelse && ...)
        mytask_i686();
    else
        mytask_i386();

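A compilable variant of the same idea, as a sketch only. The
cpu_features struct, the max_* functions and select_max are
hypothetical names; max_i686 merely stands in for code that would
be built separately with 686 optimisations:

```c
#include <stdbool.h>

/* Hypothetical feature record; the fields would be filled in from CPUID. */
struct cpu_features {
    bool cmov;
    bool mmx;
};

typedef int (*max_fn)(int, int);

/* Baseline version, safe on any i386-class CPU. */
int max_i386(int a, int b)
{
    return a > b ? a : b;
}

/* Stand-in for a variant compiled separately for 686 (may use cmov). */
int max_i686(int a, int b)
{
    return a > b ? a : b;
}

/* Pick the implementation once, based on the detected features. */
max_fn select_max(const struct cpu_features *f)
{
    return f->cmov ? max_i686 : max_i386;
}
```

Choosing the function pointer once at start-up keeps the feature
test out of the frequently executed path.
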
This leads inevitably to more flags in the object file header
to instruct the run-time linker to skip particular CPU feature
checks:

    gcc -c -mdisable_cpu_feature_check=cmov -o mytask.o mytask.c

SIGILL starts to look lightweight :-)
--
Glen Turner (08) 8303 3936 or +61 8 8303 3936
Australian Academic and Research Network http://www.aarnet.edu.au
On Wed, 2003-01-15 at 14:15, Larry Sendlosky wrote:
> We're seeing the same thing on a mini-ITX based system.
> init is segfaulting :(( . We've never seen this on our
> other non-C3 systems running the same codebase. We've instrumented
> the kernel to help catch the initial problem, hopefully it will
> trigger soon.
I run Red Hat 8.x on both EPIA and EPIA-M boards without problems.
I have seen weird crashes on EPIA boards with marginal RAM (you
need the right CAS for EPIA, otherwise it will die under any kind
of bus mastering).