2001-02-07 01:08:02

by Jonathan Abbey

Subject: Hard system freeze in 2.2.17, 2.2.18, 2.4.1-AC3 VIA Athlon

I am having terribly frustrating system stability problems, and I
can't figure out whether I should suspect hardware or the kernel.

Software:

Any of Linux 2.2.17, 2.2.18, 2.4.1-AC3

Hardware:

Athlon Thunderbird 750 MHz, running at rated speed
EPoX 8KTA2 VIA Athlon motherboard with VT82C686B Southbridge, VT8363 Northbridge

My system boots fine, and once it gets past the mandatory fsck, it
proceeds up to X just fine. I can pretty much log in, do web
browsing, play Unreal Tournament with accelerated OpenGL, burn CDs,
play music, whatever.

What I can't do is run XEmacs, either in X Windows or on a command
line window over ssh or mingetty. 9 times out of 10, as soon as I run
'xemacs', the system locks tight. No responsiveness to any keyboard
activity, no alt-SysRq, nothing. One time the system locked when I
was playing an mp3 with XMMS and after it locked the sound card kept
looping the same quarter-second of sound it had been playing when the
system locked. I have also often seen what looks like the same hang
while compiling a new kernel, and it happened once while my housemate
was running Netscape; both of those were under 2.2.17.

About a week ago, I decided to see about a BIOS update, and while I
went about getting my BIOS flashed, I also installed an IDE CD-RW
drive and updated my kernel to 2.2.18. All of this gave my system a
couple of hours of rest with the power off.

After I got everything back together and got the BIOS flashed,
everything seemed to work great. I built myself a 2.2.18 kernel with
the IDE-SCSI driver to support cdrecord on my new CDRW drive. For 5
days my system ran with excellent stability. I took to running
'xemacs' frequently, just to enjoy the thrill of not having to fsck my
drives.

Until a couple of nights ago. My friend the hard system freeze has
returned, with all of the old symptoms. I run xemacs, I lock, nearly
every time.

I have tried to check certain things. I set my system's BIOS up so
that it does the full POST check, including three passes over the RAM.
No problems reported at any time. I have tried running my PC133 RAM
clocked at 100 MHz. I have commented out my hdparm lines in my boot
scripts. No effect.
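
A BIOS POST pass over the RAM is a fairly light check. As a rough
illustration of a heavier pattern test, here is a minimal user-space
sketch in Python. The function name pattern_test and its parameters
are hypothetical, not from any existing tool. It only touches memory
the operating system hands to the process and goes through the CPU
caches, so a clean result does not prove the RAM is good; a dedicated
standalone memory tester is more thorough.

```python
def pattern_test(size_bytes, passes=3, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill a buffer with each byte pattern, read it back, and return
    the total number of bytes that did not match what was written."""
    errors = 0
    buf = bytearray(size_bytes)
    for _ in range(passes):
        for p in patterns:
            # Write the pattern across the whole buffer.
            buf[:] = bytes([p]) * size_bytes
            # Count how many bytes still hold the pattern on read-back.
            matching = buf.count(bytes([p]))
            errors += size_bytes - matching
    return errors


if __name__ == "__main__":
    # 64 MB is an arbitrary size chosen for illustration.
    bad = pattern_test(64 * 1024 * 1024)
    print("mismatched bytes:", bad)
```

Any nonzero count would point toward hardware, since a correct system
should always read back exactly what was written.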

I am really confused by this one. The fact that running xemacs can
reliably lock the system makes me think it is a kernel problem.
There's nothing about running xemacs that I would expect to be
particularly stressful on the system. Running Unreal Tournament with
heavy 3d acceleration and sound I would expect to be much more
stressful on my system's power supply and RAM, but that's pretty safe
to do. XEmacs does do a funky unexec() thing to create its exec
image, and I imagine it does some things with ptys and the like that
many pieces of software do not, so there are plenty of opportunities
to tickle different parts of the kernel.

On the other hand, the problem went away after a couple of hours of
down time for my system's components, came back after five days of
use, and has since persisted across several hangs and reboots, which
makes me think it is a hardware problem.

So, how in the world do I go about isolating this? When it hangs, it
hangs tight enough that alt-SysRq is of no use, so I can't get any
kind of kernel oops message or anything like that. The memory test
that the BIOS does seems to work fine. The video, ethernet, and sound
cards shouldn't be connected to this since I can run xemacs from
single user mode on the console and get this lock-up.

I have tried doing an strace on xemacs from the console, but the
system freeze and the resulting file system corruption and fsck on
reboot make any snapshot of the strace output unreliable.
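
One way to make a trace more likely to survive a hard freeze is to
force every line to disk before the next one is read. The following
is a minimal sketch in Python, assuming the trace arrives on standard
input; the script name synclog.py, the function log_durably, and the
log path are all hypothetical. It relies on the disk honoring fsync,
and the file system may still need an fsck afterward, so logging to a
separate partition or another machine would be more robust.

```python
import os
import sys


def log_durably(lines, path):
    """Append each line (bytes) to path, flushing and fsync'ing after
    every line so that it reaches the disk before the next is read.
    Returns the number of lines written."""
    count = 0
    with open(path, "ab") as out:
        for line in lines:
            out.write(line)
            out.flush()              # push Python's buffer to the kernel
            os.fsync(out.fileno())   # ask the kernel to push it to disk
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 2:
    # Read raw lines from stdin and log them to the given file.
    log_durably(sys.stdin.buffer, sys.argv[1])
```

For the X version of xemacs, a possible invocation would be:

    strace -f xemacs 2>&1 >/dev/null | python3 synclog.py trace.log

strace writes its trace to stderr, so the redirection sends only the
trace down the pipe. The last line in trace.log after a freeze would
then be close to the last system call made before the hang.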

I have tried futzing with various BIOS settings in the hope of making
the system more stable, to no effect.

I'm wondering if there is a problem with how the kernel is interacting
with the VIA chipset, but that five day grace period really makes me
think of hardware.

--
-------------------------------------------------------------------------------
Jonathan Abbey [email protected]
Applied Research Laboratories The University of Texas at Austin
Ganymede, a GPL'ed metadirectory for UNIX http://www.arlut.utexas.edu/gash2



2001-02-07 03:25:15

by Jonathan Abbey

Subject: Re: Hard system freeze in 2.2.17, 2.2.18, 2.4.1-AC3 VIA Athlon

Mark Hahn wrote me and convinced me that the problem I described is a
hardware problem, probably related to heat.

Testing bears this out.

I am still mystified as to why xemacs in particular should stress the
system more than everything else, but I've got the sanity check I was
looking for, and will treat it as a hardware problem from here on
out.

| I am having terribly frustrating system stability problems, and I
| can't figure out whether I should suspect hardware or the kernel.

-------------------------------------------------------------------------------
Jonathan Abbey [email protected]
Applied Research Laboratories The University of Texas at Austin
Ganymede, a GPL'ed metadirectory for UNIX http://www.arlut.utexas.edu/gash2

2001-02-08 00:12:24

by List User

Subject: Mailing lists for Linux on OS/390?

We are starting to look at setting up a test environment for Linux on
OS/390 platforms (probably using VM/VSE). Does anyone know of any
good mailing lists or Usenet groups specific to porting to, or
running on, this platform?

Steve