After a recent hard drive crash, I re-installed Linux on a new hard drive.
After about 2 weeks, my system now spontaneously reboots about once every 10
minutes (on average). I'm assuming I messed up something in my kernel
configuration, as Windows is still stable. To verify that it wasn't the new
hard drive (or the use of a different controller), I formatted a segment of it
under Windows and copied 7+ GB of data onto it while doing other things,
without problems.
The system will reboot as early as just after detecting the hard drives and
before loading the root filesystem, or anytime thereafter - sometimes while
logging in at the console, sometimes in X.
My system configuration is as follows:
Microstar 694D-Pro-AR
Dual PIII - 800's - not overclocked
Nice 450 Watt PS
Onboard Promise PDC20265
Onboard AC97Audio - Disabled
Soundblaster Live
2 Hard drives
- 1=IBM-40gb on Promise Controller
- 1=WD-80gb on onboard UDMA/66 controller (previous configuration was also
on promise)
USB Keyboard/Mouse/Scanner
Intel EEPro100
NVidia TNT2 Ultra (considering the system sometimes crashes before I enter X
and before the NVidia driver is loaded, my kernel has not been tainted at
this point).
I don't get any messages in /var/log/... nor do I get an oops. I have tried
this under 2.4.19, 2.4.20, and 2.4.21-pre2 (all compiled with gcc-2.95.3)
and I get the same behavior. I have noticed no similarities between the
crashes. At this point, I have no idea how to isolate it other than to
start removing every single unnecessary kernel module/option from my .config
and recompiling. Any suggestions? Want to see a grep of my .config?
TIA,
--Kaleb
PS: Although I'm going to try to monitor the list for the next few days,
please CC me in case I miss it.
On Tue, 2003-01-07 at 06:57, Kaleb Pederson wrote:
> I don't get any messages in /var/log/... nor do I get an oops. I have tried
> this under 2.4.19, 2.4.20, and 2.4.21-pre2 (all compiled with gcc-2.95.3)
> and I get the same behavior. I have noticed no similarities between the
> crashes. At this point, I have no idea how to isolate it other than to
> start removing every single unnecessary kernel module/option from my .config
> and recompiling. Any suggestions? Want to see a grep of my .config?
Start with the easy bits. Check the CPU fans, run memtest86, and reseat all the
cards. In some situations Linux will stress the hardware differently from
Windows, especially the RAM. Also, if your Windows test wasn't SMP, it's not
going to have tested much.
On Mon, Jan 06, 2003 at 10:57:03PM -0800, Kaleb Pederson wrote:
> After a recent hard drive crash, I re-installed Linux to a new hard drive.
> After about 2 weeks, my system now spontaneously reboots about once per 10
> minutes (on avg.). I'm assuming I messed up something in my kernel
> configuration as Windows is still stable. To verify that it wasn't the new
> hard drive (or use of different controller) I formatted a segment of it
> under Windows and copied 7+ gb of data onto it while doing other things
> without problem.
>
> The system will reboot as early as after detecting the hard drives and
> before loading the root filesystem or anytime thereafter - sometimes in
> logging into the console, sometimes in X.
>
Hmm, I've observed this behavior with APM on certain buggy systems, though
it was several versions ago. Are you using APM, ACPI, or neither?
Since both control power management, I would try disabling them as a
test.
Regards,
Adam
>Hmm, I've observed this behavior with apm on certain buggy systems, though
>it was several versions ago. Are you using apm, acpi, or neither?
ACPI causes my system to fail on bootup 1/3 of the time, although once
booted it does not result in crashes. However, I currently have both APM
and ACPI turned off.
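
One way to double-check which of these options a given kernel build really has enabled is to scan its .config. The following is only a minimal sketch: the function name `enabled_options`, the default prefix list, and the sample text are made up for illustration and are not taken from my actual configuration.

```python
import re

def enabled_options(config_text, prefixes=("CONFIG_APM", "CONFIG_ACPI", "CONFIG_SMP")):
    """Return {option: value} for options built in (y) or modular (m)
    whose names start with one of the given prefixes."""
    found = {}
    for line in config_text.splitlines():
        # Disabled options appear as "# CONFIG_FOO is not set" and do not match.
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m and m.group(1).startswith(prefixes) and m.group(2) in ("y", "m"):
            found[m.group(1)] = m.group(2)
    return found

# Illustrative sample only, not a real .config.
sample = """\
CONFIG_SMP=y
# CONFIG_APM is not set
# CONFIG_ACPI is not set
CONFIG_EEPRO100=y
"""

print(enabled_options(sample))
```

To use it on a real tree, read the file with `open(".config").read()` and pass the text in; an empty result for the APM and ACPI prefixes would confirm that both are off in that build.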
Thanks for your suggestions.
--Kaleb
At 10:57 PM 1/6/2003 -0800, Kaleb Pederson wrote:
>After a recent hard drive crash, I re-installed Linux to a new hard drive.
Was the drive that crashed SCSI, and the new one IDE?
At 10:57 PM 1/6/2003 -0800, Kaleb Pederson wrote:
>Onboard Promise PDC20265
Hence, IDE...
Sorry, I wasn't paying attention. If your previous drive was
SCSI, do you have an Intel 82801DB chip onboard?
billy
On Mon, Jan 06, 2003 at 10:57:03PM -0800, Kaleb Pederson wrote:
> I don't get any messages in /var/log/... nor do I get an oops. I have tried
> this under 2.4.19, 2.4.20, and 2.4.21-pre2 (all compiled with gcc-2.95.3)
> and I get the same behavior. I have noticed no similarities between the
> crashes. At this point, I have no idea how to isolate it other than to
> start removing every single unnecessary kernel module/option from my .config
> and recompiling. Any suggestions? Want to see a grep of my .config?
I have just replaced a power supply (the new one has higher power) and
the machine started crashing, until I increased the CPU voltages in the
BIOS. Maybe playing with the BIOS, such as changing the core and I/O
voltages or various timings, could help. You could also try disabling
the SCSI and IDE controllers in turn, in case one of them, or the Linux
driver for one of them, is to blame.
-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <[email protected]>
Hi,
what do you think about booting Knoppix to see if you have the same problems
there? Perhaps something went wrong during your installation.
http://www.knopper.net/knoppix/
Bernd
On Mon, 6 Jan 2003 22:57:03 -0800
"Kaleb Pederson" <[email protected]> wrote:
> After a recent hard drive crash, I re-installed Linux to a new hard
> drive. After about 2 weeks, my system now spontaneously reboots about
> once per 10 minutes (on avg.). I'm assuming I messed up something in
[snip]
> NVidia TNT2 Ultra (considering the system sometimes crashes before I
> enter X and before the NVidia driver is loaded, my kernel has not been
> tainted at this point).
I still think it's the nvidia module. I've had the same problems with
kernels >2.4.18. Someone told me that it has something to do with the
coherency bug. I suggest trying 2.4.18...
Best regards
Andreas
--
Andreas Tscharner [email protected]
I'm now pretty sure that it is a hardware failure of some type. Windows was
stable last night for about four hours of compiling, graphics manipulation,
etc. But when I got home after being gone for several hours, Windows
started exhibiting the same behavior. I presume it is Linux's sensitivity to
hardware that made it show up 5 days sooner. I presume, at this point,
that it is either the motherboard or one of the processors.
Thank you everyone for your suggestions. Of the many messages I received, the
following were good and relevant to my system, and I will try them to see if
they make a difference.
1) Try disabling APM/ACPI in the BIOS (I had done this in the kernel, not in
the BIOS).
2) Try a uniprocessor kernel or booting with only one processor.
3) Mount /var synchronously to see if anything shows up in the logs (I had
checked the logs and nothing was getting written to them. I had forgotten that
you could make the whole file system synchronous; I'll try this.)
4) Increase the voltage to the processors and see if it helps.
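
As a complement to item 3, a small heartbeat logger could narrow down when each reboot happens even if syslog stays silent. This is only a sketch under assumptions: the names `write_heartbeat` and `last_heartbeat` and the log path are hypothetical, and `os.fsync` only asks the kernel to push the data to the drive, so a drive with write caching could still lose the final line.

```python
import os
import time

def write_heartbeat(path, message):
    """Append one timestamped line and force it to disk before returning."""
    line = "%s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), message)
    with open(path, "a") as f:
        f.write(line)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to write the data to the device
    return line

def last_heartbeat(path):
    """Return the last line written, or None if the file is missing or empty."""
    try:
        with open(path) as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None
```

Calling `write_heartbeat("/var/log/heartbeat.log", "alive")` every few seconds from a loop (the path is an assumption) and reading `last_heartbeat` after the next reboot would give the time of the last successful write, which can then be compared against what was running at that moment.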
In answer to some other questions: I'm not using SCSI, nor do I have an Intel
82801DB chip onboard.
Thanks again for the help.
--Kaleb
PS: Please CC me any responses that go to the list.