2002-02-21 14:53:31

by David Burrows

Subject: baffling linux bug / hardware problem

Hi all,

I have a problem where my computer locks up during "Calibrating Delay
Loop..". I have been using Linux on this same hardware for many years,
and it only started doing this 2 days ago. It does not seem to matter
what kernel version (2.0, 2.2, 2.4.17) I use or what medium I boot from.

I'd write this off as a hardware problem, but both Windows 98 and FreeBSD
4.5 seem to boot and function properly. I've tried to debug
init/main.c myself by putting printks in the loop calibration function. It
seems to go through the first while loop twice, then hang before getting
to the __delay part.
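To see where those printks stop, it helps to sketch the calibration loop as a user-space simulation. The sketch below is modelled on the structure of the 2.4 calibrate_delay(): a doubling phase followed by a binary refinement. Everything that stands in for hardware is an assumption of this sketch. The simulated CPU runs exactly one delay iteration per cycle, CYCLES_PER_JIFFY is an invented speed, and sim_cycle()/sim_delay() replace real time passing and __delay(). Each pass busy-waits for jiffies to change *before* calling the delay. A hang "before getting to the __delay part" is therefore consistent with jiffies no longer advancing.

```c
#include <stdio.h>

/* Simulation assumptions (hypothetical, not kernel code):
 * - the "CPU" runs one delay-loop iteration per simulated cycle;
 * - the timer fires every CYCLES_PER_JIFFY cycles and bumps jiffies.
 * In a real kernel, jiffies is advanced by the timer interrupt. */
#define HZ 100UL
#define LPS_PREC 8                  /* bits of precision in phase 2 */
#define CYCLES_PER_JIFFY 100000UL   /* invented CPU speed */

static unsigned long sim_cycles;
static volatile unsigned long jiffies;  /* volatile in the kernel too */

/* Advance simulated time by one cycle; raise the "timer interrupt"
 * each time a full jiffy has elapsed. */
static void sim_cycle(void)
{
    if (++sim_cycles % CYCLES_PER_JIFFY == 0)
        jiffies++;
}

/* Stand-in for __delay(): burn `loops` iterations. */
static void sim_delay(unsigned long loops)
{
    while (loops--)
        sim_cycle();
}

/* Returns the calibrated loops_per_jiffy. */
static unsigned long calibrate_delay_sim(void)
{
    unsigned long loops_per_jiffy = 1UL << 12;
    unsigned long ticks, loopbit;
    int lps_precision = LPS_PREC;

    /* Phase 1: double the loop count until one delay spans a tick. */
    while ((loops_per_jiffy <<= 1) != 0) {
        /* Wait for the start of a tick. The kernel spins on an empty
         * loop here; if no timer interrupt ever arrives, jiffies never
         * changes and it hangs at this point, before the delay. */
        ticks = jiffies;
        while (ticks == jiffies)
            sim_cycle();
        ticks = jiffies;
        sim_delay(loops_per_jiffy);
        if (jiffies - ticks)
            break;
    }

    /* Phase 2: binary approximation of one tick, LPS_PREC bits. */
    loops_per_jiffy >>= 1;
    loopbit = loops_per_jiffy;
    while (lps_precision-- && (loopbit >>= 1)) {
        loops_per_jiffy |= loopbit;
        ticks = jiffies;
        while (ticks == jiffies)
            sim_cycle();
        ticks = jiffies;
        sim_delay(loops_per_jiffy);
        if (jiffies != ticks)       /* took longer than one tick */
            loops_per_jiffy &= ~loopbit;
    }

    printf("Calibrating delay loop... %lu.%02lu BogoMIPS\n",
           loops_per_jiffy / (500000UL / HZ),
           (loops_per_jiffy / (5000UL / HZ)) % 100);
    return loops_per_jiffy;
}
```

With the assumed 100000 cycles per jiffy, the doubling phase overshoots to 131072. The binary phase then settles on 99840 loops per jiffy, printed as 19.96 BogoMIPS.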

Could this be a timer interrupt problem? How do I diagnose this? Why do
other OSes work while Linux (which previously worked fine) no longer does,
no matter what version I use or what I boot it from?

My hardware is a P166 MMX with an Intel HX chipset. I have also tried a
memory tester, which says that the memory is fine. I'm totally stumped.

Any feedback on this problem would be much appreciated. (Please CC me
directly, as I'm not subscribed to this list.)

Thanks in advance,

Dave.


2002-02-21 15:02:52

by Dave Jones

Subject: Re: baffling linux bug / hardware problem

On Fri, Feb 22, 2002 at 01:53:10AM +1100, David Burrows wrote:
> I have a problem where my computer locks up during "Calibrating Delay
> Loop..". I have been using Linux on this same hardware for many years,
> and it only started doing this 2 days ago. It does not seem to matter
> what kernel version (2.0, 2.2, 2.4.17) I use or what medium I boot from.

I had an old WinChip box that did this. Turned out to be a bad SIMM.
Try running memtest86 for a while.

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-02-22 01:03:27

by David Burrows

Subject: Dodgey Linus BogoMIPS code ;) (was Re: baffling linux bug)

On Thu, 21 Feb 2002, Dave Jones wrote:
> On Fri, Feb 22, 2002 at 01:53:10AM +1100, David Burrows wrote:
> > I have a problem where my computer locks up during "Calibrating Delay
> > Loop..". I have been using Linux on this same hardware for many years,
> > and it only started doing this 2 days ago. It does not seem to matter
> > what kernel version (2.0, 2.2, 2.4.17) I use or what medium I boot from.
>
> I had an old WinChip box that did this. Turned out to be a bad SIMM.
> Try running memtest86 for a while.

I have run memtest86 all the way through and shuffled the memory around
(moved the modules to different slots), and it still crashes in the delay
loop calibration. FreeBSD and Windows still work. If I knew how init/main.c
worked, what jiffies are and how they are updated (by the timer interrupt?),
I would have some idea of what I'm doing when I step through the BogoMIPS
calculation code.
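The relationship between jiffies, HZ and the BogoMIPS figure can be pinned down in a few lines. This is a minimal sketch, assuming HZ = 100 (the i386 value in 2.0 to 2.4 kernels). jiffies counts timer interrupts since boot, at one interrupt per jiffy and HZ of them per second. The calibration measures loops_per_jiffy, and the printed BogoMIPS is loops_per_jiffy * HZ / 500000. The helper names are illustrative, not kernel functions.

```c
#include <stdio.h>

#define HZ 100UL   /* assumed: i386 value in 2.0-2.4 kernels */

/* One timer interrupt per jiffy, HZ per second: 1 jiffy = 1000/HZ ms. */
static unsigned long jiffies_to_ms(unsigned long j)
{
    return j * (1000UL / HZ);
}

/* BogoMIPS = loops_per_jiffy * HZ / 500000, returned in hundredths.
 * Integer division gives the same digits as the kernel's printk. */
static unsigned long bogomips_x100(unsigned long loops_per_jiffy)
{
    return loops_per_jiffy / (5000UL / HZ);
}

static void print_bogomips(unsigned long loops_per_jiffy)
{
    unsigned long b = bogomips_x100(loops_per_jiffy);
    printf("%lu.%02lu BogoMIPS\n", b / 100, b % 100);
}
```

For example, print_bogomips(99840) prints "19.96 BogoMIPS", and 100 jiffies correspond to one second.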

Is there (or could there be) some sort of safety check that tests whether
the timer is functioning correctly and, if it is not, displays an error
message such as "Timer interrupt is broken, system halted."? I don't want
to give up on this hardware yet, ESPECIALLY considering that operating
systems I absolutely HATE continue to work, and my favourite one
doesn't. :(
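A bounded wait is one way such a check could look. This is a hypothetical sketch, not something the kernel provides. wait_for_tick(), timer_sanity_check(), MAX_WAIT_LOOPS and the simulated timer are all invented here. The limit has to be a fixed loop count, because before calibration the kernel does not yet know how long one loop takes. It should be generous enough to cover at least one tick on the fastest CPU of interest.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simulated hardware (hypothetical): a tick every CYCLES_PER_JIFFY
 * cycles, unless the timer has been "disconnected". */
#define CYCLES_PER_JIFFY 100000UL
#define MAX_WAIT_LOOPS 10000000UL  /* arbitrary bound, ~100 simulated ticks */

static unsigned long sim_cycles;
static volatile unsigned long jiffies;
static bool timer_connected = true;

static void sim_cycle(void)
{
    sim_cycles++;
    if (timer_connected && sim_cycles % CYCLES_PER_JIFFY == 0)
        jiffies++;                  /* simulated timer interrupt */
}

/* Spin until jiffies changes, giving up after max_loops iterations.
 * Returns true if a tick arrived in time. */
static bool wait_for_tick(unsigned long max_loops)
{
    unsigned long start = jiffies;

    for (unsigned long n = 0; n < max_loops; n++) {
        if (jiffies != start)
            return true;
        sim_cycle();
    }
    return jiffies != start;
}

/* Demand a few consecutive ticks before trusting the timer. A kernel
 * would stop here (e.g. with panic()); the sketch just returns false. */
static bool timer_sanity_check(void)
{
    for (int i = 0; i < 3; i++) {
        if (!wait_for_tick(MAX_WAIT_LOOPS)) {
            printf("Timer interrupt is broken, system halted.\n");
            return false;
        }
    }
    return true;
}
```

Run before calibration, a check like this turns a silent hard lock into an explicit message.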

Thanks again for your time,

David.

2002-02-22 01:29:51

by David Burrows

Subject: Re: Dodgey Linus BogoMIPS code ;) (was Re: baffling linux bug)

On Thu, 21 Feb 2002, Mike Fedyk wrote:
> I didn't see one thing mentioning Linus in there... ;) I could sue if you
> were selling something. ;)

Kind of. Except Linus wrote the particular section of code in question.
=)

> Anyway, jiffies tick at HZ per second (100 jiffies/sec on i386), with one
> timer interrupt per jiffy.

Or perhaps not, in the case of my hardware, which functioned properly one
day and now will never boot Linux again (but is fine with everything else)...

I need a sure-fire way of testing whether the timer interrupt works,
perhaps even a kernel patch that adds such a check before initialising
the timers. Is there any possibility of working around such a problem? I
would rather destroy this motherboard than sacrifice it to running
inferior operating systems for the remainder of its life. =)

Regards,

Dave.

2002-02-22 01:51:22

by David Burrows

Subject: Re: Dodgey Linus BogoMIPS code ;) (was Re: baffling linux bug)

On Thu, 21 Feb 2002, Mark Hahn wrote:
> > > > loop. FreeBSD and Windows still work. If I knew how init/main.c worked,
> > > > what jiffies are and how they are updated (timer interrupt?), then I would
> > > sorry if you've already mentioned this, but do you have
> > > CONFIG_X86_UP_*APIC selected (something like "use local apic
> > > for uniprocessors")? if so, it could explain why you
> > > might be failing to receive timer irq's.
> >
> > I've tried with 2.0 kernels, 2.2 kernels and 2.4 kernels. Some of them I
> > can actually get to oops, but most of the time just a hard lock. I
> > believe I have that option disabled, I can't verify that at the moment
> > because I can't even run Linux.
>
> and you said that you have in the past run linux on this machine?
> and that you haven't made any bios changes/upgrades/etc?
> and that your pre-hang (pre-calibrating-delay-loop) messages
> are completely uninformative? also, that you are not making the
> TSC-ful-cpu mistake? actually the latter would be my guess...

Yes, I have been running various versions of Linux without problems on this
same machine for a number of years. I had not changed any hardware or
software between when it was working and when it suddenly started failing.
It is for this reason alone that I posted on linux-kernel to get some help.

> > Then why does FreeBSD still work?
>
> I was answering your hypothetical question. freeBSD works because
> it doesn't share any code with linux, of course, same with windows.

I would assume that FreeBSD and Windows still use the timer for one
reason or another. Therefore whatever sudden quirk appeared in my
hardware has revealed a bug in Linux.

> > There are folks who would claim that
> > FreeBSD is superior to Linux.... ;)
>
> yeah, and Buddhism is better than Christianity. such claims tell
> you exactly one thing: the utterer is a twit.

Well, I personally don't believe one way or the other. It's apples and
oranges as far as I'm concerned; however, in this case one could say that
FreeBSD is better because it actually works. ;)

Cheers,

David.

2002-02-22 02:48:25

by Mike Fedyk

Subject: Re: Dodgey Linus BogoMIPS code ;) (was Re: baffling linux bug)

David,

I doubt many people will be able to help you until you post information
about your hardware.

Try the latest 2.4.18-rc and post the .config, along with the output of
lspci -v.

If you have a serial console, post what output you do get from a kernel
booting.

Mike

2002-02-22 09:03:15

by Zwane Mwaikambo

Subject: Re: Dodgey Linus BogoMIPS code ;) (was Re: baffling linux bug)

On Fri, 22 Feb 2002, David Burrows wrote:

> I would assume that FreeBSD and Windows would still use the timer for some
> reason or another. Therefore whatever sudden quirk that appeared in my
> hardware has revealed a bug in Linux..

Actually, FreeBSD frobs the timer (i8254) much more during its calibration
than Linux does, so my guess is that it's not a broken timer or some such;
and since they both use the TSC (if available), perhaps it's not that either.

Regards,
Zwane Mwaikambo

2002-02-25 09:44:56

by Helge Hafting

Subject: Re: Dodgey Linus BogoMIPS code ;) (was Re: baffling linux bug)

David Burrows wrote:
>
> On Thu, 21 Feb 2002, Mike Fedyk wrote:
> > I didn't see one thing mentioning Linus in there... ;) I could sue if you
> > were selling something. ;)
>
> Kind of. Except Linus wrote the particular section of code in question.
> =)
>
> > Anyway, jiffies tick at HZ per second (100 jiffies/sec on i386), with one
> > timer interrupt per jiffy.
>
> Or perhaps not in the case of my hardware functioning properly one day,
> and never to boot linux (but fine with everything else) again..
>
> I need a sure fire way of testing whether the timer interrupt works,

I once used a printk in the keyboard IRQ handler to check a
nonstandard keyboard. You could put a printk() in the timer interrupt
handler the same way; that should show whether the IRQ handler works.
Of course, you don't want to run such a log-filler for long... :-)
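That debugging trick can be sketched as a simulation rather than a kernel patch. The handler name timer_interrupt_sim() and the once-per-HZ rate limit are assumptions of this sketch; on i386 the real timer handler lives in arch/i386/kernel/time.c. Printing only on every HZ-th tick keeps the output to roughly one line per second. That addresses the "log-filler" concern while still showing at a glance whether ticks arrive at all.

```c
#include <stdio.h>

#define HZ 100UL   /* assumed i386 value */

static volatile unsigned long jiffies;
static unsigned long debug_lines;   /* debug messages emitted so far */

/* User-space stand-in for the timer interrupt handler. Besides its
 * normal job of advancing jiffies, it prints a debug line, but only
 * once every HZ ticks (about once a second), not on every tick. */
static void timer_interrupt_sim(void)
{
    jiffies++;
    if (jiffies % HZ == 0) {
        printf("timer: jiffies=%lu\n", jiffies);  /* printk() in a kernel */
        debug_lines++;
    }
}
```

Calling timer_interrupt_sim() 1000 times (ten simulated seconds) produces ten lines. On real hardware, silence after boot would mean the interrupt is not arriving.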

Maybe you can get the machine to boot by skipping the BogoMIPS
calculation completely, by hardcoding the value your machine used to
come up with? Not for production use, just to get a debugging kernel going.
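The hardcoding idea can be sketched as a guard in front of the calibration. This is a minimal sketch under stated assumptions. preset_loops_per_jiffy is a hypothetical knob, not a 2.4 option, and timer_based_calibration() is a placeholder for the real loop. The old value is recovered by inverting BogoMIPS = loops_per_jiffy * HZ / 500000 with HZ = 100. Because the logged figure was truncated to two decimals, the recovered value can be slightly low. This only gets past calibration: anything else that waits on jiffies would still hang if the timer is really dead.

```c
#include <stdio.h>

#define HZ 100UL   /* assumed i386 value */

/* Hypothetical knob: loops_per_jiffy from an earlier, successful boot.
 * 0 means "calibrate normally". */
static unsigned long preset_loops_per_jiffy;
static unsigned long loops_per_jiffy;
static int calibration_ran;

/* Recover loops_per_jiffy from a logged BogoMIPS reading given in
 * hundredths (e.g. 1996 for "19.96"). */
static unsigned long lpj_from_bogomips_x100(unsigned long bogomips_x100)
{
    return bogomips_x100 * (5000UL / HZ);
}

/* Placeholder for the normal, timer-dependent calibration loop. */
static void timer_based_calibration(void)
{
    calibration_ran = 1;
}

static void calibrate_delay_or_preset(void)
{
    if (preset_loops_per_jiffy) {
        /* Debugging only: trust the old value, never touch jiffies. */
        loops_per_jiffy = preset_loops_per_jiffy;
        printf("Calibrating delay loop (preset)... %lu.%02lu BogoMIPS\n",
               loops_per_jiffy / (500000UL / HZ),
               (loops_per_jiffy / (5000UL / HZ)) % 100);
        return;
    }
    timer_based_calibration();
}
```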

Helge Hafting

2002-02-25 11:15:35

by Zwane Mwaikambo

Subject: Re: Dodgey Linus BogoMIPS code ;) (was Re: baffling linux bug)

On Mon, 25 Feb 2002, Helge Hafting wrote:

> Maybe you can get the machine to boot by skipping the bogomips
> calculation completely - by hardcoding the value your machine used to
> come up with?
> Not for production use - just to get a debugging kernel going.

One thing also worth keeping in mind is that Linux makes assumptions about
the i8254 timer interval. Try booting FreeBSD in verbose mode and see what
you get.
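The i8254 assumption can be made concrete with a little arithmetic. The constants below are assumed to match 2.4-era i386 headers: CLOCK_TICK_RATE = 1193180 Hz for the PIT input clock, and LATCH as the rounded divisor for HZ = 100. Check them against your own tree. The sketch computes the divisor Linux loads into the PIT and the interrupt rate that results. If the timer's real input clock differed from the assumed one, jiffies would advance at a different rate than the kernel expects.

```c
/* Assumed to match 2.4-era i386 definitions; check your own tree. */
#define CLOCK_TICK_RATE 1193180UL                 /* i8254 input clock, Hz */
#define HZ 100UL                                  /* timer interrupts/sec */
#define LATCH ((CLOCK_TICK_RATE + HZ / 2) / HZ)   /* PIT divisor, rounded */

/* Divisor programmed into i8254 channel 0. */
static unsigned long pit_divisor(void)
{
    return LATCH;
}

/* Interrupt rate, in millihertz, that a PIT whose input clock really
 * runs at input_hz would deliver with this divisor. */
static unsigned long tick_rate_millihz(unsigned long input_hz)
{
    return input_hz * 1000UL / LATCH;
}
```

With these values the divisor is 11932. The nominal 100 Hz is really about 99.998 Hz. A PIT clocked at half the assumed rate would deliver only about 50 ticks per second.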

Zwane


2002-02-25 12:34:10

by David Burrows

Subject: Re: Dodgey Linus BogoMIPS code ;) (solved!)

Thanks all who responded! It turned out I had a faulty power supply. It
was causing peculiar failures and then finally a total failure (black
screen). I have since changed to a new case and everything is fine.

The moral of the story is that computer hardware is not invulnerable. Try
not to point the finger at software too soon, even if other software
"seems" to work properly.

Happy hacking, =)

David.
