2008-08-15 12:49:28

by David Witbrodt

Subject: Re: HPET regression in 2.6.26 versus 2.6.25 -- question about NMI watchdog



> > I found something very interesting about the commit that first causes
> > the lockup (3def3d6d...), and the very next commit (1e934dda...) -- if
> > I checkout 1e93... and try to revert the changes made in 3def..., the
> > kernel freezes in spite of the revert.
> >
> > Because of this, I would conclude that your patch for 2.6.27-rc3 was
> > doomed before you began, and we should look more carefully at the
> > commits from February instead of trying to revert at the 2.6.27 HEAD.
>
> i'm still wondering whether we could try to figure out something about
> the nature of the hard lockup itself.
>
> Have you tried to activate the NMI watchdog? It _usually_ works fine if
> you use a boot option along the lines of:
>
> "lapic nmi_watchdog=2 idle=poll"

I have to go to work for a few hours right now, but will try this out when
I get home. (Actually, I'm late for work as I type this... but I have my
priorities straight! ;)

Quick question: a quick browse of 'Documentation/nmi_watchdog.txt' suggests
that I should use "nmi_watchdog=1", since I have SMP (CPU = Athlon 64 X2,
with CONFIG_SMP=y). Should I follow your suggestion later, or follow the
recommendation of the 'nmi_watchdog.txt' doc?


Much thanks,
Dave W.


2008-08-15 13:27:37

by Ingo Molnar

Subject: Re: HPET regression in 2.6.26 versus 2.6.25 -- question about NMI watchdog


* David Witbrodt <[email protected]> wrote:

> Quick question: a quick browse of 'Documentation/nmi_watchdog.txt'
> suggests that I should use "nmi_watchdog=1", since I have SMP (CPU =
> Athlon 64 X2, with CONFIG_SMP=y). Should I follow your suggestion
> later, or follow the recommendation of the 'nmi_watchdog.txt' doc?

you could try both, starting with nmi_watchdog=2 - and trying
nmi_watchdog=1 if that doesn't work. The problem with nmi_watchdog=1 is
that it disables high-res timers. (because it has to - it piggy-backs on
a periodic timer interrupt)
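
A quick way to tell whether either setting took effect is to watch the
NMI row of /proc/interrupts: if the per-CPU counts there keep growing,
NMIs are being delivered. The sketch below is only an illustration and
was not part of the original mail. The helper names nmi_total_from_line
and nmi_total are made up here, and the code assumes the usual
"NMI: <one count per CPU> <optional description>" row layout.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: sum the per-CPU counters on one line of
 * /proc/interrupts. Returns -1 if the line is not the "NMI:" row.
 */
long long nmi_total_from_line(const char *line)
{
    long long total = 0;

    assert(line != NULL);

    /* The row label may be indented, so skip leading whitespace. */
    while (isspace((unsigned char)*line))
        line++;
    if (strncmp(line, "NMI:", 4) != 0)
        return -1;
    line += 4;

    /* Add up numeric columns until the first non-numeric field. */
    for (;;) {
        char *end;
        unsigned long long n = strtoull(line, &end, 10);

        if (end == line)
            break;          /* trailing description or end of line */
        total += (long long)n;
        line = end;
    }
    return total;
}

/*
 * Hypothetical helper: scan a file laid out like /proc/interrupts and
 * return the summed NMI count, or -1 if the file cannot be opened or
 * has no NMI row. Rows longer than the buffer are not handled.
 */
long long nmi_total(const char *path)
{
    char buf[4096];
    long long total = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(buf, sizeof buf, f) != NULL) {
        total = nmi_total_from_line(buf);
        if (total >= 0)
            break;
    }
    fclose(f);
    return total;
}
```

Calling nmi_total("/proc/interrupts") twice, a few seconds apart, and
comparing the two results is enough. A total that does not change
suggests the watchdog is not ticking in that mode.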

you might even want to test the NMI watchdog with an intentional
user-space hard lockup - with the attached lockupcli.c program.
(Warning: if you run it as root it will really lock up your box hard.
Run it from a VGA text mode console to see any console messages.)

Ingo


Attachments:
(No filename) (858.00 B)
lockupcli.c (46.00 B)