2009-10-02 13:52:20

by Trevor Hemsley

Subject: Temperature above threshold loop with 2.6.31.1

Hi

I just downloaded and installed the latest 2.6.31.1 kernel on my
machine here and I think I found a small bug. Shortly after boot I
start to receive messages like

CPU0: Temperature above threshold, cpu clock throttled (total events =
21672)

This is on an Intel D975XBX2 motherboard with an Intel Xeon X3220
2.4GHz quad core chip installed (the Xeon equivalent of a Q6600). The
BIOS reports the cpu temperature as being consistently 51C, which may be
a bit on the high side but not dangerously so.

And, yes, I'm aware that this might be a valid temperature warning and
I aim to dismantle the machine and check everything out over the
weekend but...

More importantly, I get approximately 100,000 of these messages per
minute and the machine is completely unusable. All of these are for
CPU0 - at least all the ones that get written to /var/log/messages.

1,666 notifications a second seems a little on the 'too frequent' side
of things to me :-)
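
For reference, the arithmetic behind that figure can be sketched as below.
This is only an illustration, not anything from the kernel: the regular
expression assumes the message text exactly as quoted above (joined onto
one line), and the helper names parse_throttle_line and per_second are
made up for this example.

```python
import re

# Pattern for the kernel message quoted above, joined onto one line.
# Assumption: the message text matches the report exactly.
THROTTLE_RE = re.compile(
    r"CPU(?P<cpu>\d+): Temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<events>\d+)\)"
)


def parse_throttle_line(line):
    """Return (cpu, total_events) for a throttle message, else None."""
    match = THROTTLE_RE.search(line)
    if match is None:
        return None
    return int(match.group("cpu")), int(match.group("events"))


def per_second(messages_per_minute):
    """Convert a per-minute message count to a whole per-second rate."""
    return messages_per_minute // 60


sample = ("CPU0: Temperature above threshold, cpu clock throttled "
          "(total events = 21672)")
print(parse_throttle_line(sample))  # (0, 21672)
print(per_second(100_000))          # 1666
```

100,000 messages a minute divided by 60 gives 1,666 whole messages a
second, which is where the number above comes from.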

Please cc me on any replies as I'm not subscribed to the list. Thanks.


--
Trevor Hemsley, Brighton, UK
Trevor dot Hemsley at ntlworld dot com


2009-10-02 15:03:30

by Frans Pop

Subject: Re: Temperature above threshold loop with 2.6.31.1

Trevor Hemsley wrote:
> I just downloaded and installed the latest 2.6.31.1 kernel on my
> machine here and I think I found a small bug. Shortly after boot I
> start to receive messages like
>
> CPU0: Temperature above threshold, cpu clock throttled (total events =
> 21672)
[...]
> More importantly, I get approximately 100,000 of these messages per
> minute and the machine is completely unusable. All of these are for
> CPU0 - at least all the ones that get written to /var/log/messages.
>
> 1,666 notifications a second seems a little on the 'too frequent' side
> of things to me :-)

Looks like this may already be fixed in mainline by the following commit:
commit b417c9fd8690637f0c91479435ab3e2bf450c038
Author: Ingo Molnar <[email protected]>
Date: Tue Sep 22 15:50:24 2009 +0200
x86: mce: Fix thermal throttling message storm

Can you confirm that please, either by compiling current git or by applying
that commit on top of 2.6.31.1?

Ingo: is that patch already scheduled for stable?

Cheers,
FJP

2009-10-02 17:05:57

by Ingo Molnar

Subject: Re: Temperature above threshold loop with 2.6.31.1


* Frans Pop <[email protected]> wrote:

> Trevor Hemsley wrote:
> > I just downloaded and installed the latest 2.6.31.1 kernel on my
> > machine here and I think I found a small bug. Shortly after boot I
> > start to receive messages like
> >
> > CPU0: Temperature above threshold, cpu clock throttled (total events =
> > 21672)
> [...]
> > More importantly, I get approximately 100,000 of these messages per
> > minute and the machine is completely unusable. All of these are for
> > CPU0 - at least all the ones that get written to /var/log/messages.
> >
> > 1,666 notifications a second seems a little on the 'too frequent' side
> > of things to me :-)
>
> Looks like this may already be fixed in mainline by the following commit:
> commit b417c9fd8690637f0c91479435ab3e2bf450c038
> Author: Ingo Molnar <[email protected]>
> Date: Tue Sep 22 15:50:24 2009 +0200
> x86: mce: Fix thermal throttling message storm
>
> Can you confirm that please, either by compiling current git or by applying
> that commit on top of 2.6.31.1?
>
> Ingo: is that patch already scheduled for stable?

Not yet - i just forwarded it. Thanks for pointing it out,

Ingo

2009-10-02 17:38:09

by Frans Pop

Subject: Re: Temperature above threshold loop with 2.6.31.1

On Friday 02 October 2009, Frans Pop wrote:
> Trevor Hemsley wrote:
> > 1,666 notifications a second seems a little on the 'too frequent' side
> > of things to me :-)
>
> Looks like this may already be fixed in mainline by the following
> commit: commit b417c9fd8690637f0c91479435ab3e2bf450c038
> Author: Ingo Molnar <[email protected]>
> Date: Tue Sep 22 15:50:24 2009 +0200
> x86: mce: Fix thermal throttling message storm
>
> Can you confirm that please, either by compiling current git or by
> applying that commit on top of 2.6.31.1?

If you want to apply it to .31.1 you'll also need this commit:
commit 3967684006f30c253bc6d4a6604d1bad4a7fc672
Author: Ingo Molnar <[email protected]>
Date: Tue Sep 22 15:50:24 2009 +0200
x86: mce: Clean up thermal throttling state tracking code

2009-10-02 17:57:49

by Ingo Molnar

Subject: Re: Temperature above threshold loop with 2.6.31.1


* Frans Pop <[email protected]> wrote:

> On Friday 02 October 2009, Frans Pop wrote:
> > Trevor Hemsley wrote:
> > > 1,666 notifications a second seems a little on the 'too frequent' side
> > > of things to me :-)
> >
> > Looks like this may already be fixed in mainline by the following
> > commit: commit b417c9fd8690637f0c91479435ab3e2bf450c038
> > Author: Ingo Molnar <[email protected]>
> > Date: Tue Sep 22 15:50:24 2009 +0200
> > x86: mce: Fix thermal throttling message storm
> >
> > Can you confirm that please, either by compiling current git or by
> > applying that commit on top of 2.6.31.1?
>
> If you want to apply it to .31.1 you'll also need this commit:
> commit 3967684006f30c253bc6d4a6604d1bad4a7fc672
> Author: Ingo Molnar <[email protected]>
> Date: Tue Sep 22 15:50:24 2009 +0200
> x86: mce: Clean up thermal throttling state tracking code

Yes. The way to test this is to do this on top of a .31.1 tree:

git cherry-pick 3967684006f30c253bc6d4a6604d1bad4a7fc672
git cherry-pick b417c9fd8690637f0c91479435ab3e2bf450c038

Ingo

2009-10-03 19:30:48

by Trevor Hemsley

Subject: Re: Temperature above threshold loop with 2.6.31.1

On Fri, 2 Oct 2009 19:57:45 +0200, Ingo Molnar wrote:

>Yes. The way to test this is to do this on top of a .31.1 tree:
>
> git cherry-pick 3967684006f30c253bc6d4a6604d1bad4a7fc672
> git cherry-pick b417c9fd8690637f0c91479435ab3e2bf450c038

I tested the fix and it works.

It's still hot and throttling but the messages are only issued every ~5
mins now.
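
The behaviour seen here (events still counted, but a message only every
few minutes) is what an interval-based rate limit produces. The sketch
below is a rough illustration of that general technique and is not the
code from the commits above. The class name ThrottledLogger is
hypothetical, and the 300 second default is an assumption taken from the
"~5 mins" observation.

```python
class ThrottledLogger:
    """Count every event but emit a message at most once per interval."""

    def __init__(self, interval_s=300.0):
        # Assumption: 300 s, chosen to match the "~5 mins" observation.
        self.interval_s = interval_s
        self.count = 0            # total events seen so far
        self.next_allowed = None  # earliest time the next message may print

    def event(self, now):
        """Record one event at time `now` (seconds).

        Returns the message to log, or None while inside the quiet period.
        """
        self.count += 1
        if self.next_allowed is not None and now < self.next_allowed:
            return None  # quiet period: count the event, print nothing
        self.next_allowed = now + self.interval_s
        return f"Temperature above threshold (total events = {self.count})"


log = ThrottledLogger()
results = [log.event(t) for t in (0, 1, 299, 300, 301)]
printed = [r for r in results if r is not None]
print(len(printed))  # 2
```

Of the five simulated events, only the ones at 0 s and 300 s produce a
message, while the counter still reaches 5.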

Thanks.
--
Trevor Hemsley, Brighton, UK
Trevor dot Hemsley at ntlworld dot com