2009-01-20 14:43:42

by Mika Tiainen

Subject: Slow clock on AMD 740G chipset


--
Mika Tiainen Always be wary of any helpful item that
[email protected] weighs less than its operating manual.
http://mikat.iki.fi -- (Terry Pratchett, Jingo)


Attachments:
aeon_dmesg (24.65 kB)
dmesg
aeon_config (50.44 kB)
kernel config

2009-01-20 18:46:19

by David Rees

Subject: Re: Slow clock on AMD 740G chipset

On Tue, Jan 20, 2009 at 6:16 AM, Mika Tiainen <[email protected]> wrote:
> I built a new machine with Gigabyte GA-MA74GM-S2H motherboard that ntpd
> can't keep synced. Could this be a kernel bug or is it a hardware
> problem?
>
> Installed with Debian 2.6.27 kernel and currently running a self built
> 2.6.28.1, both have the problem. It's falling behind over 2s/15min:

Hmm, I've got the same exact mobo running Fedora 10
2.6.27.9-159.fc10.x86_64 kernel - no issues with time synchronization
here. I have a 5050e processor. I'll try to run some diffs on my
dmesg against yours later...

-Dave

Subject: Re: Slow clock on AMD 740G chipset

On Tue, Jan 20, 2009 at 04:16:10PM +0200, Mika Tiainen wrote:
>
> Hi,
>
> I built a new machine with Gigabyte GA-MA74GM-S2H motherboard that ntpd
> can't keep synced. Could this be a kernel bug or is it a hardware
> problem?
>
> Installed with Debian 2.6.27 kernel and currently running a self built
> 2.6.28.1, both have the problem. It's falling behind over 2s/15min:
>
> Jan 19 22:08:23 aeon ntpd[31468]: time reset +2.226349 s
> Jan 19 22:24:11 aeon ntpd[31468]: time reset +2.185085 s
> Jan 19 22:40:08 aeon ntpd[31468]: time reset +2.308958 s
> Jan 19 22:56:23 aeon ntpd[31468]: time reset +2.253836 s
> Jan 19 23:13:03 aeon ntpd[31468]: time reset +2.291917 s
> Jan 19 23:28:14 aeon ntpd[31468]: time reset +2.091014 s
> Jan 19 23:43:47 aeon ntpd[31468]: time reset +2.209660 s
> Jan 19 23:59:09 aeon ntpd[31468]: time reset +2.150145 s
> Jan 20 00:15:44 aeon ntpd[31468]: time reset +2.256261 s
> Jan 20 00:31:47 aeon ntpd[31468]: time reset +2.253873 s
>
> I have tried different clocksources. The machine defaults to hpet,
> acpi_pm makes no difference and

Hi,

That's annoying but I can't really help you with this. Maybe using
adjtimex as described in section 9.1.6 in
http://support.ntp.org/bin/view/Support/KnownHardwareIssues is an
option for you.

> tsc is even slower, seems to be about
> half speed to realtime.

This is because the TSC is not P- or C-state invariant, i.e. the TSC
frequency changes when the processor's P-state (frequency) or C-state
(e.g. C2, C3 or C1E) changes. On AMD K8 the TSC is not a reliable
clocksource. (This has changed with AMD family 10h CPUs.)
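
Whether a given CPU advertises an invariant TSC can be checked from the
flags line in /proc/cpuinfo. The sketch below assumes a current Linux
kernel, which reports the flags constant_tsc (rate independent of
P-state) and nonstop_tsc (keeps counting in deep C-states); older
kernels may not report both. The function name tsc_flags is made up for
illustration.

```python
def tsc_flags(cpuinfo_text):
    """Report which TSC-related flags the first CPU entry advertises.

    cpuinfo_text is the content of /proc/cpuinfo (passed in as a string
    so the function can be tried on any sample text).
    """
    wanted = ("tsc", "constant_tsc", "nonstop_tsc")
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags\t\t: fpu vme ... constant_tsc ..."
            flags = set(line.split(":", 1)[1].split())
            return {name: name in flags for name in wanted}
    # No flags line found (e.g. a non-x86 system): report nothing present.
    return {name: False for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(tsc_flags(f.read()))
    except OSError:
        print("/proc/cpuinfo not readable on this system")
```

A CPU that lists tsc but not constant_tsc matches the behaviour
described above: the counter rate follows frequency changes.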

> When i set current_clocksource to jiffies the
> clock stopped completely and I had to reboot.

That's odd. I tried it on a test machine - it didn't hang but time
doesn't change anymore and I can't even modify the clocksource
afterwards.

So obviously there is a kernel bug when jiffies is used as the
clocksource.
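
For reference, the clocksource in use and the ones the kernel offers
can be read from sysfs. This is a minimal sketch that only reads the
two files; the function name read_clocksources is made up for
illustration, and switching the clocksource (by writing to
current_clocksource as root) is deliberately left out.

```python
from pathlib import Path

# Standard sysfs location of the clocksource controls on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, [available, ...]) or None if the files are missing."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, or sysfs not mounted: nothing to report.
        return None
    return current, available


if __name__ == "__main__":
    print(read_clocksources())
```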


Regards,

Andreas

--
Operating | Advanced Micro Devices GmbH
System | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
Research | Geschäftsführer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC) | Registergericht München, HRB Nr. 43632

Subject: Re: Slow clock on AMD 740G chipset

On Tue, Mar 10, 2009 at 11:18:07AM +0100, Andreas Herrmann wrote:
> On Tue, Jan 20, 2009 at 04:16:10PM +0200, Mika Tiainen wrote:
> >
> > Hi,
> >
> > I built a new machine with Gigabyte GA-MA74GM-S2H motherboard that ntpd
> > can't keep synced. Could this be a kernel bug or is it a hardware
> > problem?
> >
> > Installed with Debian 2.6.27 kernel and currently running a self built
> > 2.6.28.1, both have the problem. It's falling behind over 2s/15min:
> >
> > Jan 19 22:08:23 aeon ntpd[31468]: time reset +2.226349 s
> > Jan 19 22:24:11 aeon ntpd[31468]: time reset +2.185085 s
> > Jan 19 22:40:08 aeon ntpd[31468]: time reset +2.308958 s
> > Jan 19 22:56:23 aeon ntpd[31468]: time reset +2.253836 s
> > Jan 19 23:13:03 aeon ntpd[31468]: time reset +2.291917 s
> > Jan 19 23:28:14 aeon ntpd[31468]: time reset +2.091014 s
> > Jan 19 23:43:47 aeon ntpd[31468]: time reset +2.209660 s
> > Jan 19 23:59:09 aeon ntpd[31468]: time reset +2.150145 s
> > Jan 20 00:15:44 aeon ntpd[31468]: time reset +2.256261 s
> > Jan 20 00:31:47 aeon ntpd[31468]: time reset +2.253873 s
> >
> > I have tried different clocksources. The machine defaults to hpet,
> > acpi_pm makes no difference and

> That's annoying but I can't really help you with this. Maybe using
> adjtimex as described in section 9.1.6 in
> http://support.ntp.org/bin/view/Support/KnownHardwareIssues is an
> option for you.

BTW, I've played a little bit with the adjtimex tool. Verifying the
tick value (with adjtimex --tick) is most probably the solution
for you. Manual calibration is described here:
http://support.ntp.org/bin/view/Support/ManualCalibration

By adapting the tick value I managed to make time on my test system
fall behind ntp-time by 20 seconds within a few minutes. It worked the
other way round, too.

So it's "merely a question" of calibration on your system. AFAIK
you have to increase the tick value to not fall behind ntp-time, e.g.
increasing the tick value from 10000 to 10025:

# adjtimex --tick 10025
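
The arithmetic behind such a value can be sketched as follows. This
assumes the usual adjtimex convention that tick is the number of
microseconds added to the system clock per timer tick, with a nominal
value of 10000 (100 ticks per second at the user interface), so one
unit of tick corresponds to 100 ppm, or about 8.64 seconds per day. The
helper name tick_for_drift is made up for illustration.

```python
NOMINAL_TICK_US = 10000  # assumed default: 100 ticks per second
PPM_PER_TICK_UNIT = 1_000_000 / NOMINAL_TICK_US  # 100 ppm per unit of tick


def tick_for_drift(slow_ppm):
    """Pick a tick value for a clock that loses slow_ppm parts per million.

    Returns (tick, residual_ppm). The residual is what is left for ntpd
    to correct after the coarse tick adjustment.
    """
    tick = NOMINAL_TICK_US + round(slow_ppm / PPM_PER_TICK_UNIT)
    residual_ppm = slow_ppm - (tick - NOMINAL_TICK_US) * PPM_PER_TICK_UNIT
    return tick, residual_ppm


# A clock that is 2500 ppm slow is fully compensated by tick 10025.
print(tick_for_drift(2500))  # (10025, 0.0)
```

The point of the coarse adjustment is only to bring the residual well
below what ntpd can correct on its own.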


Regards,
Andreas

--
Operating | Advanced Micro Devices GmbH
System | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
Research | Geschäftsführer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC) | Registergericht München, HRB Nr. 43632

2009-03-11 12:08:27

by Michael Tokarev

Subject: Re: Slow clock on AMD 740G chipset

Andreas Herrmann wrote:
> On Tue, Mar 10, 2009 at 11:18:07AM +0100, Andreas Herrmann wrote:
>> On Tue, Jan 20, 2009 at 04:16:10PM +0200, Mika Tiainen wrote:
>>> Hi,
>>>
>>> I built a new machine with Gigabyte GA-MA74GM-S2H motherboard that ntpd
>>> can't keep synced. Could this be a kernel bug or is it a hardware
>>> problem?
>>>
>>> Installed with Debian 2.6.27 kernel and currently running a self built
>>> 2.6.28.1, both have the problem. It's falling behind over 2s/15min:
>>>
>>> Jan 19 22:08:23 aeon ntpd[31468]: time reset +2.226349 s
>>> Jan 19 22:24:11 aeon ntpd[31468]: time reset +2.185085 s
>>> Jan 19 22:40:08 aeon ntpd[31468]: time reset +2.308958 s
>>> Jan 19 22:56:23 aeon ntpd[31468]: time reset +2.253836 s
>>> Jan 19 23:13:03 aeon ntpd[31468]: time reset +2.291917 s
>>> Jan 19 23:28:14 aeon ntpd[31468]: time reset +2.091014 s
>>> Jan 19 23:43:47 aeon ntpd[31468]: time reset +2.209660 s
>>> Jan 19 23:59:09 aeon ntpd[31468]: time reset +2.150145 s
>>> Jan 20 00:15:44 aeon ntpd[31468]: time reset +2.256261 s
>>> Jan 20 00:31:47 aeon ntpd[31468]: time reset +2.253873 s
>>>
>>> I have tried different clocksources. The machine defaults to hpet,
>>> acpi_pm makes no difference and

Same here, also on 740G, and it also happens on 2 other machines
with 780G.

>> That's annoying but I can't really help you with this. Maybe using
>> adjtimex as described in section 9.1.6 in
>> http://support.ntp.org/bin/view/Support/KnownHardwareIssues is an
>> option for you.

And adjtimex helped me on all 3 machines.
Running it in self-calibrate mode (that 70 sec thing)
plus running ntpd was enough for me for now.

What's interesting is that some time ago it worked just fine,
and, which is even more interesting, Windows on this very
hardware shows quite good time stability (WITHOUT setting
the time using [s]ntp, its ntp client is disabled).

[]
> So it's "mere a question" of calibration on your system. AFAIK
> you have to increase the tick value to not fall behind ntp-time, e.g.
> increasing tick value from 10000 to 10025

Well, sorta. Given the amount of hardware that exposes this
behaviour, AND the fact that it worked fine with previous
kernels (I think it was ok with 2.6.26), AND that Windows
works just fine, it's not a "calibration of your system"
question anymore. It's now a kernel bug question... ;)

BTW, the same Gigabyte GA-MA74GM-S2H mobo is here.
Other motherboards that expose the same issue here:

M3A78-EM (Asus, AMD780G)
M3A-H/HDMI (Asus, AMD780G)

It never happened (so far) on M2N-SLI DELUXE (also Asus,
nVidia MCP55) and on M2N-VM DVI (nVidia MCP67).

/mjt

2009-03-11 14:53:23

by Mika Tiainen

Subject: Re: Slow clock on AMD 740G chipset

On 11 Mar 2009, Michael Tokarev wrote:

>> On Tue, Mar 10, 2009 at 11:18:07AM +0100, Andreas Herrmann wrote:
>>> On Tue, Jan 20, 2009 at 04:16:10PM +0200, Mika Tiainen wrote:
>>>> Hi,
>>>>
>>>> I built a new machine with Gigabyte GA-MA74GM-S2H motherboard that
>>>> ntpd can't keep synced. Could this be a kernel bug or is it a
>>>> hardware problem?
>>>>
>>>> Installed with Debian 2.6.27 kernel and currently running a self
>>>> built 2.6.28.1, both have the problem. It's falling behind over
>>>> 2s/15min:
>
>>> That's annoying but I can't really help you with this. Maybe using
>>> adjtimex as described in section 9.1.6 in
>>> http://support.ntp.org/bin/view/Support/KnownHardwareIssues is an
>>> option for you.
>
> And adjtimex helped me on all 3 machines.
> Running it in self-calibrate mode (that 70 sec thing)
> plus running ntpd was enough for me for now.

Yes, I'm also using adjtimex+ntpd now with 10024 tick for adjtimex.

> What's interesting is that some time ago it worked just fine,
> and, which is even more interesting, windows on this very
> hardware shows quite good time stability (WITHOUT setting
> the time using [s]ntp, its ntp client is disabled).

Something weird is definitely going on under Linux. I got it working by
chance in 2.6.28 _exactly_ once. Just booted normally and ntpd kept it
in time without any resets for the week that it was up, next boot with
the same kernel and it was again falling behind so I installed adjtimex.

There was no difference in dmesg between working and not working.

--
Mika Tiainen Always be wary of any helpful item that
[email protected] weighs less than its operating manual.
http://mikat.iki.fi -- (Terry Pratchett, Jingo)

2009-03-24 17:28:55

by Michael Tokarev

Subject: Re: Slow clock on AMD 740G chipset

[resurrecting (hopefully) an old thread.
Top-posting to keep old mess around for reference.]

To recap what has been said: several people observed a slow clock
on their systems, mostly AMD 780G-, 740G- and 690G-based, with 2.6.27
and 2.6.28 kernels. Slow to the point where ntpd could not
keep up with the drift. It has been said that the motherboards are
flaky or something and that the clocks have to be calibrated, for
which there are known procedures available (adjtimex). Which helped.
Before the "calibration" the clocks were off by ~15 minutes per day.

But today I tried the newly released 2.6.29 kernel on one of the
affected systems - just because I wanted to test something else. And
noticed that the clock is running too fast. After some calculation I
see that it will run ahead by about 15 minutes per day, that is,
exactly the amount which was used to compensate for the slow clock on
2.6.2[78].

So it seems that with 2.6.29, all the motherboards suddenly become
non-flaky and the timers need no calibration anymore, working just
fine. Other operating systems and kernel versions also agree with
this conclusion of 2.6.29.

Any comments on this strange phenomenon ? :)

Thanks!

/mjt

Mika Tiainen wrote at Wed, 11 Mar 2009 16:43:11 +0200:
> On 11 Mar 2009, Michael Tokarev wrote:
>
>>> On Tue, Mar 10, 2009 at 11:18:07AM +0100, Andreas Herrmann wrote:
>>>> On Tue, Jan 20, 2009 at 04:16:10PM +0200, Mika Tiainen wrote:
>>>>> Hi,
>>>>>
>>>>> I built a new machine with Gigabyte GA-MA74GM-S2H motherboard that
>>>>> ntpd can't keep synced. Could this be a kernel bug or is it a
>>>>> hardware problem?
>>>>>
>>>>> Installed with Debian 2.6.27 kernel and currently running a self
>>>>> built 2.6.28.1, both have the problem. It's falling behind over
>>>>> 2s/15min:
>>>> That's annoying but I can't really help you with this. Maybe using
>>>> adjtimex as described in section 9.1.6 in
>>>> http://support.ntp.org/bin/view/Support/KnownHardwareIssues is an
>>>> option for you.
>> And adjtimex helped me on all 3 machines.
>> Running it in self-calibrate mode (that 70 sec thing)
>> plus running ntpd was enough for me for now.
>
> Yes, I'm also using adjtimex+ntpd now with 10024 tick for adjtimex.
>
>> What's interesting is that some time ago it worked just fine,
>> and, which is even more interesting, windows on this very
>> hardware shows quite good time stability (WITHOUT setting
>> the time using [s]ntp, its ntp client is disabled).
>
> Something weird is definitely going on under Linux. I got it working by
> chance in 2.6.28 _exactly_ once. Just booted normally and ntpd kept it
> in time without any resets for the week that it was up, next boot with
> the same kernel and it was again falling behind so I installed adjtimex.
>
> There was no difference in dmesg between working and not working.
>

2009-03-24 22:28:11

by john stultz

Subject: Re: Slow clock on AMD 740G chipset

On Tue, Mar 24, 2009 at 10:34 AM, Michael Tokarev <[email protected]> wrote:
> [resurrecting (hopefully) an old thread.
> Top-posting to keep old mess around for reference.]
>
> To refresh what has been said. Several people observed slow clock
> on their - mostly AMD 780g, 740g and 690g-based systems with 2.6.28
> 2.6.27 kernels. Slow to a point when ntpd wasn't successful to
> keep up with the drift. It has been said that the motherboards are
> flaky or something and that the clocks has to be calibrated, for
> which there are known procedures available (adjtimex). Which helped.
> Before the "calibration" the clock were off by ~15 minutes per day.
>
> But today I tried newly released 2.6.29 kernel on one of the affected
> systems - just because I wanted to test something else. And noticed
> that the clock is running too fast. After some calculation I see that
> it will run away for about 15 minutes per day, that is, exactly the
> number which was used to compensate for slow clock on 2.6.2[78].
>
> So it seems that with 2.6.29, all the motherboards suddenly become
> non-flaky and the timers need no calibration anymore, working just
> fine. Other operating systems and kernel versions also agree with
> this conclusion of 2.6.29.
>
> Any comments on this strange phenomenon ? :)

This sounds like the Fast PIT TSC calibration fix that Linus included
fairly late in 2.6.29-rc (commit
a6a80e1d8cf82b46a69f88e659da02749231eb36). The Fast PIT method was
causing some error on some systems, such that the TSC calibrated more
than 500 ppm off of its actual value, causing NTP to not be able to
compensate (the adjtimex tick manipulation folks in this thread are
doing just pulls that value back into the range where NTP can correct).

See the "Linux 2.6.29-rc6" thread for details.
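
To put the 500 ppm figure next to the numbers from the original
report, the drift implied by the ntpd log lines quoted earlier in the
thread can be estimated roughly as below. This is a back-of-the-envelope
sketch, not a proper measurement: it only sums the step corrections and
ignores whatever ntpd slewed between steps, so the real drift is
probably somewhat larger. The year is an assumption (syslog lines carry
none), and the function name drift_ppm is made up for illustration.

```python
import re
from datetime import datetime

# Log lines copied from the original report in this thread.
LOG = """\
Jan 19 22:08:23 aeon ntpd[31468]: time reset +2.226349 s
Jan 19 22:24:11 aeon ntpd[31468]: time reset +2.185085 s
Jan 19 22:40:08 aeon ntpd[31468]: time reset +2.308958 s
Jan 19 22:56:23 aeon ntpd[31468]: time reset +2.253836 s
Jan 19 23:13:03 aeon ntpd[31468]: time reset +2.291917 s
Jan 19 23:28:14 aeon ntpd[31468]: time reset +2.091014 s
Jan 19 23:43:47 aeon ntpd[31468]: time reset +2.209660 s
Jan 19 23:59:09 aeon ntpd[31468]: time reset +2.150145 s
Jan 20 00:15:44 aeon ntpd[31468]: time reset +2.256261 s
Jan 20 00:31:47 aeon ntpd[31468]: time reset +2.253873 s
"""

PATTERN = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*time reset ([+-]\d+\.\d+) s$"
)


def drift_ppm(log_text, year=2009):
    """Estimate clock drift in ppm from ntpd 'time reset' log lines."""
    samples = []
    for line in log_text.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            samples.append((when, float(m.group(2))))
    if len(samples) < 2:
        raise ValueError("need at least two 'time reset' lines")
    elapsed = (samples[-1][0] - samples[0][0]).total_seconds()
    # The first step corrects drift from before the window, so skip it.
    stepped = sum(step for _, step in samples[1:])
    return stepped / elapsed * 1e6


print(f"{drift_ppm(LOG):.0f} ppm")
```

For these ten lines the estimate comes out well above 2000 ppm, several
times the 500 ppm limit mentioned above.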

thanks
-john

2009-04-25 01:46:34

by David Rees

Subject: Re: Slow clock on AMD 740G chipset

On Tue, Mar 24, 2009 at 10:34 AM, Michael Tokarev <[email protected]> wrote:
> To refresh what has been said. Several people observed slow clock
> on their - mostly AMD 780g, 740g and 690g-based systems with 2.6.28
> 2.6.27 kernels. Slow to a point when ntpd wasn't successful to
> keep up with the drift. It has been said that the motherboards are
> flaky or something and that the clocks has to be calibrated, for
> which there are known procedures available (adjtimex). Which helped.
> Before the "calibration" the clock were off by ~15 minutes per day.

This is really weird. I earlier posted to the thread saying things
were fine on a Fedora 10 2.6.27.9-159.fc10.x86_64 kernel.

Then mysteriously after a machine reboot to install new hardware[1] on
March 27th on kernel 2.6.27.19-170.2.35.fc10.x86_64 (previous running
kernel was the same), the clock started running slow to the tune of
ntpd resetting the time every 15-18 minutes forward about 2.3 seconds.

Fast forward to today (now running 2.6.29.1-30.fc10.x86_64) and the
clock is still running slow.

> So it seems that with 2.6.29, all the motherboards suddenly become
> non-flaky and the timers need no calibration anymore, working just
> fine. Other operating systems and kernel versions also agree with
> this conclusion of 2.6.29.

I don't know - my system (GA-MA74GM-S2 mobo) is still broken.

[1] So the hardware I installed was a SATA SSD (OCZ Vertex). Ever
since then, the clock has been running slow. Previously, the only
thing on the SATA bus was a DVD drive - it has two IDE drives, one
plugged in on board and the other into a Promise IDE card.

When doing so, the SATA ports are now running in AHCI mode instead of
native mode. I'll have to try switching later.

-Dave

2009-04-30 23:17:55

by David Rees

Subject: Re: Slow clock on AMD 740G chipset

On Fri, Apr 24, 2009 at 6:45 PM, David Rees <[email protected]> wrote:
> On Tue, Mar 24, 2009 at 10:34 AM, Michael Tokarev <[email protected]> wrote:
>> To refresh what has been said. Several people observed slow clock
>> on their - mostly AMD 780g, 740g and 690g-based systems with 2.6.28
>> 2.6.27 kernels. Slow to a point when ntpd wasn't successful to
>> keep up with the drift. It has been said that the motherboards are
>> flaky or something and that the clocks has to be calibrated, for
>> which there are known procedures available (adjtimex). Which helped.
>> Before the "calibration" the clock were off by ~15 minutes per day.
>
> This is really weird. I earlier posted to the thread saying things
> were fine on a Fedora 10 2.6.27.9-159.fc10.x86_64 kernel.
>
> Then mysteriously after a machine reboot to install new hardware[1] on
> March 27th on kernel 2.6.27.19-170.2.35.fc10.x86_64 (previous running
> kernel was the same), the clock started running slow to the tune of
> ntpd resetting the time every 15-18 minutes forward about 2.3 seconds.
>
> Fast forward to today (now running 2.6.29.1-30.fc10.x86_64) and the
> clock is still running slow.
>
> I don't know - my system (GA-MA74GM-S2 mobo) is still broken.
>
> [1] So the hardware I installed was a SATA SSD (OCZ Vertex). Ever
> since then, the clock has been running slow. Previously, the only
> thing on the SATA bus was a DVD drive - it has two IDE drives, one
> plugged in on board and the other into a Promise IDE card.
>
> When doing so, the SATA ports are now running in AHCI mode instead of
> native mode. I'll have to try switching later.

Another update. Two nights ago I set the SATA ports to IDE mode to
try to flash the OCZ Vertex to the latest firmware 1370 (it was
previously running 1275). The flash failed as the flashing utility
could not detect the drives in the system. Clock still ran slow. Last
night, I succeeded in flashing the drive after putting it in another
system.

Since then, the clock has been keeping time very well - at least it's
not losing 2-3 seconds every 15 minutes.

So is it possible for a SATA drive to somehow affect the speed of the clock?

-Dave