My system loses about 8 seconds every 20 minutes. This is reported
by ntp and verified by comparing 'date' to 'hwclock --show' and a wall
clock.
My system is an x86 Dell laptop with HZ=1024.
I am quite certain that the issue is the System Management Interrupt
(SMI).
While doing latency tests I have observed 18ms delays every 2 seconds.
Like clockwork, as they say. Given that in 18ms with HZ==1024
roughly 18 timer interrupts should occur, then 17 of them (I believe)
would be lost. Looking in the kernel sources I could find nothing
that adjusts for this.
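As a rough sanity check on these numbers, the expected drift can be worked out from the stall length and its period. The sketch below assumes that one pending timer interrupt survives each stall and the rest are dropped, and it rounds the number of ticks per stall down; the function names are hypothetical.

```c
#include <assert.h>

/* Ticks lost during one stall, assuming a single pending timer interrupt
 * is latched and delivered afterwards while the others are dropped.
 * The tick count is rounded down, so this is a lower bound. */
unsigned lost_ticks_per_stall(unsigned stall_ms, unsigned hz)
{
    assert(hz > 0);
    unsigned fired = stall_ms * hz / 1000;  /* ticks that fire during the stall */
    return fired > 0 ? fired - 1 : 0;
}

/* Milliseconds of clock time lost over window_s seconds when a stall of
 * stall_ms repeats every stall_period_s seconds. */
unsigned long lost_ms(unsigned stall_ms, unsigned stall_period_s,
                      unsigned hz, unsigned window_s)
{
    assert(stall_period_s > 0);
    unsigned long stalls = window_s / stall_period_s;
    unsigned long ticks = stalls * lost_ticks_per_stall(stall_ms, hz);
    return ticks * 1000UL / hz;
}
```

With an 18ms stall every 2 seconds at HZ=1024, lost_ms(18, 2, 1024, 1200) gives 9960, i.e. about 10 seconds per 20 minutes, which is the same order as the drift reported by ntp.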
Since I have defined HZ to be 1024, I miss lots of timer interrupts.
However, since the processor spends 18ms at a time in SMM (System
Management Mode), even the stock 10ms timer tick will sometimes
miss a tick. Thus the problem applies to non-hacked kernels also.
I don't know that there is a solution for all systems; however, at
least on Pentium systems it seems possible to use the TSC to catch
this. However, even if I worked up a patch to do so, do_timer()
always increments jiffies by just 1 count and it isn't clear that it's
safe to call it repeatedly to catch up with lost ticks. It also isn't
clear that it would be safe to modify jiffies directly in one of the
arch/i386/kernel/time.c functions.
In general, I'd like to try a solution that looks something like:
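    tsc_per_jiffie = cpu_khz * 1000 / HZ;
    /* cycles elapsed since the previous tick (current minus last) */
    tsc_remainder += tsc_low - last_tsc_low;
    last_tsc_low = tsc_low;
    jiffies_increment = 0;
    /* every interrupt counts as at least one jiffie */
    do {
        tsc_remainder -= tsc_per_jiffie;
        jiffies_increment++;
    } while (tsc_remainder >= tsc_per_jiffie);
    do_timer(regs, jiffies_increment);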
The above was created on the fly and completely untested. It needs
bits like making sure that the arithmetic works properly on overflow
of tsc_low. It also requires a patch to do_timer() and proper
structuring for portability.
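The overflow concern can be illustrated in isolation. The following is a minimal user-space sketch, not kernel code: it assumes the low 32 bits of the TSC wrap at most once between ticks, uses division rather than the loop above, and all names (tick_state, jiffies_elapsed) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* State carried between timer interrupts (hypothetical names). */
struct tick_state {
    uint32_t last_tsc_low;  /* low 32 bits of the TSC at the previous tick */
    uint32_t remainder;     /* cycles not yet accounted for as jiffies */
};

/* Returns the number of jiffies to account for at this interrupt. */
uint32_t jiffies_elapsed(struct tick_state *s, uint32_t tsc_low,
                         uint32_t tsc_per_jiffy)
{
    assert(tsc_per_jiffy > 0);

    /* Unsigned subtraction is modulo 2^32, so the delta stays correct
     * across a single wrap of the low TSC word. */
    uint32_t delta = tsc_low - s->last_tsc_low;
    uint64_t cycles = (uint64_t)s->remainder + delta;

    s->last_tsc_low = tsc_low;
    s->remainder = (uint32_t)(cycles % tsc_per_jiffy);
    return (uint32_t)(cycles / tsc_per_jiffy);
}
```

Unlike the loop above, this version can return 0 when an interrupt arrives slightly early; the leftover cycles are carried in the remainder and counted at a later tick.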
One problem I see is that tsc_per_jiffie must be perfect or time will
drift. I think it might work to not carry over the remainder from
cycle to cycle under some conditions (no missed ticks) but I'd have
to think about the effects of timing jitter on this.
Have attempts to address this problem been made before?
What are the problems with incrementing jiffies by more than 1?
What problems have I missed?
What strategies might be employed to prevent degraded system
performance since this code is in a critical path?
Have I completely missed something, the kernel already takes care of
this and I have the problem all wrong?
This problem also comes up with IDE access with dma off and I've
seen reports of it when using frame buffers.
Thanks!
Ty
--
Tyson D Sawyer iRobot Corporation
Senior Systems Engineer Military Systems Division
[email protected] Robots for the Real World
603-532-6900 ext 206 http://www.irobot.com
Tyson D Sawyer wrote:
>
> My system loses about 8 seconds every 20 minutes. This is reported
> by ntp and verified by comparing 'date' to 'hwclock --show' and a wall
> clock.
>
> My system is an x86 Dell laptop with HZ=1024.
>
> I am quite certain that the issue is the System Management Interrupt
> (SMI).
>
> While doing latency tests I have observed 18ms delays every 2 seconds.
> Like clockwork, as they say. Given that in 18ms with HZ==1024
> roughly 18 timer interrupts should occur, then 17 of them (I believe)
> would be lost. Looking in the kernel sources I could find nothing
> that adjusts for this.
>
> Since I have defined HZ to be 1024, I miss lots of timer interrupts.
> However, since the processor spends 18ms at a time in SMM (System
> Management Mode), even the stock 10ms timer tick will sometimes
> miss a tick. Thus the problem applies to non-hacked kernels also.
>
> I don't know that there is a solution for all systems; however, at
> least on Pentium systems it seems possible to use the TSC to catch
> this. However, even if I worked up a patch to do so, do_timer()
> always increments jiffies by just 1 count and it isn't clear that it's
> safe to call it repeatedly to catch up with lost ticks. It also isn't
> clear that it would be safe to modify jiffies directly in one of the
> arch/i386/kernel/time.c functions.
>
> In general, I'd like to try a solution that looks something like:
>
>     tsc_per_jiffie = cpu_khz * 1000 / HZ;
>
>     /* cycles elapsed since the previous tick (current minus last) */
>     tsc_remainder += tsc_low - last_tsc_low;
>     last_tsc_low = tsc_low;
>     jiffies_increment = 0;
>     do {
>         tsc_remainder -= tsc_per_jiffie;
>         jiffies_increment++;
>     } while (tsc_remainder >= tsc_per_jiffie);
>
>     do_timer(regs, jiffies_increment);
>
> The above was created on the fly and completely untested. It needs
> bits like making sure that the arithmetic works properly on overflow
> of tsc_low. It also requires a patch to do_timer() and proper
> structuring for portability.
You might take a look at the high-res-timers patch (see URL in
signature) where the timer interrupt is separated from the wall clock
computation. In that patch, do_timer() updates jiffies as needed and
then calls the wall clock update which can handle more than one jiffie
at a time.
One of the nasty problems, especially with machines such as yours (i.e.
laptops), is the fact that TSC is NOT clocked at a fixed rate. It is
affected by throttling (reduced in 12.5% increments) and by power
management. The patch attempts to find a way through these problems by
making the ACPI pm timer one of the options for keeping wall clock.
This timer is clocked at a constant rate regardless of power management;
indeed, it was created to address just these concerns. The down side is
that it is accessed via an I/O instruction and thus adds overhead to the
tick processing and also to all attempts to read system time to a finer
level than the jiffie (most of which are internal, i.e. not from user
land).
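To illustrate how a constant-rate counter like this can drive tick accounting, here is a minimal sketch. It is not taken from the high-res-timers patch: it assumes the 24-bit variant of the pm timer running at 3.579545 MHz, takes raw readings as parameters instead of doing the port I/O, and uses hypothetical function names.

```c
#include <stdint.h>

#define PMTMR_HZ   3579545u     /* ACPI pm timer frequency in Hz */
#define PMTMR_MASK 0x00FFFFFFu  /* assumption: 24-bit counter variant */

/* Ticks elapsed between two raw counter readings, tolerating one wrap. */
uint32_t pmtmr_delta(uint32_t prev, uint32_t now)
{
    return (now - prev) & PMTMR_MASK;
}

/* Convert pm timer ticks to whole jiffies at the given HZ. */
uint32_t pmtmr_ticks_to_jiffies(uint32_t ticks, uint32_t hz)
{
    return (uint32_t)(((uint64_t)ticks * hz) / PMTMR_HZ);
}
```

A 24-bit counter at this rate wraps roughly every 4.7 seconds, so the delta is only meaningful if the counter is sampled more often than that. An 18ms gap (64432 ticks) converts to 18 jiffies at HZ=1024.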
>
> One problem I see is that tsc_per_jiffie must be perfect or time will
> drift. I think it might work to not carry over the remainder from
> cycle to cycle under some conditions (no missed ticks) but I'd have
> to think about the effects of timing jitter on this.
>
> Have attempts to address this problem been made before?
>
> What are the problems with incrementing jiffies by more than 1?
>
> What problems have I missed?
>
> What strategies might be employed to prevent degraded system
> performance since this code is in a critical path?
>
> Have I completely missed something, the kernel already takes care of
> this and I have the problem all wrong?
>
> This problem also comes up with IDE access with dma off and I've
> seen reports of it when using frame buffers.
The IDE issue is correctly addressed by using DMA.
I think the real problem needs to be addressed, i.e. why does the SMI
(and/or other code) keep the interrupt system off so long. Most
interrupts are completed in microseconds, not milliseconds; let's fix
the real problem.
>
> Thanks!
> Ty
>
> --
> Tyson D Sawyer iRobot Corporation
> Senior Systems Engineer Military Systems Division
> [email protected] Robots for the Real World
> 603-532-6900 ext 206 http://www.irobot.com
--
George [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
On Sat, Feb 16, 2002 at 08:05:19AM -0800, george anzinger wrote:
> I think the real problem needs to be addressed, i.e. why does the SMI
> (and/or other code) keep the interrupt system off so long. Most
> interrupts are completed in microseconds, not milliseconds; let's fix
> the real problem.
The SMI is an unbearable abomination and it is an issue that even Microsoft
has been unable to make Intel respond to properly. It makes Rambus seem brilliant.
The basic idea: take a high speed well optimized processor that is the most
critical performance component of your system and arbitrarily divert it to managing
fans completely outside of OS control is so unbearably stupid, arrogant, ugly and
nauseating as to be hard to believe even in this industry.
--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
http://www.fsmlabs.com http://www.rtlinux.com
> My system loses about 8 seconds every 20 minutes. This is reported
> by ntp and verified by comparing 'date' to 'hwclock --show' and a wall
> clock.
>
> My system is an x86 Dell laptop with HZ=1024.
>
> I am quite certain that the issue is the System Management Interrupt
> (SMI).
Possibly, and if it is, you can't really do much about it.
> I don't know that there is a solution for all systems; however, at
> least on Pentium systems it seems possible to use the TSC to catch
Most vendor systems don't have SMI problems that bad; you can normally hit
the 100Hz or 1kHz tick quite reliably.
> tsc_remainder += tsc_low - last_tsc_low;
The TSC is not constant on some laptops, and may not be present, or not
be reliable.
> What strategies might be employed to prevent degraded system
> performance since this code is in a critical path?
Adding a time slew is a well understood problem - the NTP code and papers
cover some very efficient implementation techniques. If you can work out
the drift then drifting back is extremely efficient and the kernel already
implements the needed PLL.
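As an illustration of why slewing is cheap, a known offset can be worked off by crediting each tick with slightly more or less than its nominal length. This is a minimal bounded-slew sketch in the style of adjtime(), not the kernel's NTP PLL; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pending correction and per-tick limit, in microseconds (hypothetical). */
struct slew_state {
    int64_t pending_us;   /* correction still to apply; positive = clock is behind */
    int64_t max_step_us;  /* largest adjustment allowed per tick */
};

/* Returns the amount of time to credit for this tick, in microseconds. */
int64_t slewed_tick_us(struct slew_state *s, int64_t nominal_tick_us)
{
    assert(s->max_step_us >= 0);

    /* Clamp the step so the clock never jumps by more than the limit. */
    int64_t step = s->pending_us;
    if (step > s->max_step_us)
        step = s->max_step_us;
    else if (step < -s->max_step_us)
        step = -s->max_step_us;

    s->pending_us -= step;
    return nominal_tick_us + step;
}
```

The per-tick cost is a compare and an add, and the clock stays monotonic as long as max_step_us is smaller than the nominal tick length.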
> Have I completely missed something, the kernel already takes care of
> this and I have the problem all wrong?
You have it pretty much right. We have several time sources on a PC -
the RTC (variable rate, not always present), the CMOS clock (low res and
on many modern machines horribly inaccurate due to the use of low grade
components), the ACPI timers (on newer machines only, high resolution
constant rate, unknown accuracy).
ACPI may help here but lots of vendors implement their ACPI subsystem using
I/O cycles to jump into SMM mode so it's game over again.
> This problem also comes up with IDE access with dma off and I've
> seen reports of it when using frame buffers.
The frame buffer one is fixed for newer kernels. The IDE one is a physical
constraint on some older IDE controllers. See man hdparm.
Alan
> The SMI is an unbearable abomination and it is an issue that even Microsoft
> has been unable to make Intel respond to properly. It makes Rambus seem brilliant.
To be fair, Intel have made it possible to pull this out into processor
control with ACPI. ACPI isn't the greatest bit of design I've ever seen (not
by far) but they addressed the problem.
The BIOS vendors duly decided that the problem was one they didn't mind
having, and it saved code. Especially anyone still doing APM support, where
SMI is the only sane implementation approach.
Alan Cox <[email protected]> writes:
> > My system loses about 8 seconds every 20 minutes. This is reported
> > by ntp and verified by comparing 'date' to 'hwclock --show' and a wall
> > clock.
> >
> > My system is an x86 Dell laptop with HZ=1024.
> >
> > I am quite certain that the issue is the System Management Interrupt
> > (SMI).
>
> Possibly, and if it is, you can't really do much about it.
Except usually the truly annoyed can reprogram the chipset so an SMI
interrupt is not generated. But I doubt that is practical on a
laptop.
> ACPI may help here but lots of vendors implement their ACPI subsystem using
> I/O cycles to jump into SMM mode so it's game over again.
Hmm. I wonder if this is a simple transition technique or if it is
their long term strategy.
Now I'm going to research and see if SMM mode is supported on
x86-64 and ia64. With ACPI the case can at least be made that SMM
mode is not strictly necessary and should be dropped. I'm dreaming,
but if the processors didn't have this super protected mode, BIOS
vendors and operating system vendors would be forced to cooperate on
these issues.
Eric
Alan Cox wrote:
>
> > My system loses about 8 seconds every 20 minutes. This is reported
> > by ntp and verified by comparing 'date' to 'hwclock --show' and a wall
> > clock.
> >
> > My system is an x86 Dell laptop with HZ=1024.
> >
> > I am quite certain that the issue is the System Management Interrupt
> > (SMI).
>
> Possibly, and if it is, you can't really do much about it.
>
> > I don't know that there is a solution for all systems; however, at
> > least on Pentium systems it seems possible to use the TSC to catch
>
> Most vendor systems don't have SMI problems that bad; you can normally hit
> the 100Hz or 1kHz tick quite reliably.
>
> > tsc_remainder += tsc_low - last_tsc_low;
>
> The TSC is not constant on some laptops, and may not be present, or not
> be reliable.
>
> > What strategies might be employed to prevent degraded system
> > performance since this code is in a critical path?
>
> Adding a time slew is a well understood problem - the NTP code and papers
> cover some very efficient implementation techniques. If you can work out
> the drift then drifting back is extremely efficient and the kernel already
> implements the needed PLL.
>
> > Have I completely missed something, the kernel already takes care of
> > this and I have the problem all wrong?
>
> You have it pretty much right. We have several time sources on a PC -
> the RTC (variable rate, not always present), the CMOS clock (low res and
> on many modern machines horribly inaccurate due to the use of low grade
> components), the ACPI timers (on newer machines only, high resolution
> constant rate, unknown accuracy).
I rather thought that, since it runs at exactly 3 times the PIT clock
rate, it used the same "rock". Andrew, any thoughts here?
>
> ACPI may help here but lots of vendors implement their ACPI subsystem using
> I/O cycles to jump into SMM mode so it's game over again.
>
> > This problem also comes up with IDE access with dma off and I've
> > seen reports of it when using frame buffers.
>
> The frame buffer one is fixed for newer kernels. The IDE one is a physical
> constraint on some older IDE controllers. See man hdparm.
>
> Alan
--
George [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Sorry, disregard the message about 2.5.4 not compiling. I have the patch and it's
fine <G>
lee
Followup to: <[email protected]>
By author: george anzinger <[email protected]>
In newsgroup: linux.dev.kernel
>
> One of the nasty problems, especially with machines such as yours (i.e.
> laptops), is the fact that TSC is NOT clocked at a fixed rate. It is
> affected by throttling (reduced in 12.5% increments) and by power
> management.
If the TSC is affected by HLT, throttling, or C2 power management, the
TSC is broken (as it is on Cyrix chips, for example.) The TSC usually
*is* affected by C3 power management, but the OS should be aware of
C3.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>
"H. Peter Anvin" wrote:
>
> Followup to: <[email protected]>
> By author: george anzinger <[email protected]>
> In newsgroup: linux.dev.kernel
> >
> > One of the nasty problems, especially with machines such as yours (i.e.
> > laptops), is the fact that TSC is NOT clocked at a fixed rate. It is
> > affected by throttling (reduced in 12.5% increments) and by power
> > management.
>
> If the TSC is affected by HLT, throttling, or C2 power management, the
> TSC is broken (as it is on Cyrix chips, for example.) The TSC usually
> *is* affected by C3 power management, but the OS should be aware of
> C3.
>
> -hpa
Gosh I would LIKE to think this is true. Could you give a reference? I
believe Andrew Grover thinks that what I have stated is true. If I am
wrong, it will make the high-res-timers MUCH more acceptable as the TSC
overhead is MUCH lower than the ACPI pm timer.
Do I have this right, Andrew?
--
George [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
> If the TSC is affected by HLT, throttling, or C2 power management, the
> TSC is broken (as it is on Cyrix chips, for example.) The TSC usually
> *is* affected by C3 power management, but the OS should be aware of
> C3.
ACPI is irrelevant to the machines in question. This is all APM era stuff
and yes, some people change the base clock, not stpclk. The Cyrix HLT is
already handled.
george anzinger wrote:
> "H. Peter Anvin" wrote:
>
>>Followup to: <[email protected]>
>>By author: george anzinger <[email protected]>
>>In newsgroup: linux.dev.kernel
>>
>>>One of the nasty problems, especially with machines such as yours (i.e.
>>>laptops), is the fact that TSC is NOT clocked at a fixed rate. It is
>>>affected by throttling (reduced in 12.5% increments) and by power
>>>management.
>>>
>>If the TSC is affected by HLT, throttling, or C2 power management, the
>>TSC is broken (as it is on Cyrix chips, for example.) The TSC usually
>>*is* affected by C3 power management, but the OS should be aware of
>>C3.
>>
> Gosh I would LIKE to think this is true. Could you give a reference? I
> believe Andrew Grover thinks that what I have stated is true. If I am
> wrong, it will make the high-res-timers MUCH more acceptable as the TSC
> overhead is MUCH lower than the ACPI pm timer.
>
> Do I have this right, Andrew?
>
What I have defined above is what Linux considers a "working" TSC. I
believe this to be functional on Intel, AMD and Transmeta CPUs.
However, there are some systems -- especially using older chips with less
PLL delays -- which change CLKIN on the fly.
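One way to tell whether a given machine falls into that category is to measure the TSC against a constant-rate reference over the same interval and compare the result with the calibrated rate. The sketch below assumes the ACPI pm timer (3.579545 MHz) as the reference and takes both deltas as parameters; the function names and the tolerance parameter are hypothetical, and this is not how the kernel decides whether to trust the TSC.

```c
#include <stdint.h>

#define PMTMR_HZ 3579545u  /* ACPI pm timer frequency in Hz */

/* Estimate the TSC rate in kHz from TSC and pm timer deltas taken over
 * the same interval. Returns 0 if pm_delta is 0. The multiplication fits
 * in 64 bits for tsc_delta up to about 5e12 cycles. */
uint32_t tsc_khz_estimate(uint64_t tsc_delta, uint32_t pm_delta)
{
    if (pm_delta == 0)
        return 0;
    return (uint32_t)(tsc_delta * PMTMR_HZ / pm_delta / 1000);
}

/* Nonzero if the measured rate deviates from the expected rate by more
 * than tolerance_pct percent. */
int tsc_rate_changed(uint32_t measured_khz, uint32_t expected_khz,
                     uint32_t tolerance_pct)
{
    uint32_t diff = measured_khz > expected_khz ? measured_khz - expected_khz
                                                : expected_khz - measured_khz;
    return (uint64_t)diff * 100 > (uint64_t)expected_khz * tolerance_pct;
}
```

For a 700 MHz part sampled over one second, a single 12.5% throttling step shows up as an estimate of 612500 kHz, well outside a 5% tolerance.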
-hpa
Hi!
> > laptops), is the fact that TSC is NOT clocked at a fixed rate. It is
> > affected by throttling (reduced in 12.5% increments) and by power
> > management.
>
> If the TSC is affected by HLT, throttling, or C2 power management, the
> TSC is broken (as it is on Cyrix chips, for example.) The TSC usually
> *is* affected by C3 power management, but the OS should be aware of
> C3.
Add ThinkPad 560X (Pentium/MMX) and Toshiba 4030CDT (Celeron) to your
blacklist, then. I believe that by your definition *many* systems are
broken.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
Hi!
> My system loses about 8 seconds every 20 minutes. This is reported
> by ntp and verified by comparing 'date' to 'hwclock --show' and a wall
> clock.
>
> My system is an x86 Dell laptop with HZ=1024.
...
> Since I have defined HZ to be 1024, I miss lots of timer interrupts.
> However, since the processor spends 18ms at a time in SMM (System
> Management Mode), even the stock 10ms timer tick will sometimes
> miss a tick. Thus the problem applies to non-hacked kernels also.
The kernel can compensate for one lost tick, AFAIR, so go back to HZ=100.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
Pavel Machek wrote:
> Hi!
>
>
>>>laptops), is the fact that TSC is NOT clocked at a fixed rate. It is
>>>affected by throttling (reduced in 12.5% increments) and by power
>>>management.
>>>
>>If the TSC is affected by HLT, throttling, or C2 power management, the
>>TSC is broken (as it is on Cyrix chips, for example.) The TSC usually
>>*is* affected by C3 power management, but the OS should be aware of
>>C3.
>>
>
> Add ThinkPad 560X (Pentium/MMX) and Toshiba 4030CDT (Celeron) to your
> blacklist, then. I believe that by your definition *many* systems are
> broken.
> Pavel
It's sad but true. Unfortunately the TSC seems to be considered a
low-priority operation. It's for systems like the above you need the
"no-tsc" option.
-hpa