My old 486 test box is losing time at an alarming rate
when running 2.6.0-test kernels. It loses almost 2 minutes
per hour, less if it sits idle. This problem does not
occur when it's running a 2.4 kernel.
There's nothing noteworthy in dmesg.
This has been going on since at least the 2.5.7x kernels,
and possibly also the 2.5.6x kernels. I strongly suspect
a bug in the time-keeping changes in late 2.5 kernels.
The 486 has no TSC, and I don't have an NTP server to
keep my machines' times in sync.
/Mikael
On Tue, 2003-07-29 at 10:34, Mikael Pettersson wrote:
> My old 486 test box is losing time at an alarming rate
> when running 2.6.0-test kernels. It loses almost 2 minutes
> per hour, less if it sits idle. This problem does not
> occur when it's running a 2.4 kernel.
>
> There's nothing noteworthy in dmesg.
>
> This has been going on since at least the 2.5.7x kernels,
> and possibly also the 2.5.6x kernels. I strongly suspect
> a bug in the time-keeping changes in late 2.5 kernels.
> The 486 has no TSC, and I don't have an NTP server to
> keep my machines' times in sync.
Hmm. Sounds like you're losing interrupts. This can happen due to
poorly behaving drivers (disabling interrupts for too long), or odd
hardware. The change from HZ=100 to HZ=1000 probably made this more
visible on your box, so could you try setting HZ back to 100 and see if
that helps (you may still lose time, but at a much slower rate).
Also what drivers are you running with?
thanks
-john
> My old 486 test box is losing time at an alarming rate
Try changing the line in "include/asm-i386/param.h":
# define HZ 1000
to
# define HZ 100
and see if the problem remains after recompiling.
Regards,
Sean
On Tue, Jul 29, 2003 at 07:34:43PM +0200, Mikael Pettersson wrote:
> My old 486 test box is losing time at an alarming rate
> when running 2.6.0-test kernels. It loses almost 2 minutes
> per hour, less if it sits idle. This problem does not
> occur when it's running a 2.4 kernel.
I recently saw a patch to compensate for the 1000/1024 ratio,
HZ being 1000 nowadays. I'm not sure if it is already in test2.
It wasn't in test1. There is a similar compensation for 100/128
in case HZ == 100. Search for HZ == 100 in second_overflow() in
kernel/timer.c.
I'm not sure what this fix does, but 2 minutes per hour is close to
the 1000/1024 ratio.
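
As a rough check of that ratio (a sketch only; the helper name below is made up for illustration and is not taken from kernel/timer.c): a clock that advances 1000 units for every 1024 units of real time falls behind by 24/1024 of an hour every hour, which works out to roughly 84 seconds. That is in the same range as the reported loss, though somewhat short of 2 minutes.

```c
#include <assert.h>

/*
 * Seconds lost per hour by a clock that advances `actual` units for
 * every `nominal` units of real time. Hypothetical helper, used only
 * to illustrate the 1000/1024 arithmetic.
 */
static double drift_seconds_per_hour(double actual, double nominal)
{
    assert(nominal > 0.0);
    return (1.0 - actual / nominal) * 3600.0;
}
```

Calling drift_seconds_per_hour(1000.0, 1024.0) gives the figure for the 1000/1024 case; whether that mismatch is what actually causes the loss on the 486 is not established here.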
--
Frank
On 29 Jul 2003 11:59:06 -0700, john stultz wrote:
>On Tue, 2003-07-29 at 10:34, Mikael Pettersson wrote:
>> My old 486 test box is losing time at an alarming rate
>> when running 2.6.0-test kernels. It loses almost 2 minutes
>> per hour, less if it sits idle. This problem does not
>> occur when it's running a 2.4 kernel.
>>
>> There's nothing noteworthy in dmesg.
>>
>> This has been going on since at least the 2.5.7x kernels,
>> and possibly also the 2.5.6x kernels. I strongly suspect
>> a bug in the time-keeping changes in late 2.5 kernels.
>> The 486 has no TSC, and I don't have an NTP server to
>> keep my machines' times in sync.
>
>Hmm. Sounds like you're losing interrupts. This can happen due to
>poorly behaving drivers (disabling interrupts for too long), or odd
>hardware. The change from HZ=100 to HZ=1000 probably made this more
>visible on your box, so could you try setting HZ back to 100 and see if
>that helps (you may still lose time, but at a much slower rate).
Yep, reducing HZ to 100 in param.h eliminated the time losses.
>Also what drivers are you running with?
IDE, no chipset driver, NE2000 ISA NIC (no traffic during the
tests), AT keyboard + PS/2 mouse (unused during the tests).
The only things I can think of are:
- a 486 simply cannot keep up with HZ=1000
- the plain IDE driver w/o chipset & DMA support somehow
is much worse in 2.5/2.6 than in 2.4
- the "no TSC" time-keeping code is broken
/Mikael
On Wed, 2003-07-30 at 12:19, Mikael Pettersson wrote:
> On 29 Jul 2003 11:59:06 -0700, john stultz wrote:
> >Hmm. Sounds like you're losing interrupts. This can happen due to
> >poorly behaving drivers (disabling interrupts for too long), or odd
> >hardware. The change from HZ=100 to HZ=1000 probably made this more
> >visible on your box, so could you try setting HZ back to 100 and see if
> >that helps (you may still lose time, but at a much slower rate).
>
> Yep, reducing HZ to 100 in param.h eliminated the time losses.
Ok, that's what I figured.
> >Also what drivers are you running with?
>
> IDE, no chipset driver, NE2000 ISA NIC (no traffic during the
> tests), AT keyboard + PS/2 mouse (unused during the tests).
>
> The only things I can think of are:
> - a 486 simply cannot keep up with HZ=1000
> - the plain IDE driver w/o chipset & DMA support somehow
> is much worse in 2.5/2.6 than in 2.4
> - the "no TSC" time-keeping code is broken
Well, I suspect it's just the first. If you're not generating interrupts
then I'm doubtful the IDE driver is at fault (although I'd believe it if
you were losing time under load). Also the PIT based time source is
pretty simple and hasn't functionally changed much (well, it has been
moved around a bit).
It may be the timer interrupt has grown in cost since the argument to
change HZ to 1000 was made. Although using the PIT there isn't much we
do from a time of day perspective. If I can find a second, I'll see if I
can compare interrupt overhead between 2.4 and 2.5. But I'd imagine the
box would barely be usable if we're wasting all our time handling timer
interrupts (is it usable??).
thanks
-john
On 30 Jul 2003 13:08:44 -0700, john stultz <[email protected]> wrote:
>On Wed, 2003-07-30 at 12:19, Mikael Pettersson wrote:
>> On 29 Jul 2003 11:59:06 -0700, john stultz wrote:
>> >Hmm. Sounds like you're losing interrupts. This can happen due to
>> >poorly behaving drivers (disabling interrupts for too long), or odd
>> >hardware. The change from HZ=100 to HZ=1000 probably made this more
>> >visible on your box, so could you try setting HZ back to 100 and see if
>> >that helps (you may still lose time, but at a much slower rate).
>>
>> Yep, reducing HZ to 100 in param.h eliminated the time losses.
>
>Ok, that's what I figured.
>
>> >Also what drivers are you running with?
>>
>> IDE, no chipset driver, NE2000 ISA NIC (no traffic during the
>> tests), AT keyboard + PS/2 mouse (unused during the tests).
>>
>> The only things I can think of are:
>> - a 486 simply cannot keep up with HZ=1000
>> - the plain IDE driver w/o chipset & DMA support somehow
>> is much worse in 2.5/2.6 than in 2.4
>> - the "no TSC" time-keeping code is broken
>
>Well, I suspect it's just the first. If you're not generating interrupts
>then I'm doubtful the IDE driver is at fault (although I'd believe it if
>you were losing time under load). Also the PIT based time source is
>pretty simple and hasn't functionally changed much (well, it has been
>moved around a bit).
>
>It may be the timer interrupt has grown in cost since the argument to
>change HZ to 1000 was made. Although using the PIT there isn't much we
>do from a time of day perspective. If I can find a second, I'll see if I
>can compare interrupt overhead between 2.4 and 2.5. But I'd imagine the
>box would barely be usable if we're wasting all our time handling timer
>interrupts (is it usable??).
Well, the test the box was running (recompile 2.4.22-pre) generates
a lot of disk traffic, including swapping, since the box has so little
RAM (only 28M). So IDE interrupts are frequent and the box is both
CPU and I/O bound. I can still log in to it, type shell commands and
so on, but starting emacs would be a bad idea...
To test the "486 can't cope with HZ=1000" thesis I tried a RedHat
2.4.18-27.8 kernel which has a CONFIG_HZ option. Using 2.4.18-27.8
with CONFIG_HZ=1000, the box still lost time during the "recompile
2.4.22-pre" test, but only about 15 seconds per hour instead of 2
minutes per hour as it does with 2.6-test.
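
To put the two figures side by side (a back-of-the-envelope sketch, assuming every lost second comes from missed timer ticks and each missed tick costs exactly 1/HZ seconds; the helper names are made up): at HZ=1000, losing 2 minutes per hour means about 120000 of 3600000 ticks go missing, roughly 3.3%, while 15 seconds per hour is 15000 ticks, roughly 0.4%.

```c
#include <assert.h>

/*
 * Timer ticks that must go missing per hour to explain a given loss,
 * assuming each missed tick costs exactly 1/hz seconds.
 */
static long lost_ticks_per_hour(long hz, long seconds_lost_per_hour)
{
    assert(hz > 0 && seconds_lost_per_hour >= 0);
    return hz * seconds_lost_per_hour;
}

/* The same loss expressed as a fraction of all ticks in one hour. */
static double lost_tick_fraction(long seconds_lost_per_hour)
{
    assert(seconds_lost_per_hour >= 0);
    return (double)seconds_lost_per_hour / 3600.0;
}
```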
/Mikael
On Wed, 2003-07-30 at 15:52, Mikael Pettersson wrote:
> On 30 Jul 2003 13:08:44 -0700, john stultz <[email protected]> wrote:
> >Well, I suspect it's just the first. If you're not generating interrupts
> >then I'm doubtful the IDE driver is at fault (although I'd believe it if
> >you were losing time under load). Also the PIT based time source is
> >pretty simple and hasn't functionally changed much (well, it has been
> >moved around a bit).
> >
> >It may be the timer interrupt has grown in cost since the argument to
> >change HZ to 1000 was made. Although using the PIT there isn't much we
> >do from a time of day perspective. If I can find a second, I'll see if I
> >can compare interrupt overhead between 2.4 and 2.5. But I'd imagine the
> >box would barely be usable if we're wasting all our time handling timer
> >interrupts (is it usable??).
>
> Well, the test the box was running (recompile 2.4.22-pre) generates
> a lot of disk traffic, including swapping, since the box has so little
> RAM (only 28M). So IDE interrupts are frequent and the box is both
> CPU and I/O bound. I can still log in to it, type shell commands and
> so on, but starting emacs would be a bad idea...
Oh, if you're compiling then IDE is probably contributing to the
problem. However, I thought you said you lost time when idling as well?
> To test the "486 can't cope with HZ=1000" thesis I tried a RedHat
> 2.4.18-27.8 kernel which has a CONFIG_HZ option. Using 2.4.18-27.8
> with CONFIG_HZ=1000, the box still lost time during the "recompile
> 2.4.22-pre" test, but only about 15 seconds per hour instead of 2
> minutes per hour as it does with 2.6-test.
Ah, good call testing 2.4 w/ HZ=1000. Yea, as for the difference between
2.4 and 2.6-test, I'm guessing something in do_timer_interrupt_hook()
has grown. Booting a 586+ system w/ "clock=pit" and instrumenting that
function w/ rdtsc calls would probably show what has slowed down.
Regardless, as you've demonstrated, it seems 486s just can't keep up w/
HZ=1000. Maybe we need to look into some sort of processor specific HZ
config option?
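
The measurement pattern would look roughly like this. This is a userspace illustration only, not kernel code: fake_timer_hook() is a stand-in for do_timer_interrupt_hook(), the struct and helper names are hypothetical, and the non-x86 fallback exists only so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
/* Read the CPU time-stamp counter (needs a TSC, so 586 or later). */
static inline uint64_t read_cycles(void) { return __rdtsc(); }
#else
#include <time.h>
/* Fallback for non-x86 builds: processor time from clock(), which is
 * far coarser than a cycle counter. */
static inline uint64_t read_cycles(void) { return (uint64_t)clock(); }
#endif

/* Running totals for one instrumented function. */
struct cycle_stats {
    uint64_t calls;  /* number of timed invocations */
    uint64_t total;  /* sum of all measured deltas  */
    uint64_t worst;  /* largest single delta seen   */
};

/* Call fn() once and record how long it took. */
static void timed_call(struct cycle_stats *st, void (*fn)(void))
{
    uint64_t start, delta;

    assert(st != 0 && fn != 0);
    start = read_cycles();
    fn();
    delta = read_cycles() - start;

    st->calls++;
    st->total += delta;
    if (delta > st->worst)
        st->worst = delta;
}

/* Stand-in workload; the real target would be the timer hook. */
static void fake_timer_hook(void)
{
    volatile int i;

    for (i = 0; i < 1000; i++) {
        /* busy loop, just to have something to measure */
    }
}
```

Running timed_call(&stats, fake_timer_hook) in a loop and comparing total/calls between two builds is the kind of comparison meant above; in the kernel the readings would have to be taken inside the interrupt path itself.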
thanks
-john
On Wed, 2003-07-30 16:16:59 -0700, john stultz <[email protected]>
wrote in message <[email protected]>:
> On Wed, 2003-07-30 at 15:52, Mikael Pettersson wrote:
> > On 30 Jul 2003 13:08:44 -0700, john stultz <[email protected]> wrote:
> Regardless, as you've demonstrated, it seems 486s just can't keep up w/
> HZ=1000. Maybe we need to look into some sort of processor specific HZ
> config option?
I'd like to see that. Eventually, I'll post some patch to do that, but
first, I need to make Debian support i386 again (since libstdc++5 in
unstable is now compiled for i486, some apps (apt-get is one of
those...) will get SIGILLed to death). I do have some of those boxes (Am386
with SIMM RAM, i386SX-16 with SIPP modules + single ICs :)
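
Something along these lines, perhaps. This is an untested sketch, not the patch mentioned above; it assumes the existing CONFIG_M386/CONFIG_M486 processor-family options are the right switch, and tick_usec() is a made-up helper that only shows the effect of the choice.

```c
#include <assert.h>

/*
 * Possible shape for include/asm-i386/param.h: keep HZ=100 on the
 * slowest processor families, HZ=1000 everywhere else.
 * Assumption: CONFIG_M386 / CONFIG_M486 are the symbols to test.
 */
#if defined(CONFIG_M386) || defined(CONFIG_M486)
# define HZ 100
#else
# define HZ 1000
#endif

/* Length of one timer tick in microseconds for the selected HZ. */
static long tick_usec(void)
{
    assert(HZ > 0);
    return 1000000L / HZ;
}
```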
MfG, JBG
--
Jan-Benedict Glaw [email protected] . +49-172-7608481
"Eine Freie Meinung in einem Freien Kopf | Gegen Zensur | Gegen Krieg
fuer einen Freien Staat voll Freier Bürger" | im Internet! | im Irak!
ret = do_actions((curr | FREE_SPEECH) & ~(IRAQ_WAR_2 | DRM | TCPA));