2002-08-25 10:50:48

by Volker Kuhlmann

Subject: kernel losing time

Gidday,

I am stuck with a kernel problem someone can hopefully shed some light
on. It's also a bug report.

Symptoms: at some stage the kernel is unable to keep time. The time
(output of date) slows right down to about 1/5th speed, or much less
with disk activity. The terminal bell lasts much longer and has a lower
pitch. Timings everywhere cause trouble, e.g. the screen blanker
activates all the time. Happens with both reiserfs and ext2. It's
impossible to fix without a reboot. There seems to be no data
corruption on disk. The machine is much more sluggish; at a wild guess,
it is killing time in an interrupt routine and missing the timer tick
interrupts.

All 2.4 kernels seem to be affected, tried 2.4.10, 16, 18 (all SuSE
versions) and vanilla 2.4.19. 2.2.19 is fine.

Happens with and without running X, and also without the 8139too driver
being loaded.

Hardware: Pentium-233 MMX, Octek mainboard model Rhine 12+, VIA
chipset, by lspci:

00:00.0 Host bridge: VIA Technologies, Inc. VT82C585VP [Apollo VP1/VPX] (rev 23)
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C586/A/B PCI-to-ISA [Apollo VP] (rev 27)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)

hda: Seagate 4G, ST34321A
hdc: AOpen 52x cdrom, no difference if this is hdb

Turning disk dma and unmaskirq on or off with hdparm makes little to no
difference. Booting with ide=nodma apm=off acpi=off doesn't help.

Another peculiar observation: with hdparm -d1 /dev/hda, hdparm -t gives
0.98MB/s (seems very low even for this machine); with -d0 it gives
3.7MB/s. That is, turning dma off makes the disk almost 4 times as
fast(!!). This is with vanilla 2.4.19.

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.orcon.net.nz/ Please do not CC list postings to me.


2002-08-25 10:57:14

by Thunder from the hill

Subject: Re: kernel losing time

Hi,

On Sun, 25 Aug 2002, Volker Kuhlmann wrote:
> I am stuck with a kernel problem someone can hopefully shed some light
> on. It's also a bug report.

And it's already known. It's the VIA chipset, which obviously can't read the
clock ;-) The chipset is kicking wrong interrupts, the timer can't help it.

Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-

2002-08-25 11:42:52

by Volker Kuhlmann

Subject: Re: kernel losing time

On Sun 25 Aug 2002 23:01:19 NZST +1200, Thunder from the hill wrote:

> And it's already known. It's the VIA chipset, which obviously can't read the
> clock ;-) The chipset is kicking wrong interrupts, the timer can't help it.

Ah, thanks! I did use google, but it's difficult to find the correct
words and numbers to enter...

Where can I find more information about it? Mainly, is this a problem
which can be solved at all, or do I need to get rid of the mobo
altogether?

Thanks,

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.orcon.net.nz/ Please do not CC list postings to me.

2002-08-25 21:51:00

by erik

Subject: Re: kernel losing time

On Sun, Aug 25, 2002 at 05:01:19AM -0600, Thunder from the hill wrote:
> Hi,
>
> On Sun, 25 Aug 2002, Volker Kuhlmann wrote:
> > I am stuck with a kernel problem someone can hopefully shed some light
> > on. It's also a bug report.
>
> And it's already known. It's the VIA chipset, which obviously can't read the
> clock ;-) The chipset is kicking wrong interrupts, the timer can't help it.
>

Would this explain my computer losing 2-3 minutes of time while
ripping a cd? Normally it's dead on (w/ ntpd running to guarantee
that) but while ripping or burning it loses so badly ntpd can't keep
up.

1 GHz Athlon, VIA chipset, with the burner on a Promise PCI card.


Erik

2002-08-25 21:58:00

by Thunder from the hill

Subject: Re: kernel losing time

Hi,

On Sun, 25 Aug 2002, erik wrote:
> Would this explain my computer losing 2-3 minutes of time while
> ripping a cd? Normally it's dead on (w/ ntpd running to guarantee
> that) but while ripping or burning it loses so badly ntpd can't keep
> up.

We call it 'heavy interrupt load'. The more errors can happen, the more
errors do happen. (And by the way, I've already seen VIA chipsets jump off
by hours.)
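
A rough sketch of the arithmetic behind time lost to missed timer interrupts. It assumes the clock only advances when a timer interrupt is actually handled, and it assumes HZ = 100 (one tick every 10 ms), the usual i386 value for 2.4 kernels. The helper names are made up for illustration and are not kernel functions.

```c
/* Hypothetical helpers, not kernel code: relate missed timer ticks to
 * lost wall-clock time. ASSUMED_HZ = 100 is an assumption (10 ms/tick). */
#define ASSUMED_HZ 100L

/* Number of timer ticks that must be missed to fall behind by `seconds`. */
long ticks_for_drift(long seconds)
{
    return seconds * ASSUMED_HZ;
}

/* Milliseconds of wall-clock time lost when `ticks` interrupts are missed. */
long drift_ms(long ticks)
{
    return ticks * 1000L / ASSUMED_HZ;
}
```

Under these assumptions, falling behind by 2-3 minutes during one rip would correspond to roughly 12000 to 18000 missed ticks, which is far more than ntpd can slew away in that time.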

Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-

2002-08-25 23:59:28

by Alan

Subject: Re: kernel losing time

On Sun, 2002-08-25 at 22:55, erik wrote:
> Would this explain my computer losing 2-3 minutes of time while
> ripping a cd? Normally it's dead on (w/ ntpd running to guarantee
> that) but while ripping or burning it loses so badly ntpd can't keep
> up.

Could be - does hdparm -u1 on that device fix it?

2002-08-26 01:09:18

by Volker Kuhlmann

Subject: Re: kernel losing time

Ok, Linux doesn't work with a VIA 82C586 etc chipset. I plonked a
16 bit ISA multi-I/O card with IDE interface into the box, some kind of
winbond chip. Both integrated IDE interfaces in the BIOS are disabled.
The problem was worse than ever. How does this happen?

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.orcon.net.nz/ Please do not CC list postings to me.

2002-08-26 03:00:56

by erik

Subject: Re: kernel losing time

On Mon, Aug 26, 2002 at 01:05:14AM +0100, Alan Cox wrote:
> On Sun, 2002-08-25 at 22:55, erik wrote:
> > Would this explain my computer losing 2-3 minutes of time while
> > ripping a cd? Normally it's dead on (w/ ntpd running to guarantee
> > that) but while ripping or burning it loses so badly ntpd can't keep
> > up.
>
> Could be - does hdparm -u1 on that device fix it?

Nope. I still lost a minute or so ripping a single disc. This is
using 2.4.19-rc3, though I've not seen a kernel where it /didn't/
happen.


Erik


2002-08-26 07:26:59

by Vojtech Pavlik

Subject: Re: kernel losing time

On Sun, Aug 25, 2002 at 04:02:07PM -0600, Thunder from the hill wrote:
> Hi,
>
> On Sun, 25 Aug 2002, erik wrote:
> > Would this explain my computer losing 2-3 minutes of time while
> > ripping a cd? Normally it's dead on (w/ ntpd running to guarantee
> > that) but while ripping or burning it loses so badly ntpd can't keep
> > up.
>
> We call it 'heavy interrupt load'. The more errors can happen, the more
> errors do happen. (And by the way, I've already seen VIA chipsets jump off
> by hours.)

It's always the same amount, about four hours: a value that should go
negative is held in an unsigned int and wraps around instead.
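
A minimal sketch of how an unsigned subtraction turns a small negative difference into a huge positive one. The thread does not say which counter or unit is involved, so the 32-bit width and the function names here are assumptions, and the size of the jump in this sketch is not meant to reproduce the four-hour figure.

```c
#include <stdint.h>

/* Hypothetical helper: elapsed count between two readings of a 32-bit
 * counter. If `now` is smaller than `earlier`, the result wraps modulo
 * 2^32 instead of going negative. */
uint32_t elapsed_u32(uint32_t now, uint32_t earlier)
{
    return (uint32_t)(now - earlier);
}

/* One possible guard (an illustration, not the kernel's fix): treat a
 * reading that appears to go backwards as zero elapsed time. */
uint32_t elapsed_clamped(uint32_t now, uint32_t earlier)
{
    return (now < earlier) ? 0u : (uint32_t)(now - earlier);
}
```

With these definitions, `elapsed_u32(5, 10)` yields 4294967291 rather than -5, so a reading that is only slightly "in the past" shows up as one large, fixed-size jump forward.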

--
Vojtech Pavlik
SuSE Labs

2002-08-26 10:09:31

by Alan

Subject: Re: kernel losing time

On Mon, 2002-08-26 at 02:13, Volker Kuhlmann wrote:
> Ok, Linux doesn't work with a VIA 82C586 etc chipset. I plonked a
> 16 bit ISA multi-I/O card with IDE interface into the box, some kind of
> winbond chip. Both integrated IDE interfaces in the BIOS are disabled.
> The problem was worse than ever. How does this happen?

ISA multi I/O without hdparm -u has sometimes done this kind of thing
(since it's slow and PIO). That's separate from the weird jumps we have
seen from the VIA clock.

2002-08-26 19:33:38

by Richard Z

Subject: Re: kernel losing time

On Mon, Aug 26, 2002 at 11:15:36AM +0100, Alan Cox wrote:

> ISA multi I/O without hdparm -u has sometimes done this kind of thing
> (since it's slow and PIO).

Actually it still does that in 2.4.18 at least, even with
hdparm -u. The effect is not dramatic though, less than
a second/day with normal activity. I've only noticed it
while debugging the genrtc driver.

Richard

2002-08-28 01:28:34

by Volker Kuhlmann

Subject: Re: kernel losing time

Nobody seems to have a solution and I figure that these chipsets (VIA
Technologies, Inc. VT82C585VP + VT82C586/A/B) are buggy. However,
kernel 2.2.19 works fine, 2.4 doesn't at all.
And -d1 -u1 gives 1.1MB/s, -d0 -u1 gives 3.5MB/s, which seems funny.

Can someone please say where exactly the problem is, and which part of
the kernel deals with it? The IDE driver? The timer? How would I go
about fixing it again?

These machines are supposed to make good firewalls!

Thanks all,

Volker

--
Volker Kuhlmann is possibly list0570 with the domain in header
http://volker.orcon.net.nz/ Please do not CC list postings to me.

2002-08-28 16:20:35

by George Anzinger

Subject: Re: kernel losing time

This bit of code was in 2.4.19 in
.../arch/i386/kernel/time.c

The suggestion (from the code) is that the PIT does not
reset to the proper value and that reprogramming it fixes
the problem. At the same time, this being in the interrupt
handler, it does generate at least one interrupt at or after
it fails to do the right thing.

Notes: 1.) This fix, each time it reprograms the PIT, will
           lose (leak) a bit of time.
       2.) The three I/O instructions to read the latch are
           slow, AND this is done each interrupt.
       3.) This version does not have a way to eliminate
           the code on machines that don't have the problem.
       4.) I reserve judgment on the comment that the spin
           lock is not needed. It, I think, assumes that the PIT is
           only accessed from the timer code, but this is not really
           true (it ought to be true but is not :()


#if 0 /*
       * SUBTLE: this is not necessary from here because it's implicit in the
       * write xtime_lock.
       */
    spin_lock(&i8253_lock);
#endif
    outb_p(0x00, 0x43);     /* latch the count ASAP */

    count = inb_p(0x40);    /* read the latched count */
    count |= inb(0x40) << 8;

    /* VIA686a test code... reset the latch if count > max */
    if (count > LATCH-1) {
        static int last_whine;
        outb_p(0x34, 0x43);
        outb_p(LATCH & 0xff, 0x40);
        outb(LATCH >> 8, 0x40);
        count = LATCH - 1;
        if(time_after(jiffies, last_whine))
        {
            printk(KERN_WARNING "probable hardware bug: clock timer configuration lost - probably a VIA686a.\n");
            printk(KERN_WARNING "probable hardware bug: restoring chip configuration.\n");
            last_whine = jiffies + HZ;
        }
    }

#if 0
    spin_unlock(&i8253_lock);
#endif
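
The decision made by the fragment above can be sketched in user space without touching any I/O ports. This is only an illustration: PIT_TICK_RATE, TICK_HZ and PIT_LATCH are stand-ins for the kernel's CLOCK_TICK_RATE, HZ and LATCH with the values I believe 2.4 uses on i386 (treat them as assumptions), and check_pit_count is a hypothetical name.

```c
/* Stand-ins for the kernel constants; the values are assumptions. */
#define PIT_TICK_RATE 1193180u   /* PIT input clock in Hz */
#define TICK_HZ       100u       /* timer interrupts per second */
#define PIT_LATCH     ((PIT_TICK_RATE + TICK_HZ / 2) / TICK_HZ)

/*
 * Hypothetical stand-in for the check in the timer code. A latched count
 * above PIT_LATCH - 1 cannot occur if the PIT is still programmed with
 * PIT_LATCH as its reload value, so it is taken as "configuration lost".
 * Returns the count to use and sets *reprogram to 1 when the real code
 * would rewrite the PIT mode and reload value.
 */
unsigned int check_pit_count(unsigned int count, int *reprogram)
{
    *reprogram = 0;
    if (count > PIT_LATCH - 1) {
        *reprogram = 1;          /* kernel writes 0x34 and LATCH here */
        count = PIT_LATCH - 1;   /* clamp to the largest valid value */
    }
    return count;
}
```

With these assumed values PIT_LATCH is 11932, so a latched reading such as 65535 would trigger the reprogramming path, while anything up to 11931 passes through unchanged.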

--
George Anzinger
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml