2002-11-02 06:30:36

by Jim Paris

[permalink] [raw]
Subject: time() glitch on 2.4.18 at 177 days uptime?

I'm running Linux 2.4.18 on an Athlon (i386).

My uptime is 182 days. About five days ago, I started noticing
strange date effects. It may be hardware related, but I'm also
suspecting that it may be due to the long uptime, and so I'm hesitant
to reboot, in case there is a bug here that needs to get fixed.

The problem is with time(). Every second, for approximately 1.1ms,
time() reports a value that is about 2^32 microseconds (4295 seconds,
or about an hour and a quarter) in the future. The glitches always
occur between a change of seconds. Look at this:

$ while true; do paste <( cat /proc/uptime ) <( date ) ; done | grep -A 1 -B 1 ' 02:' | head -11
15765463.31 15455589.84 Sat Nov 2 00:59:16 EST 2002
15765463.32 15455589.84 Sat Nov 2 02:10:51 EST 2002
15765463.33 15455589.84 Sat Nov 2 00:59:17 EST 2002
--
15765465.31 15455589.84 Sat Nov 2 00:59:18 EST 2002
15765465.32 15455589.84 Sat Nov 2 02:10:53 EST 2002
15765465.34 15455589.84 Sat Nov 2 00:59:19 EST 2002
--
15765466.30 15455589.84 Sat Nov 2 00:59:19 EST 2002
15765466.32 15455589.84 Sat Nov 2 02:10:54 EST 2002
15765466.33 15455589.84 Sat Nov 2 00:59:20 EST 2002

The first two values on each line are from /proc/uptime, and the rest
is of course from "date". This bash script runs a bit slow, which is
why it missed the glitch between :17 and :18, but by watching time()s
more frequently, I can definitely see that they're there. I don't
know for certain whether this happens at all times during the day, but
I believe it does. I can do some more logging to figure that out if
necessary.

I've been running ntpdate (and therefore calling do_adjtime()) every
half hour since the system booted, in case that might affect
something.

This glitch does _not_ occur on a 2.2.? system on some x86 processor
with about 256 days uptime (I unfortunately don't know any more
details than that; I only have very limited access to that machine).
I also don't have access to any other 2.4 systems with an uptime of
greater than about 80 days.

This is not a glibc bug; the erroneous values are being returned by
the kernel. Perhaps someone can help me track this down. I've put my
kernel config and relevant info from /proc at

http://neurosis.mit.edu/~jim/time-glitch/

(and would appreciate a CC of any replies)

-jim


2002-11-03 19:25:44

by Jim Paris

[permalink] [raw]
Subject: Re: time() glitch on 2.4.18: solved

> The problem is with time(). Every second, for approximately 1.1ms,
> time() reports a value that is about 2^32 microseconds (4295 seconds,
> or about an hour and a quarter) in the future. The glitches always
> occur between a change of seconds.

I finally found the problem.

My i8253 was out of phase. The 16-bit timer value is read in two
8-bit reads from the 8253 in arch/i386/kernel/time.c, and this value
should be between 0 and LATCH-1. My kernel was getting the MSB and
LSB reversed, and so the read values were usually too high, and
delay_at_last_interrupt ended up negative. This caused some small
random negative amount to be subtracted from usecs during
do_gettimeofday, and so my clock was always making small jumps
backwards, and occasionally jumping forward 2^32 when usecs was small.

To fix it, I just read a single byte from port 0x40. If I do it
again, the problem returns (and I've tested that this is the case on
multiple systems, so it's not just a problem with my 8253).

After 180 days of uptime, it's not surprising that there would have
been one read of the port that failed, triggering the problem, so I
think the kernel should detect and fix this. We could just check for
it: if the returned count > LATCH, read an extra byte from port 0x40,
as I did. Or, use the method in do_slow_gettimeoffset, which
basically resets the 8253's counter if count > LATCH.

Any comments?

-jim

2002-11-03 20:02:31

by Alan

[permalink] [raw]
Subject: Re: time() glitch on 2.4.18: solved

On Sun, 2002-11-03 at 19:32, Jim Paris wrote:
> After 180 days of uptime, it's not surprising that there would have
> been one read of the port that failed, triggering the problem, so I
> think the kernel should detect and fix this. We could just check for
> it: if the returned count > LATCH, read an extra byte from port 0x40,
> as I did. Or, use the method in do_slow_gettimeoffset, which
> basically resets the 8253's counter if count > LATCH.

We have locking that ought to get that all correct nowdays but I've seen
at least one bios generate half a read in SMM mode

> Any comments?

Have a play with it, if your idea works when you deliberately disturb it
then send in a patch

2002-11-05 16:23:46

by Jim Paris

[permalink] [raw]
Subject: [PATCH] Re: time() glitch on 2.4.18: solved

> > Any comments?
>
> Have a play with it, if your idea works when you deliberately disturb it
> then send in a patch

This works well.

-jim

diff -urN linux-2.4.18/arch/i386/kernel/time.c linux-2.4.18-jim/arch/i386/kernel/time.c
--- linux-2.4.18/arch/i386/kernel/time.c Fri Mar 15 18:28:53 2002
+++ linux-2.4.18-jim/arch/i386/kernel/time.c Tue Nov 5 11:22:02 2002
@@ -501,6 +501,16 @@

count = inb_p(0x40); /* read the latched count */
count |= inb(0x40) << 8;
+
+ /* Any unpaired read will cause the above to swap MSB/LSB
+ forever. Try to detect this and reset the counter. */
+ if (count > LATCH) {
+ outb_p(0x34, 0x43);
+ outb_p(LATCH & 0xff, 0x40);
+ outb(LATCH >> 8, 0x40);
+ count = LATCH - 1;
+ }
+
spin_unlock(&i8253_lock);

count = ((LATCH-1) - count) * TICK_SIZE;

2002-11-05 16:48:18

by George Anzinger

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

Jim Paris wrote:
>
> > > Any comments?
> >
> > Have a play with it, if your idea works when you deliberately disturb it
> > then send in a patch
>
> This works well.

But it does introduce a hiccup in time. One could just do
an odd read IF that is all that is wrong. Your code could
also be correcting for other ills...

-g
>
> -jim
>
> diff -urN linux-2.4.18/arch/i386/kernel/time.c linux-2.4.18-jim/arch/i386/kernel/time.c
> --- linux-2.4.18/arch/i386/kernel/time.c Fri Mar 15 18:28:53 2002
> +++ linux-2.4.18-jim/arch/i386/kernel/time.c Tue Nov 5 11:22:02 2002
> @@ -501,6 +501,16 @@
>
> count = inb_p(0x40); /* read the latched count */
> count |= inb(0x40) << 8;
> +
> + /* Any unpaired read will cause the above to swap MSB/LSB
> + forever. Try to detect this and reset the counter. */
> + if (count > LATCH) {
> + outb_p(0x34, 0x43);
> + outb_p(LATCH & 0xff, 0x40);
> + outb(LATCH >> 8, 0x40);
> + count = LATCH - 1;
> + }
> +
> spin_unlock(&i8253_lock);
>
> count = ((LATCH-1) - count) * TICK_SIZE;
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-11-05 17:04:13

by Willy Tarreau

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, Nov 05, 2002 at 11:30:20AM -0500, Jim Paris wrote:
> + if (count > LATCH) {

may be (count >= LATCH) would be even better ?

Willy

2002-11-05 17:04:16

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, 2002-11-05 at 16:30, Jim Paris wrote:
> This works well.
>
> -jim
>
> diff -urN linux-2.4.18/arch/i386/kernel/time.c linux-2.4.18-jim/arch/i386/kernel/time.c
> --- linux-2.4.18/arch/i386/kernel/time.c Fri Mar 15 18:28:53 2002
> +++ linux-2.4.18-jim/arch/i386/kernel/time.c Tue Nov 5 11:22:02 2002
> @@ -501,6 +501,16 @@
>
> count = inb_p(0x40); /* read the latched count */
> count |= inb(0x40) << 8;
> +
> + /* Any unpaired read will cause the above to swap MSB/LSB
> + forever. Try to detect this and reset the counter. */
> + if (count > LATCH) {
> + outb_p(0x34, 0x43);
> + outb_p(LATCH & 0xff, 0x40);
> + outb(LATCH >> 8, 0x40);
> + count = LATCH - 1;
> + }
> +
> spin_unlock(&i8253_lock);
>
> count = ((LATCH-1) - count) * TICK_SIZE;

Add a printk to address the fact we want to know about this and I think
thats enough to meet George's objection ?

2002-11-05 17:09:33

by Willy Tarreau

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

BTW, why not trying to resync, with something like :

if (count >= LATCH)
count = (count >> 8) | inb(0x40) << 8;

Cheers,
Willy

> count = inb_p(0x40); /* read the latched count */
> count |= inb(0x40) << 8;
> +
> + /* Any unpaired read will cause the above to swap MSB/LSB
> + forever. Try to detect this and reset the counter. */
> + if (count > LATCH) {
> + outb_p(0x34, 0x43);
> + outb_p(LATCH & 0xff, 0x40);
> + outb(LATCH >> 8, 0x40);
> + count = LATCH - 1;
> + }
> +
> spin_unlock(&i8253_lock);
>
> count = ((LATCH-1) - count) * TICK_SIZE;
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2002-11-05 17:36:13

by Jim Paris

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

> BTW, why not trying to resync, with something like :
>
> if (count >= LATCH)
> count = (count >> 8) | inb(0x40) << 8;

That's best if we really are out of sync, but it's hard to tell. It
could be that the 8253's latch value got clobbered somehow, in which
case we definitely want to fix that or our timer interrupts will come
out at the wrong rate. We also still need to double-check that
count < LATCH after your code.

Unless we assume that an unpaired read is more common than having the
latch value clobbered in some other way, then I think just resetting
the counter should be okay. Since none of this _should_ ever happen
(but did on my system, grr), it's probably not worth making it too
complicated just to try to figure out what went wrong.

Updated patch with the printk and corrected conditional is below.

-jim

diff -urN linux-2.4.18/arch/i386/kernel/time.c linux-2.4.18-jim/arch/i386/kernel/time.c
--- linux-2.4.18/arch/i386/kernel/time.c Fri Mar 15 18:28:53 2002
+++ linux-2.4.18-jim/arch/i386/kernel/time.c Tue Nov 5 12:39:38 2002
@@ -501,6 +501,18 @@

count = inb_p(0x40); /* read the latched count */
count |= inb(0x40) << 8;
+
+ /* Any unpaired read will cause the above to swap MSB/LSB
+ forever. Try to detect this and reset the counter. */
+ if (count >= LATCH) {
+ printk(KERN_WARNING
+ "i8253 count too high! resetting..\n");
+ outb_p(0x34, 0x43);
+ outb_p(LATCH & 0xff, 0x40);
+ outb(LATCH >> 8, 0x40);
+ count = LATCH - 1;
+ }
+
spin_unlock(&i8253_lock);

count = ((LATCH-1) - count) * TICK_SIZE;

2002-11-05 17:47:51

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, 2002-11-05 at 17:10, Willy Tarreau wrote:
> On Tue, Nov 05, 2002 at 11:30:20AM -0500, Jim Paris wrote:
> > + if (count > LATCH) {
>
> may be (count >= LATCH) would be even better ?

Some PIT clones seem to hold the LATCH value momentarily judging by
other things that were triggered wrongly by >=

2002-11-05 18:10:03

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, 2002-11-05 at 18:02, Jim Paris wrote:
> > > > + if (count > LATCH) {
> > >
> > > may be (count >= LATCH) would be even better ?
> >
> > Some PIT clones seem to hold the LATCH value momentarily judging by
> > other things that were triggered wrongly by >=
>
> If so, then that's a separate problem: the later code
>
> count = ((LATCH-1) - count) * TICK_SIZE;
> delay_at_last_interrupt = (count + LATCH/2) / LATCH;
>

It might be interesting to catch that case with a printk too and put
both in 2.5 and see what comes out in the wash yes

2002-11-05 17:56:18

by Jim Paris

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

> > > + if (count > LATCH) {
> >
> > may be (count >= LATCH) would be even better ?
>
> Some PIT clones seem to hold the LATCH value momentarily judging by
> other things that were triggered wrongly by >=

If so, then that's a separate problem: the later code

count = ((LATCH-1) - count) * TICK_SIZE;
delay_at_last_interrupt = (count + LATCH/2) / LATCH;

will cause delay_at_last_interrupt to be negative, so you'll see
backwards jumps in time and occasional wraparound of usecs as I did.
Perhaps a

if (count == LATCH)
count--;

after the reset code?

-jim

2002-11-05 18:14:09

by Willy Tarreau

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, Nov 05, 2002 at 06:37:57PM +0000, Alan Cox wrote:
> On Tue, 2002-11-05 at 18:02, Jim Paris wrote:
> > > > > + if (count > LATCH) {
> > > >
> > > > may be (count >= LATCH) would be even better ?
> > >
> > > Some PIT clones seem to hold the LATCH value momentarily judging by
> > > other things that were triggered wrongly by >=
> >
> > If so, then that's a separate problem: the later code
> >
> > count = ((LATCH-1) - count) * TICK_SIZE;
> > delay_at_last_interrupt = (count + LATCH/2) / LATCH;
> >
>
> It might be interesting to catch that case with a printk too and put
> both in 2.5 and see what comes out in the wash yes

could that be the reason a few people have experienced occasionnal jumps
backwards in gettimeofday() a few months ago, which many others could never
reproduce ? Just because of buggy hardware ?

If so, I think the printk patch should be proposed to Marcelo before
2.4.20-final.

Regards,
Willy

2002-11-05 19:37:58

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, 2002-11-05 at 18:57, Richard B. Johnson wrote:
> No! You will break many machines. You cannot use out_p() when
> writing the latch it __must__ be out(). the "_p" puts a write
> to another port between the two writes. That will screw up
> the internal state-machine of most PITs including AMD-SC520.

The delay is required for the PIT. Your AMD-SC520 is simply a bit wacko.
The right way to fix this is actually to switch inb_p to use udelay(8)
and to load a conservative bogomip number at boot time (we inb_p before
we compute udelay values in a few spots)

BTW if your SC520 is broken that way I just broke 2.5.4x support for it
fixing some more standards compliant platforms 8)


2002-11-05 19:38:52

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, 2002-11-05 at 19:29, Richard B. Johnson wrote:
> The only hardware a modern PC needs to use "slow-down_io" on is
> the RTC CMOS device. Since we need to support older boards, you
> don't want to remove the _p options indiscriminately, but you do
> not want them ever between two consecutive writes to the same device-
> port.

I own at least one that needs the _p on the DMA controller and at one
that needs _p on the PIT


2002-11-05 19:02:43

by Willy Tarreau

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, Nov 05, 2002 at 01:57:16PM -0500, Richard B. Johnson wrote:

> No! You will break many machines. You cannot use out_p() when
> writing the latch it __must__ be out(). the "_p" puts a write
> to another port between the two writes. That will screw up
> the internal state-machine of most PITs including AMD-SC520.

You make an interesting point. Have you checked if an inb_p() on this hardware
could unlock the latch when reading the first byte ? It may also be one cause
of time jumps if the counter is just about to wrap when it's being read, and
gets unlocked by an out to port 0x80 (the "_p").

> count = inb_p(0x40); /* read the latched count */
> count |= inb(0x40) << 8;

Or perhaps it will be time to change port 0x80 to something else, that no
chipset will use. I've seen 0xED in a bios somewhere, but I don't remember
where.

Cheers,
Willy

2002-11-05 19:23:14

by Richard B. Johnson

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, 5 Nov 2002, Willy Tarreau wrote:

> On Tue, Nov 05, 2002 at 01:57:16PM -0500, Richard B. Johnson wrote:
>
> > No! You will break many machines. You cannot use out_p() when
> > writing the latch it __must__ be out(). the "_p" puts a write
> > to another port between the two writes. That will screw up
> > the internal state-machine of most PITs including AMD-SC520.
>
> You make an interesting point. Have you checked if an inb_p() on this hardware
> could unlock the latch when reading the first byte ? It may also be one cause
> of time jumps if the counter is just about to wrap when it's being read, and
> gets unlocked by an out to port 0x80 (the "_p").
>
> > count = inb_p(0x40); /* read the latched count */
> > count |= inb(0x40) << 8;
>
> Or perhaps it will be time to change port 0x80 to something else, that no
> chipset will use. I've seen 0xED in a bios somewhere, but I don't remember
> where.
>
> Cheers,
> Willy
>

It may well be that the inb_p() also causes the same problem. You
don't want any port read/write between those required consecutive
reads or writes. You just don't want to use the _p option there --ever.

The _p option is not a panacea that one uses to make sure that the
port read/writes go okay. It's also not a "bus settle time" as IBM
said in their BIOS. It is to give the internal state-machine of
some devices enough time to set up. For instance, the CMOS clock
runs with low-power CMOS. Its internal state-machine is low-power.
It does not run as fast as an external bus can run. Therefore, when
you set the next address to be read or written, with an out to the
port at 0x70, you need to wait for the internal low-power CMOS state
machine to actually latch the internal address. Then you can read
or write from 0x71.

There are only a few devices that require the _p. They are not the
PIT, any UARTS, or the DMA controller. Sometimes you need to play
nurse-maid to the FDC, but modern ones in super-IO chips are fine.
The only hardware a modern PC needs to use "slow-down_io" on is
the RTC CMOS device. Since we need to support older boards, you
don't want to remove the _p options indiscriminately, but you do
not want them ever between two consecutive writes to the same device-
port.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Bush : The Fourth Reich of America


2002-11-05 18:50:41

by Richard B. Johnson

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, 5 Nov 2002, Jim Paris wrote:

> > BTW, why not trying to resync, with something like :
> >
> > if (count >= LATCH)
> > count = (count >> 8) | inb(0x40) << 8;
>
> That's best if we really are out of sync, but it's hard to tell. It
> could be that the 8253's latch value got clobbered somehow, in which
> case we definitely want to fix that or our timer interrupts will come
> out at the wrong rate. We also still need to double-check that
> count < LATCH after your code.
>
> Unless we assume that an unpaired read is more common than having the
> latch value clobbered in some other way, then I think just resetting
> the counter should be okay. Since none of this _should_ ever happen
> (but did on my system, grr), it's probably not worth making it too
> complicated just to try to figure out what went wrong.
>
> Updated patch with the printk and corrected conditional is below.
>
> -jim
>
> diff -urN linux-2.4.18/arch/i386/kernel/time.c linux-2.4.18-jim/arch/i386/kernel/time.c
> --- linux-2.4.18/arch/i386/kernel/time.c Fri Mar 15 18:28:53 2002
> +++ linux-2.4.18-jim/arch/i386/kernel/time.c Tue Nov 5 12:39:38 2002
> @@ -501,6 +501,18 @@
>
> count = inb_p(0x40); /* read the latched count */
> count |= inb(0x40) << 8;
> +
> + /* Any unpaired read will cause the above to swap MSB/LSB
> + forever. Try to detect this and reset the counter. */
> + if (count >= LATCH) {
> + printk(KERN_WARNING
> + "i8253 count too high! resetting..\n");
> + outb_p(0x34, 0x43);
> + outb_p(LATCH & 0xff, 0x40);
> + outb(LATCH >> 8, 0x40);
> + count = LATCH - 1;
> + }
> +
> spin_unlock(&i8253_lock);
>
> count = ((LATCH-1) - count) * TICK_SIZE;
> -

No! You will break many machines. You cannot use out_p() when
writing the latch it __must__ be out(). the "_p" puts a write
to another port between the two writes. That will screw up
the internal state-machine of most PITs including AMD-SC520.

Again, any time you must make two consecutive writes to the
same device port to set it, you must not use the "_p" version.
The above latch setting must be:

outb(LATCH & 0xff, 0x40);
outb(LATCH >> 8, 0x40);

Of course, the second write could have a "_p" option, but
you won't need it.



Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Bush : The Fourth Reich of America


2002-11-05 18:44:01

by Jim Paris

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

> could that be the reason a few people have experienced occasionnal jumps
> backwards in gettimeofday() a few months ago, which many others could never
> reproduce ? Just because of buggy hardware ?

An unpaired read on port 0x40 is almost certainly the cause of e.g.

http://www.uwsg.iu.edu/hypermail/linux/kernel/0206.0/1505.html

and we would expect to see the problem with count==LATCH about 1/11932
of the time, about 0.008% -- almost exactly the value reported in:

http://www.uwsg.iu.edu/hypermail/linux/kernel/0206.3/0043.html

I believe that by adding both the (count > LATCH) check and a second
(count == LATCH) check, we can fix both of these.

-jim

2002-11-05 20:16:52

by Richard B. Johnson

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On 5 Nov 2002, Alan Cox wrote:

> On Tue, 2002-11-05 at 19:29, Richard B. Johnson wrote:
> > The only hardware a modern PC needs to use "slow-down_io" on is
> > the RTC CMOS device. Since we need to support older boards, you
> > don't want to remove the _p options indiscriminately, but you do
> > not want them ever between two consecutive writes to the same device-
> > port.
>
> I own at least one that needs the _p on the DMA controller and at one
> that needs _p on the PIT
>

Hey, look. I can only warn. You do what you want. As far as I'm
concerned support stopped at Linux 2.4.19 when poll got trashed.
Nobody can use 2.4.19 or probably anything later unless they have
powerful CPUs that can spin with 1000 SIGPOLL signals per second.

Like you have said, that's the nature of free software.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Bush : The Fourth Reich of America


2002-11-05 20:08:15

by Willy Tarreau

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, Nov 05, 2002 at 08:07:26PM +0000, Alan Cox wrote:
> On Tue, 2002-11-05 at 19:29, Richard B. Johnson wrote:
> > The only hardware a modern PC needs to use "slow-down_io" on is
> > the RTC CMOS device. Since we need to support older boards, you
> > don't want to remove the _p options indiscriminately, but you do
> > not want them ever between two consecutive writes to the same device-
> > port.
>
> I own at least one that needs the _p on the DMA controller and at one
> that needs _p on the PIT

Well, in fact, Intel's 82C54 datasheet says that this chip needs at least
165 ns between two consecutive operations, either read or write. So with a
8 Mhz bus, you may effectively need to insert fake accesses, although most
modern chipsets certainly have better specs.

But the spec clearly states that you can interleave accesses to other
counters between the first and second bytes without problem, so good
implementations should see no side effect.

Richard, if your PIT doesn't support accesses to port 80, could you try to
use other ports ?

Cheers,
Willy

2002-11-05 20:31:10

by Richard B. Johnson

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, 5 Nov 2002, Willy Tarreau wrote:

> On Tue, Nov 05, 2002 at 08:07:26PM +0000, Alan Cox wrote:
> > On Tue, 2002-11-05 at 19:29, Richard B. Johnson wrote:
> > > The only hardware a modern PC needs to use "slow-down_io" on is
> > > the RTC CMOS device. Since we need to support older boards, you
> > > don't want to remove the _p options indiscriminately, but you do
> > > not want them ever between two consecutive writes to the same device-
> > > port.
> >
> > I own at least one that needs the _p on the DMA controller and at one
> > that needs _p on the PIT
>
> Well, in fact, Intel's 82C54 datasheet says that this chip needs at least
> 165 ns between two consecutive operations, either read or write. So with a
> 8 Mhz bus, you may effectively need to insert fake accesses, although most
> modern chipsets certainly have better specs.
>

Its not a clocked bus. The 8 MHz means nothing. It's an async bus
and if the setup timing of the data/address-decode/chip-select sequence
is not violated, you get ~200ns between accesses so chip-to-chip
select time comes automatically. But, some older motherboards violated
all kinds of specs. They did this deliberately so you could plug
RAM boards into the ISA bus and get reasonable performance. If we still
support them, you need some artifical waits.

> But the spec clearly states that you can interleave accesses to other
> counters between the first and second bytes without problem, so good
> implementations should see no side effect.
>
> Richard, if your PIT doesn't support accesses to port 80, could you try to
> use other ports ?
>
> Cheers,
> Willy
>

It's not the port address. It's the bus state. The chip requires two
consecutive writes when setting that latch. It's just like a 32-bit
write to a 32-bit port (like PCI access), you can't write to the
low-port, do something else, then write to the high port. It needs to be
done with no intervening bus states. In the AMD case, it's probably
because the address is ignored on the second write.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Bush : The Fourth Reich of America


2002-11-05 20:34:14

by Richard B. Johnson

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On 5 Nov 2002, Alan Cox wrote:

> On Tue, 2002-11-05 at 20:23, Richard B. Johnson wrote:
> > Hey, look. I can only warn. You do what you want. As far as I'm
> > concerned support stopped at Linux 2.4.19 when poll got trashed.
> > Nobody can use 2.4.19 or probably anything later unless they have
> > powerful CPUs that can spin with 1000 SIGPOLL signals per second.
>
> My 486SLC33 (running at 8MHz on battery mode) doesn't believe you.
>

Enable SIGPOLL on STDIN_FILENO and connect from the network as previously
shown. You will get 100% CPU time on the task with this enabled.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Bush : The Fourth Reich of America


2002-11-05 20:28:28

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, 2002-11-05 at 20:23, Richard B. Johnson wrote:
> Hey, look. I can only warn. You do what you want. As far as I'm
> concerned support stopped at Linux 2.4.19 when poll got trashed.
> Nobody can use 2.4.19 or probably anything later unless they have
> powerful CPUs that can spin with 1000 SIGPOLL signals per second.

My 486SLC33 (running at 8MHz on battery mode) doesn't believe you.

2002-11-05 23:09:08

by Willy Tarreau

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Tue, Nov 05, 2002 at 03:38:00PM -0500, Richard B. Johnson wrote:
> On Tue, 5 Nov 2002, Willy Tarreau wrote:
>
> > On Tue, Nov 05, 2002 at 08:07:26PM +0000, Alan Cox wrote:
> > > On Tue, 2002-11-05 at 19:29, Richard B. Johnson wrote:
> > > > The only hardware a modern PC needs to use "slow-down_io" on is
> > > > the RTC CMOS device. Since we need to support older boards, you
> > > > don't want to remove the _p options indiscriminately, but you do
> > > > not want them ever between two consecutive writes to the same device-
> > > > port.
> > >
> > > I own at least one that needs the _p on the DMA controller and at one
> > > that needs _p on the PIT
> >
> > Well, in fact, Intel's 82C54 datasheet says that this chip needs at least
> > 165 ns between two consecutive operations, either read or write. So with a
> > 8 Mhz bus, you may effectively need to insert fake accesses, although most
> > modern chipsets certainly have better specs.
> >
>
> Its not a clocked bus. The 8 MHz means nothing. It's an async bus
> and if the setup timing of the data/address-decode/chip-select sequence
> is not violated, you get ~200ns between accesses so chip-to-chip
> select time comes automatically. But, some older motherboards violated
> all kinds of specs. They did this deliberately so you could plug
> RAM boards into the ISA bus and get reasonable performance. If we still
> support them, you need some artifical waits.

Well, some older (386) motherboards had the 8254 directly connected to the ISA
bus. And since Linux is 386-compatible, I think we should still support those
boards.

> It's not the port address. It's the bus state. The chip requires two
> consecutive writes when setting that latch. It's just like a 32-bit
> write to a 32-bit port (like PCI access), you can't write to the
> low-port, do something else, then write to the high port.

Yes you can ! the chip uses a CS (chip select) from an address decoder,
and sees the reads/writes only when both CS and either RD or WR are low.
I understand that all this circuitry may be simplified a bit in recent
chipsets (one unified decoder, fewer glue logic between chips), but there's
no reason that this should not be respected if the chip makers claim a bit
of compatibility.

> It needs to be done with no intervening bus states. In the AMD case, it's
> probably because the address is ignored on the second write.

Then it's buggy. This chip was originaly designed to work for 8086/80186,
which barely means work when directly connected to an ISA bus. If AMD make
their chip rely on PCI-like back-to-back behaviour, I don't agree for the
compatibility.

Now, since there's at least one buggy implementation, the solution is udelay().
But do you think that your chip may be confused by a memory access, when
udelay() fetches the count variable from memory, for example ?

The last other solution may be to simply use jmp $+2, which will be slow
enough on old sensible hardware, and imperceptible on more recent hardware
which doesn't require this.

Since Alan and you have one piece of hardware which show opposite behaviour,
we should get an ever working solution quickly :-)

Cheers,
Willy

2002-11-06 12:39:06

by Richard B. Johnson

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Wed, 6 Nov 2002, Willy Tarreau wrote:

> On Tue, Nov 05, 2002 at 03:38:00PM -0500, Richard B. Johnson wrote:
> > On Tue, 5 Nov 2002, Willy Tarreau wrote:
> >
> > > On Tue, Nov 05, 2002 at 08:07:26PM +0000, Alan Cox wrote:
> > > > On Tue, 2002-11-05 at 19:29, Richard B. Johnson wrote:
> > > > > The only hardware a modern PC needs to use "slow-down_io" on is
> > > > > the RTC CMOS device. Since we need to support older boards, you
> > > > > don't want to remove the _p options indiscriminately, but you do
> > > > > not want them ever between two consecutive writes to the same device-
> > > > > port.
> > > >
> > > > I own at least one that needs the _p on the DMA controller and at one
> > > > that needs _p on the PIT
> > >
> > > Well, in fact, Intel's 82C54 datasheet says that this chip needs at least
> > > 165 ns between two consecutive operations, either read or write. So with a
> > > 8 Mhz bus, you may effectively need to insert fake accesses, although most
> > > modern chipsets certainly have better specs.
> > >
> >
> > Its not a clocked bus. The 8 MHz means nothing. It's an async bus
> > and if the setup timing of the data/address-decode/chip-select sequence
> > is not violated, you get ~200ns between accesses so chip-to-chip
> > select time comes automatically. But, some older motherboards violated
> > all kinds of specs. They did this deliberately so you could plug
> > RAM boards into the ISA bus and get reasonable performance. If we still
> > support them, you need some artifical waits.
>
> Well, some older (386) motherboards had the 8254 directly connected to the ISA
> bus. And since Linux is 386-compatible, I think we should still support those
> boards.
>
> > It's not the port address. It's the bus state. The chip requires two
> > consecutive writes when setting that latch. It's just like a 32-bit
> > write to a 32-bit port (like PCI access), you can't write to the
> > low-port, do something else, then write to the high port.
>
> Yes you can ! the chip uses a CS (chip select) from an address decoder,
> and sees the reads/writes only when both CS and either RD or WR are low.
> I understand that all this circuitry may be simplified a bit in recent
> chipsets (one unified decoder, fewer glue logic between chips), but there's
> no reason that this should not be respected if the chip makers claim a bit
> of compatibility.
>
> > It needs to be done with no intervening bus states. In the AMD case, it's
> > probably because the address is ignored on the second write.
>
> Then it's buggy. This chip was originaly designed to work for 8086/80186,
> which barely means work when directly connected to an ISA bus. If AMD make
> their chip rely on PCI-like back-to-back behaviour, I don't agree for the
> compatibility.
>
> Now, since there's at least one buggy implementation, the solution is udelay().
> But do you think that your chip may be confused by a memory access, when
> udelay() fetches the count variable from memory, for example ?
>
> The last other solution may be to simply use jmp $+2, which will be slow
> enough on old sensible hardware, and imperceptible on more recent hardware
> which doesn't require this.
>
> Since Alan and you have one piece of hardware which show opposite behaviour,
> we should get an ever working solution quickly :-)
>

udelay() would work memory/read/write is entirely different from I/O
port read/write even though PWB traces are shared. But that might
result in wasted CPU cycles.

This works in all cases in machines I have tested. I can't test everything
but, depending upon whether or not the forces a cache-line refill, the
delay can be from 200 ns to over 800 ns. I have a single instance of this,
and call it, rather than doing in-line. This adds a further delay.

#
.section .text
.global cflush
.type cflush,@function

cflush: pushl %cs # Put code segment on the stack
pushl $goto # Put offset on the stack
lret # Do a 'long' return (reloads cs)
goto: ret
.end




Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Bush : The Fourth Reich of America


2002-11-06 12:56:29

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On Wed, 2002-11-06 at 12:47, Richard B. Johnson wrote:
> udelay() would work memory/read/write is entirely different from I/O
> port read/write even though PWB traces are shared. But that might
> result in wasted CPU cycles.

udelay actually makes a lot lot more sense than the current _p stuff.
Legacy-free hardware might not do what is expected with port 0x80
eventually and still have stuff using _p
>
> This works in all cases in machines I have tested. I can't test everything
> but, depending upon whether or not the forces a cache-line refill, the
> delay can be from 200 ns to over 800 ns. I have a single instance of this,
> and call it, rather than doing in-line. This adds a further delay.

Thats a call/return stack break. That does delay damage in excess of
what you actually want and the wrong places. udelay does the right thing

2002-11-06 15:02:51

by Christer Weinigel

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

"Richard B. Johnson" <[email protected]> writes:

> > On Tue, 2002-11-05 at 20:23, Richard B. Johnson wrote:
> > > Hey, look. I can only warn. You do what you want. As far as I'm
> > > concerned support stopped at Linux 2.4.19 when poll got trashed.
> > > Nobody can use 2.4.19 or probably anything later unless they have
> > > powerful CPUs that can spin with 1000 SIGPOLL signals per second.
>
> Enable SIGPOLL on STDIN_FILENO and connect from the network as previously
> shown. You will get 100% CPU time on the task with this enabled.

Can you please stop harping about this? You have written essentially
the same mail over and over again without adding any new information.

First of all, your problem has nothing to do with networking, the same
thing will happen if you run your program from within screen, so it
seems to have more to do with ptys and how both screen and telnetd
sets them up.

Second, it seems that your program does exactly what is expected. Try
applying this patch to your example program and see what happens:

diff -u ./polltest.c.orig ./polltest.c
--- ./polltest.c.orig Wed Nov 6 15:36:09 2002
+++ ./polltest.c Wed Nov 6 15:35:54 2002
@@ -47,7 +47,7 @@
{
struct pollfd pf;
pf.fd = STDIN_FILENO;
- pf.events = POLLIN;
+ pf.events = POLLIN | POLLOUT | POLLERR;
pf.revents = 0;
(void)poll(&pf, 1, 0);
if(pf.revents & POLLIN)

Each time you print the "POLLOUT = 3 POLLERR = 0" message, you fill up
the write buffer, and each time the write queue empties you will get
another SIGIO. Now, this may not be what you want, but it seems to be
entirely within specs.

If you remove the fprintf(stderr, "POLLOUT... from your signal handler
you will not see the same effect as you are complaining about. Your
measurements are interfering with what you are trying to measure.

/Christer

--
"Just how much can I get away with and still go to heaven?"

Freelance consultant specializing in device driver programming for Linux
Christer Weinigel <[email protected]> http://www.weinigel.se

2002-11-06 15:42:42

by Richard B. Johnson

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

On 6 Nov 2002, Christer Weinigel wrote:

> "Richard B. Johnson" <[email protected]> writes:
>
> > > On Tue, 2002-11-05 at 20:23, Richard B. Johnson wrote:
> > > > Hey, look. I can only warn. You do what you want. As far as I'm
> > > > concerned support stopped at Linux 2.4.19 when poll got trashed.
> > > > Nobody can use 2.4.19 or probably anything later unless they have
> > > > powerful CPUs that can spin with 1000 SIGPOLL signals per second.
> >
> > Enable SIGPOLL on STDIN_FILENO and connect from the network as previously
> > shown. You will get 100% CPU time on the task with this enabled.
>
> Can you please stop harping about this? You have written essentially
> the same mail over and over again without adding any new information.
>
> First of all, your problem has nothing to do with networking, the same
> thing will happen if you run your program from within screen, so it
> seems to have more to do with ptys and how both screen and telnetd
> sets them up.
>
> Second, it seems that your program does exactly what is expected. Try
> applying this patch to your example program and see what happens:
>
> diff -u ./polltest.c.orig ./polltest.c
> --- ./polltest.c.orig Wed Nov 6 15:36:09 2002
> +++ ./polltest.c Wed Nov 6 15:35:54 2002
> @@ -47,7 +47,7 @@
> {
> struct pollfd pf;
> pf.fd = STDIN_FILENO;
> - pf.events = POLLIN;
> + pf.events = POLLIN | POLLOUT | POLLERR;
> pf.revents = 0;
> (void)poll(&pf, 1, 0);
> if(pf.revents & POLLIN)
>
> Each time you print the "POLLOUT = 3 POLLERR = 0" message, you fill up
> the write buffer, and each time the write queue empties you will get
> another SIGIO. Now, this may not be what you want, but it seems to be
> entirely within specs.
>
> If you remove the fprintf(stderr, "POLLOUT... from your signal handler
> you will not see the same effect as you are complaining about. Your
> measurements are interfering with what you are trying to measure.
>

No they are not. You can no longer `strace` because of the error reported.
`strace` will continually write the signal information to the screen
(forever) because every time it writes, another signal is generated.

All you did was get rid of the method used to show the error. User mode
code does not care, and should not care about kernel internals. Whether
or not some "write buffer" emptied is (and must not be) any concern to
the user. The user needs to know if a read or write to or from a device
will block. There are some other things a user might want to know such
as "out-of-band data", "hangups", and other protocol-specific things that
are signaled in the error bit.

A signal must be raised when conditions change. Currently, a signal
is raised for every byte transmitted. This is a gross error and cannot
be interpreted as "entirely within specs".

If a task has attached the SIGIO signal, but the handler does nothing,
the handler still gets executed. It gets executed on every byte sent.
This makes output "cost" 100 percent of the CPU as seen on `top`. This
is simply wrong. There is no possible explaination that will make it
right.

Everybody, so far, who has answered this thread assumes that I don't
know what I am doing. In avery case, they modify the code to remove
the forcing function that I have used to make it so they can duplicate
the problem on their system. This is not productive and will not
fix the problem. The problem is real and needs to be fixed.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Bush : The Fourth Reich of America


2002-11-06 23:15:19

by Jim Paris

[permalink] [raw]
Subject: Re: [PATCH] Re: time() glitch on 2.4.18: solved

The recommendations about outb versus outb_p may be valid, but I think
this patch should go in anyway in its current form. I'm resetting the
i8253 in the same way it was initially set in i8259.c, and so Dick, if
you want to change how that's done, it needs to get changed in both
places; that's outside the scope of what I'm trying to fix right now.

To summarize, this patch will fix:

1) An unpaired read to the i8253 causes time to jump around forever
2) A buggy i8253 causes time to jump around 0.008% of the time

#1 is reported whenever it occurs. #2 is only reported the first few
times because it's likely to occur every two minutes or so on affected
chips.

Patch is against 2.4.18. It applies cleanly (but is untested) on 2.4.19
and 2.4.20-rc1.

-jim

diff -urN linux-2.4.18/arch/i386/kernel/time.c linux-2.4.18-jim/arch/i386/kernel/time.c
--- linux-2.4.18/arch/i386/kernel/time.c Fri Mar 15 18:28:53 2002
+++ linux-2.4.18-jim/arch/i386/kernel/time.c Wed Nov 6 17:18:28 2002
@@ -501,7 +501,30 @@

count = inb_p(0x40); /* read the latched count */
count |= inb(0x40) << 8;
+
+ /* Any unpaired read will cause the above to swap MSB/LSB
+ forever. Try to detect this and reset the counter. */
+ if (count > LATCH) {
+ printk(KERN_WARNING
+ "i8253 count too high! resetting..\n");
+ outb_p(0x34, 0x43);
+ outb_p(LATCH & 0xff, 0x40);
+ outb(LATCH >> 8, 0x40);
+ count = LATCH - 1;
+ }
+
spin_unlock(&i8253_lock);
+
+ /* Buggy i8253s might hold the LATCH value. */
+ if (count == LATCH) {
+ static int reported = 0;
+ if(reported<3) {
+ printk(KERN_WARNING
+ "i8253 count too high! adjusting..\n");
+ reported++;
+ }
+ count--;
+ }

count = ((LATCH-1) - count) * TICK_SIZE;
delay_at_last_interrupt = (count + LATCH/2) / LATCH;