2002-06-24 08:54:34

by Salvatore D'Angelo

[permalink] [raw]
Subject: gettimeofday problem

Hi,

I am writing a small piece of code that use the gettimeofday routine and
I have noticed a very strange behaviour. If I call the routine two times
in sequence I expect that the second value is greater than or equal to
the first one, but it is not true.

please check the following code.

Sometimes happen that the string "Strange Behaviour" is printed with
kernel 2.4.18.

I tried to find in the Linux Archive patches to solve this problem, but
I didn't find anything (there are emails that talk about gettimeofday,
but probably they do not answer to my questions).

The same thing happen in Java using the System.currentTimeMillis() routine.

#include <fstream.h>
#include <sys/time.h>

// this routine calculate the current time returning its value in long
long format.
long long currentTimeMillis() {
long long t;
struct timeval tv;

gettimeofday(&tv, 0);

t = tv.tv_sec;
t = (t *1000) + (tv.tv_usec/1000);

return t;
}

void main() {
for (;;) {
long long a = currentTimeMillis();
long long b = currentTimeMillis();

if (a>b) {
cout << "Strange Behaviour" << endl;
}
}
}




2002-06-24 10:47:26

by Salvatore D'Angelo

[permalink] [raw]
Subject: Re: gettimeofday problem

In this piece of code I convert seconds and microseconds in
milliseconds. I think the problem is not in my code, in fact I wrote the
following piece of code in Java, and it does not work too. In the for
loop the 90% of times b > a while for 10% of times not.

class Prova {
public static void main(....) {
for (;;) {
long a = System.currentTimeMillis();
long b = System.currentTimeMillis();

if (a > b) {
System.out.println("Wrong!!!!!!!!!!!!!");
}
}
}
}


Chris McDonald wrote:

>In linux.kernel you write:
>
>
>
>>Hi,
>>
>>
>
>
>
>>I am writing a small piece of code that use the gettimeofday routine and
>>I have noticed a very strange behaviour. If I call the routine two times
>>in sequence I expect that the second value is greater than or equal to
>>the first one, but it is not true.
>>
>>
>
>Perhaps because tv_usecs are MICROsecs, not MILLIsecs ?
>
>_______________________________________________________________________________
> Dr Chris McDonald EMAIL: [email protected]
> Department of Computer Science & Software Engineering
>The University of Western Australia WWW: http://www.csse.uwa.edu.au/~chris
> Crawley, Western Australia, 6009 PH: +61 8 9380 2533, FAX: +61 8 9380 1089
>
>
>
>
>>please check the following code.
>>
>>
>
>
>
>>Sometimes happen that the string "Strange Behaviour" is printed with
>>kernel 2.4.18.
>>
>>
>
>
>
>>I tried to find in the Linux Archive patches to solve this problem, but
>>I didn't find anything (there are emails that talk about gettimeofday,
>>but probably they do not answer to my questions).
>>
>>
>
>
>
>>The same thing happen in Java using the System.currentTimeMillis() routine.
>>
>>
>
>
>
>>#include <fstream.h>
>>#include <sys/time.h>
>>
>>
>
>
>
>>// this routine calculate the current time returning its value in long
>>long format.
>>long long currentTimeMillis() {
>> long long t;
>> struct timeval tv;
>>
>>
>
>
>
>> gettimeofday(&tv, 0);
>>
>>
>
>
>
>> t = tv.tv_sec;
>> t = (t *1000) + (tv.tv_usec/1000);
>>
>>
>
>
>
>> return t;
>>}
>>
>>
>
>
>
>>void main() {
>> for (;;) {
>> long long a = currentTimeMillis();
>> long long b = currentTimeMillis();
>>
>>
>
>
>
>> if (a>b) {
>> cout << "Strange Behaviour" << endl;
>> }
>> }
>>}
>>
>>
>
>
>


2002-06-24 12:46:22

by Matti Aarnio

[permalink] [raw]
Subject: Re: gettimeofday problem

On Mon, Jun 24, 2002 at 12:20:02PM +0200, Salvatore D'Angelo wrote:
> In this piece of code I convert seconds and microseconds in
> milliseconds. I think the problem is not in my code, in fact I wrote the
> following piece of code in Java, and it does not work too. In the for
> loop the 90% of times b > a while for 10% of times not.
>
...
> long a = System.currentTimeMillis();
> long b = System.currentTimeMillis();
> if (a > b) {
> System.out.println("Wrong!!!!!!!!!!!!!");
> }


So in 10% of the cases, two successive calls yield time
rolling BACK ?

I used gettimeofday() call, and compared the original data
from the code.

At a modern uniprocessor machine I never get anything except
monotonously increasing time (TSC is used in betwen timer ticks
to supply time increase.) At a dual processor machine, on
occasion I do get SAME value twice. I have never seen time
rolling backwards.

Uh.. correction: 216199245 0:-1 -- it did step backwards,
but only once within about 216 million gettimeofday() calls.
(I am running 2.4.19-pre8smp at the test box.)

/Matti Aarnio

2002-06-24 13:57:00

by Salvatore D'Angelo

[permalink] [raw]
Subject: Re: gettimeofday problem

On 2000000 call -> 189 times I found the problem (0.00945%)
On 20000000 call ->1956 found I found the problem (0.00978%)

Probably you're right my previous percentage is too high (the one above
should be the correct one).

But do you think that this behaviour is normal?

Matti Aarnio wrote:

>On Mon, Jun 24, 2002 at 12:20:02PM +0200, Salvatore D'Angelo wrote:
>
>
>>In this piece of code I convert seconds and microseconds in
>>milliseconds. I think the problem is not in my code, in fact I wrote the
>>following piece of code in Java, and it does not work too. In the for
>>loop the 90% of times b > a while for 10% of times not.
>>
>>
>>
>...
>
>
>> long a = System.currentTimeMillis();
>> long b = System.currentTimeMillis();
>> if (a > b) {
>> System.out.println("Wrong!!!!!!!!!!!!!");
>> }
>>
>>
>
>
> So in 10% of the cases, two successive calls yield time
> rolling BACK ?
>
> I used gettimeofday() call, and compared the original data
> from the code.
>
> At a modern uniprocessor machine I never get anything except
> monotonously increasing time (TSC is used in betwen timer ticks
> to supply time increase.) At a dual processor machine, on
> occasion I do get SAME value twice. I have never seen time
> rolling backwards.
>
> Uh.. correction: 216199245 0:-1 -- it did step backwards,
> but only once within about 216 million gettimeofday() calls.
> (I am running 2.4.19-pre8smp at the test box.)
>
>/Matti Aarnio
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>
>
>


2002-06-24 18:56:32

by Karim Yaghmour

[permalink] [raw]
Subject: Re: gettimeofday problem


Salvatore D'Angelo wrote:
> On 2000000 call -> 189 times I found the problem (0.00945%)
> On 20000000 call ->1956 found I found the problem (0.00978%)
...
> But do you think that this behaviour is normal?

This has already been discussed on the LKML. Here's the thread:
http://marc.theaimsgroup.com/?t=102348161100006&r=1&w=2

I posted the following message on this issue:
http://marc.theaimsgroup.com/?l=linux-kernel&m=102348249521519&w=2

As I had said earlier, I've seen this happen before on both i386 and
PPC machines.

Cheers,

Karim

===================================================
Karim Yaghmour
[email protected]
Embedded and Real-Time Linux Expert
===================================================

2002-06-24 19:41:07

by Richard B. Johnson

[permalink] [raw]
Subject: Re: gettimeofday problem

On Mon, 24 Jun 2002, Salvatore D'Angelo wrote:

> In this piece of code I convert seconds and microseconds in
> milliseconds. I think the problem is not in my code, in fact I wrote the
> following piece of code in Java, and it does not work too. In the for
> loop the 90% of times b > a while for 10% of times not.
>
> class Prova {
> public static void main(....) {
> for (;;) {
> long a = System.currentTimeMillis();
> long b = System.currentTimeMillis();
>
> if (a > b) {
> System.out.println("Wrong!!!!!!!!!!!!!");
> }
> }
> }
> }
>
>

This has been running since I first read your mail about 10:00 this
morning. The kernel is 2.4.18

#include <stdio.h>
#include <sys/time.h>
#define MICRO 1000000
#define ULL unsigned long long
int main(void);
static ULL tim(void);

static ULL tim()
{
struct timeval t;
(void)gettimeofday(&t, NULL);
return (ULL)t.tv_sec * MICRO + (ULL)t.tv_usec;
}
int main()
{
ULL a, b, cnt;
for(cnt=0;;cnt++)
{
a = tim();
b = tim();
if(b < a)
break;
}
printf("Failed after %llu\n", cnt);
printf("a = %llu, b = %llu\n", a, b);
return 0;
}

It seems to work fine. That said, I didn't use your code or check
for the possibility of a sign-change miscompare. I just made sure
I don't have any by using unsigned stuff. You may want to try
this code to see if you have a compiler (or coding) problem.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.

2002-06-24 23:34:31

by Frank van de Pol

[permalink] [raw]
Subject: Re: gettimeofday problem



Richard,

I gave your test program a try, and the problem is reproducably triggered:


frank@idefix:~/projects$ ./tod
Failed after 2008
a = 1024961239107823, b = 1024961238254652
frank@idefix:~/projects$ ./tod
Failed after 1394
a = 1024961252215231, b = 1024961239964379
frank@idefix:~/projects$ ./tod
Failed after 110179
a = 1024961241574880, b = 1024961241573638
frank@idefix:~/projects$ ./tod
Failed after 4891
a = 1024961242145265, b = 1024961242144027
frank@idefix:~/projects$ ./tod
Failed after 45008
a = 1024961243210834, b = 1024961242943904
frank@idefix:~/projects$ ./tod
Failed after 6695
a = 1024961243405068, b = 1024961243403828
frank@idefix:~/projects$ ./tod
Failed after 1487
a = 1024961244216903, b = 1024961244093738
frank@idefix:~/projects$ ./tod
Failed after 3275
a = 1024961245674712, b = 1024961245673487
frank@idefix:~/projects$ ./tod
Failed after 42122
a = 1024961259065626, b = 1024961246333377
frank@idefix:~/projects$ ./tod
Failed after 425
a = 1024961253600435, b = 1024961252652402
frank@idefix:~/projects$


Possibly it might have to do with the fact that my machine is an SMP box
(dual pentium II)

$ uname -a

Linux idefix 2.4.18 #3 SMP Sat Apr 20 14:35:58 CEST 2002 i686 unknown

In fact it is almost symmetrical (one CPU seems to be faster than the other,
eventhough they are running at the same clock speed).

$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 3
model name : Pentium II (Klamath)
stepping : 4
cpu MHz : 333.225
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov mmx
bogomips : 665.19

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 2
cpu MHz : 333.225
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx
bogomips : 748.74





On Mon, Jun 24, 2002 at 03:44:02PM -0400, Richard B. Johnson wrote:
> On Mon, 24 Jun 2002, Salvatore D'Angelo wrote:
>
> > In this piece of code I convert seconds and microseconds in
> > milliseconds. I think the problem is not in my code, in fact I wrote the
> > following piece of code in Java, and it does not work too. In the for
> > loop the 90% of times b > a while for 10% of times not.
> >
> > class Prova {
> > public static void main(....) {
> > for (;;) {
> > long a = System.currentTimeMillis();
> > long b = System.currentTimeMillis();
> >
> > if (a > b) {
> > System.out.println("Wrong!!!!!!!!!!!!!");
> > }
> > }
> > }
> > }
> >
> >
>
> This has been running since I first read your mail about 10:00 this
> morning. The kernel is 2.4.18
>
> #include <stdio.h>
> #include <sys/time.h>
> #define MICRO 1000000
> #define ULL unsigned long long
> int main(void);
> static ULL tim(void);
>
> static ULL tim()
> {
> struct timeval t;
> (void)gettimeofday(&t, NULL);
> return (ULL)t.tv_sec * MICRO + (ULL)t.tv_usec;
> }
> int main()
> {
> ULL a, b, cnt;
> for(cnt=0;;cnt++)
> {
> a = tim();
> b = tim();
> if(b < a)
> break;
> }
> printf("Failed after %llu\n", cnt);
> printf("a = %llu, b = %llu\n", a, b);
> return 0;
> }
>
> It seems to work fine. That said, I didn't use your code or check
> for the possibility of a sign-change miscompare. I just made sure
> I don't have any by using unsigned stuff. You may want to try
> this code to see if you have a compiler (or coding) problem.
>
>
> Cheers,
> Dick Johnson
>
> Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
>
> Windows-2000/Professional isn't.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
+---- --- -- - - - -
| Frank van de Pol -o) A-L-S-A
| [email protected] /\\ Sounds good!
| http://www.alsa-project.org _\_v
| Linux - Why use Windows if we have doors available?

2002-06-25 00:37:35

by Christian Robert

[permalink] [raw]
Subject: Re: gettimeofday problem

For your eyes,

while reading this thread I wrote a sample program to test if the clock
sometimes goes backward/forward.

I started my program while continuing reading threads on the linux-kernel
archive. After about 90 minutes of continuous running I went back to the window
running the program and surprise I saw this:

$ ./tloop
Bump negative -4294967295
^C
Summary:
-------
Min = 0
Max = 257845
Avg = 1 (6009092476/5521919279)


So it looks like the time changed somewhere of a value +/- 4,295 seconds.

kernel 2.4.18

$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 1
cpu MHz : 602.566
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1202.58

--------------------- program tloop.c -------------------------

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/time.h>
#include <signal.h>

typedef long long LL;

LL GetTime (void)
{
struct timeval tv;
LL retval;

gettimeofday (&tv, NULL);
retval = (tv.tv_sec * 1000000) + (tv.tv_usec);
return retval;
}

volatile int Break = 0;

void Trap (int sig)
{
Break = 1;
}

int main (void)
{
LL Now, Old;
LL Dt, Min=9999999, Max=0, Num=0, Tot=0;

Old = Now = GetTime ();

signal (SIGINT, Trap);
signal (SIGQUIT, Trap);

for ( ; Break==0 ; )
{
Now = GetTime();

if (Now < Old)
{
printf ("Bump negative %lld\n", (Now-Old));
}
else
{
Dt = Now-Old;

Min = (Dt < Min) ? Dt : Min;
Max = (Dt > Max) ? Dt : Max;

Tot += Dt;
Num += 1;
}

Old = Now;
}

printf ("Summary:\n-------\n Min = %lld\n Max = %lld\n "
"Avg = %lld (%lld/%lld)\n", Min, Max, Tot/Num, Tot, Num);

return 0;
}

2002-06-25 00:46:25

by Brad Hards

[permalink] [raw]
Subject: Re: gettimeofday problem

On Tue, 25 Jun 2002 10:37, Christian Robert wrote:
> gettimeofday (&tv, NULL);
How about checking the return value of the function call?

Brad
--
http://conf.linux.org.au. 22-25Jan2003. Perth, Australia. Birds in Black.

2002-06-25 02:03:14

by Christian Robert

[permalink] [raw]
Subject: Re: gettimeofday problem

Brad Hards wrote:
>
> On Tue, 25 Jun 2002 10:37, Christian Robert wrote:
> > gettimeofday (&tv, NULL);
> How about checking the return value of the function call?
>
> Brad
> --
> http://conf.linux.org.au. 22-25Jan2003. Perth, Australia. Birds in Black.

$ time ./tloop
Bump negative -4294967295
Summary:
-------
Min = 0
Max = 140068
Avg = 1 (4064861295/3825797418)

real 67m44.891s
user 29m29.690s
sys 27m53.130s


Same thing. Took about an hour before getting the negative bump.

Xtian.

---- modified GetTime() checking return value of gettimeofday() -----

LL GetTime (void)
{
struct timeval tv;
LL retval;
int rc;

while (0 != (rc = gettimeofday (&tv, NULL)))
printf ("Wow! gettimeofday () returned %d\n", rc);

retval = (tv.tv_sec * 1000000) + (tv.tv_usec);
return retval;
}

2002-06-25 02:48:01

by John Alvord

[permalink] [raw]
Subject: Re: gettimeofday problem

Maybe this is the result of floating point rounding errors. Floating
point is notorious for occaisional strange results. I suggest redoing
the test program to keep all results in integer and seeing what
happens...

john


On Mon, 24 Jun 2002 22:03:12 -0400, Christian Robert
<[email protected]> wrote:

>Brad Hards wrote:
>>
>> On Tue, 25 Jun 2002 10:37, Christian Robert wrote:
>> > gettimeofday (&tv, NULL);
>> How about checking the return value of the function call?
>>
>> Brad
>> --
>> http://conf.linux.org.au. 22-25Jan2003. Perth, Australia. Birds in Black.
>
>$ time ./tloop
>Bump negative -4294967295
>Summary:
>-------
> Min = 0
> Max = 140068
> Avg = 1 (4064861295/3825797418)
>
>real 67m44.891s
>user 29m29.690s
>sys 27m53.130s
>
>
>Same thing. Took about an hour before getting the negative bump.
>
>Xtian.
>
>---- modified GetTime() checking return value of gettimeofday() -----
>
>LL GetTime (void)
>{
> struct timeval tv;
> LL retval;
> int rc;
>
> while (0 != (rc = gettimeofday (&tv, NULL)))
> printf ("Wow! gettimeofday () returned %d\n", rc);
>
> retval = (tv.tv_sec * 1000000) + (tv.tv_usec);
> return retval;
>}
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/

2002-06-25 09:17:57

by Christian Robert

[permalink] [raw]
Subject: Re: gettimeofday problem

John Alvord wrote:
>
> Maybe this is the result of floating point rounding errors. Floating
> point is notorious for occaisional strange results. I suggest redoing
> the test program to keep all results in integer and seeing what
> happens...
>
> john

You were close.

Programming error on my part.

This modified subroutine does correct my test.
Notice the (LL) cast on 1000000

Computation was done on 32 bits integer then assign on the 64 bits integer.

sorry.

ps: It may help explain the other > 1 hour time jump I've seen in an
other thread.


LL GetTime (void)
{
struct timeval tv;
LL retval;
int rc;

while (0 != (rc = gettimeofday (&tv, NULL)))
printf ("Wow! gettimeofday () returned %d\n", rc);

retval = (tv.tv_sec * (LL)1000000) + (tv.tv_usec);

return retval;
}

2002-06-25 11:44:12

by Richard B. Johnson

[permalink] [raw]
Subject: Re: gettimeofday problem

On Tue, 25 Jun 2002, Brad Hards wrote:

> On Tue, 25 Jun 2002 10:37, Christian Robert wrote:
> > gettimeofday (&tv, NULL);
> How about checking the return value of the function call?
>
> Brad
> --
> http://conf.linux.org.au. 22-25Jan2003. Perth, Australia. Birds in Black.

I think the only possible error returned would relate to the time-zone
being invalid. The time-zone pointer being a NULL is valid so it isn't
going to return EINVAL.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.

2002-06-25 11:40:54

by Richard B. Johnson

[permalink] [raw]
Subject: Re: gettimeofday problem

On Tue, 25 Jun 2002, Frank van de Pol wrote:

>
>
> Richard,
>
> I gave your test program a try, and the problem is reproducably triggered:
>
>
> frank@idefix:~/projects$ ./tod
> Failed after 2008
> a = 1024961239107823, b = 1024961238254652
> frank@idefix:~/projects$ ./tod
> Failed after 1394
> a = 1024961252215231, b = 1024961239964379
> frank@idefix:~/projects$ ./tod
> Failed after 110179
> a = 1024961241574880, b = 1024961241573638
> frank@idefix:~/projects$ ./tod
> Failed after 4891
> a = 1024961242145265, b = 1024961242144027
> frank@idefix:~/projects$ ./tod
> Failed after 45008
> a = 1024961243210834, b = 1024961242943904
> frank@idefix:~/projects$ ./tod
> Failed after 6695
> a = 1024961243405068, b = 1024961243403828
> frank@idefix:~/projects$ ./tod
> Failed after 1487
> a = 1024961244216903, b = 1024961244093738
> frank@idefix:~/projects$ ./tod
> Failed after 3275
> a = 1024961245674712, b = 1024961245673487
> frank@idefix:~/projects$ ./tod
> Failed after 42122
> a = 1024961259065626, b = 1024961246333377
> frank@idefix:~/projects$ ./tod
> Failed after 425
> a = 1024961253600435, b = 1024961252652402
> frank@idefix:~/projects$
>
>
> Possibly it might have to do with the fact that my machine is an SMP box
> (dual pentium II)
>
> $ uname -a
>
> Linux idefix 2.4.18 #3 SMP Sat Apr 20 14:35:58 CEST 2002 i686 unknown
>
> In fact it is almost symmetrical (one CPU seems to be faster than the other,
> eventhough they are running at the same clock speed).
>
> $ cat /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 3
> model name : Pentium II (Klamath)
> stepping : 4
> cpu MHz : 333.225
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov mmx
> bogomips : 665.19
>
> processor : 1
> vendor_id : GenuineIntel
> cpu family : 6
> model : 5
> model name : Pentium II (Deschutes)
> stepping : 2
> cpu MHz : 333.225
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx
> bogomips : 748.74
>
>

Well, I have a dual Pentium also, but I didn't mix two different
CPUs as you have done. I'm supprised that your combination even
works! It could be a good 'checker' for hard-to-make race conditions.

Linux chaos 2.4.18 #14 SMP Mon Jun 17 15:31:13 EDT 2002 i686

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 1
cpu MHz : 399.570
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr
bogomips : 797.90

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 1
cpu MHz : 399.570
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr
bogomips : 797.90


I ran my program all night on another machine and it's still running.
Neither of these machines are trying to sync with NIST. Machines that
are running timing daemons that attempt to sync their clocks could, of
course, have problems with time-jumps.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.

2002-06-25 11:53:08

by Brad Hards

[permalink] [raw]
Subject: Re: gettimeofday problem

On Tue, 25 Jun 2002 21:45, Richard B. Johnson wrote:
> On Tue, 25 Jun 2002, Brad Hards wrote:
> > On Tue, 25 Jun 2002 10:37, Christian Robert wrote:
> > > gettimeofday (&tv, NULL);
> >
> > How about checking the return value of the function call?
> >
> > Brad
> > --
> > http://conf.linux.org.au. 22-25Jan2003. Perth, Australia. Birds in Black.
>
> I think the only possible error returned would relate to the time-zone
> being invalid. The time-zone pointer being a NULL is valid so it isn't
> going to return EINVAL.
It was just a thought - it just seemed a reasonable thing to test for, since
the call can fail.

I didn't check the lib code, so I imagined that their might be some glibc
munging of the syscall output. man gettimeofday sez, inter alia:
EINVAL Timezone (or something else) is invalid.

EFAULT One of tv or tz pointed outside your accessible
address space.

As you pointed out, EINVAL doesn't seem too likely, and I can't see a pointer
problem. So it looks like something else.

Brad

--
http://conf.linux.org.au. 22-25Jan2003. Perth, Australia. Birds in Black.

2002-06-25 13:37:55

by Chris Friesen

[permalink] [raw]
Subject: Re: gettimeofday problem

"Richard B. Johnson" wrote:

> This has been running since I first read your mail about 10:00 this
> morning. The kernel is 2.4.18

<code snipped, find it on google>

Using that code, I can reliably trigger the fault on 2.2.17 on a G4 desktop,
while it doesn't trigger on 2.4.18 on a G4 cPCI blade.

I saw this a long time back (a year or two) on 2.2 but never really tracked it
down properly as it wasn't a showstopper and I had other things to do at the
time.

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]

2002-06-26 10:58:11

by Gabriel Paubert

[permalink] [raw]
Subject: Re: gettimeofday problem

Chris Friesen wrote:

> "Richard B. Johnson" wrote:
>
>
>>This has been running since I first read your mail about 10:00 this
>>morning. The kernel is 2.4.18
>>
>
> <code snipped, find it on google>
>
> Using that code, I can reliably trigger the fault on 2.2.17 on a G4 desktop,
> while it doesn't trigger on 2.4.18 on a G4 cPCI blade.


Given the fact that the relevant code has been almost completely rewritten
for 2.4 (I'm the culprit), this is not surprising.

The rewrite was mostly to make the code more robust in case of missed
ticks: all PPC are now guaranteed to withstand at least (HZ-1) lost clock
interrupts before things go awfully wrong (this was very useful when the
frame buffer drivers blocked interrupts for too long during screen
switching). There are also several other less visible bug fixes: when
connected to an NTP server, in some cases the timebase frequency appeared
to vary depending on system load (fixed by a different algorithm to
compute values to load into the decrementer) and other bogosities.

Now if you are able to trigger a problem with 2.4. I'm extremely interested.


>
> I saw this a long time back (a year or two) on 2.2 but never really tracked it
> down properly as it wasn't a showstopper and I had other things to do at the
> time.


I have a patch for 2.2, but it is mixed up in a set of huge patches for
Motorola VME boards (MVME2600 and 2400 series) with a bootloader
(including an x86 emulator to initialize graphics boards by executing the
BIOS ROM), VME drivers and other things that you can find at:

ftp://vlab1.iram.es/pub/linux-2.2

I could extract the timing patches if somebody is interested, but not in
the next few days (I have patches which apply to more recent versions than
the ones I put on the FTP area).

Gabriel.

2002-06-26 11:01:27

by Gabriel Paubert

[permalink] [raw]
Subject: Re: gettimeofday problem

Karim Yaghmour wrote:


> As I had said earlier, I've seen this happen before on both i386 and
> PPC machines.


Have you seen this on PPC with recent 2.4 and 2.5 kernels
or only with 2.2 ?

Gabriel.

2002-06-26 15:32:42

by Jan Hudec

[permalink] [raw]
Subject: Re: gettimeofday problem

On Tue, Jun 25, 2002 at 05:17:52AM -0400, Christian Robert wrote:
> John Alvord wrote:
> > Maybe this is the result of floating point rounding errors. Floating
> > point is notorious for occaisional strange results. I suggest redoing
> > the test program to keep all results in integer and seeing what
> > happens...
> You were close.
> Programming error on my part.

What about comparing the struct timeval things directly? There is even a
timercmp macro for that (well I noticed that in the manpage when I
have olrady had the test written; the macro can only do sharp comparsions).

Something like this:
(I am now running it on three machines - Athlon 850, Pentium 1500 and dual
Pentium III 500 - all seem to be OK so far)

#include<stdio.h>
#include<errno.h>
#include<sys/time.h>
#include<signal.h>

volatile int loop = 1;

void sigint(int foo) {
loop = 0;
}

int main(void) {
unsigned long long cnt = 0, bcnt = 0, ecnt = 0;
struct timeval old, new = {0, 0};

signal(SIGINT, sigint);
while(loop && cnt < (1LLU<<54)) {
cnt++;
old = new;
if(gettimeofday(&new, NULL)) {
ecnt++;
printf("Error #%llu: count=%llu"
" error/count=0.%04llu errno=%i (%s)\n",
ecnt, cnt, (10000*ecnt)/cnt, errno,
sys_errlist[errno]);
continue;
}
if((new.tv_sec < old.tv_sec) || ((new.tv_sec == old.tv_sec) && (new.tv_usec < old.tv_usec))) {
bcnt++;
printf("Skew #%llu: count=%llu errors=%llu"
" skew/good count=0.%04llu, new=(%li,"
" %li) old=(%li, %li)\n", bcnt, cnt,
ecnt, (10000*bcnt)/(cnt-ecnt),
new.tv_sec, new.tv_usec, old.tv_sec,
old.tv_usec);
}
}

printf("Counted %llu, errors %llu (0.%04llu), skews %llu"
" (0.%04llu)\n", cnt, ecnt, (10000*ecnt)/cnt, bcnt,
(10000*bcnt)/(cnt-ecnt));
return 0;
}



-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <[email protected]>

2002-06-28 18:02:29

by George Anzinger

[permalink] [raw]
Subject: Re: gettimeofday problem

"Richard B. Johnson" wrote:
>
<snip>
> I ran my program all night on another machine and it's still running.
> Neither of these machines are trying to sync with NIST. Machines that
> are running timing daemons that attempt to sync their clocks could, of
> course, have problems with time-jumps.
>

Me thinks it is time to fix this NIST/NTP issue. The
problem is that we are adjusting the wall clock every 1/HZ
tick instead of adjusting the 1/HZ tick AND the
interpolation constant. What happens is (in the X86) is
that we assume that the conversion of TSC to usec is fixed
and exact as computed at boot time. The time sync protocols
have a more "correct" story to tell. We need to incorporate
this information into the TSC to usec conversion so that the
wall clock correction for times between 1/HZ ticks agrees
with what is done at the tick time.
>

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-07-01 18:56:11

by Pavel Machek

[permalink] [raw]
Subject: Re: gettimeofday problem

Hi!

> On 2000000 call -> 189 times I found the problem (0.00945%)
> On 20000000 call ->1956 found I found the problem (0.00978%)
>
> Probably you're right my previous percentage is too high (the one above
> should be the correct one).
>
> But do you think that this behaviour is normal?

Buggy hardware and kernel that is not able to work around that. Try
googling or asking vojtech pavlik, there should be patches to fix
this.

What chipset?
Pavel
--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

2002-07-01 19:03:09

by Pavel Machek

[permalink] [raw]
Subject: Re: gettimeofday problem

Hi!

> > In this piece of code I convert seconds and microseconds in
> > milliseconds. I think the problem is not in my code, in fact I wrote the
> > following piece of code in Java, and it does not work too. In the for
> > loop the 90% of times b > a while for 10% of times not.
> >
> ...
> > long a = System.currentTimeMillis();
> > long b = System.currentTimeMillis();
> > if (a > b) {
> > System.out.println("Wrong!!!!!!!!!!!!!");
> > }
>
>
> So in 10% of the cases, two successive calls yield time
> rolling BACK ?
>
> I used gettimeofday() call, and compared the original data
> from the code.
>
> At a modern uniprocessor machine I never get anything except
> monotonously increasing time (TSC is used in betwen timer ticks
> to supply time increase.) At a dual processor machine, on
> occasion I do get SAME value twice. I have never seen time
> rolling backwards.
>
> Uh.. correction: 216199245 0:-1 -- it did step backwards,
> but only once within about 216 million gettimeofday() calls.
> (I am running 2.4.19-pre8smp at the test box.)

Hmm, so it is buggy even for you. He probably has way crappier
hardware. Neptun chipsets and via chipsets have bugs in time
implementation.
Pavel
--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

2002-07-19 17:14:40

by Amos Waterland

[permalink] [raw]
Subject: Re: gettimeofday problem

There may be several distinct errors that people are observing. I wrote
a program that makes 32 gettimeofday() calls in a tight loop so as to
try to catch rare anomalies, and after running for about 24 hours, it
just triggered a strange problem.

On 2.4.18-3 on a Pentium III, the following output shows a large jump in
the tv_usec value, followed by an immediate return to the monotonically
increasing sequence, but with a large sequence discontinuity.

So in this case the time is not so much "going backwards", but rather
making a huge jump forwards and then returning to the original sequence.
Hope this helps.

% ./test0035
Run: 1844080460; Timer went backwards!; dump:
1027095141 707896
1027095141 707896
1027095141 707897
1027095141 707897
1027095141 707898
1027095141 707898
1027095141 707899
1027095141 707899
1027095141 707900
1027095141 707900
1027095141 707901
1027095141 707901
1027095141 707902
1027095141 707902
1027095141 707903
1027095141 707903
1027095141 852234 <-- this is the anomaly
1027095141 712510 <-- sequence returns with 4607us discontinuity
1027095141 712510
1027095141 712511
1027095141 712511
1027095141 712512
1027095141 712512
1027095141 712513
1027095141 712513
1027095141 712514
1027095141 712514
1027095141 712515
1027095141 712515
1027095141 712516
1027095141 712516
1027095141 712517
% cat test0035.c
/* Test for timer weirdness.
* Amos Waterland <[email protected]>
* 15 July 2002
*/

#include <errno.h>
#include <error.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>

int
main (int argc, char *argv[])
{
int result = 0;

{
const int ELEMS = 32;
struct timeval tm[ELEMS];
long j, i;

while (1)
{
++j;

for (i = 0; i < ELEMS; i++)
{
gettimeofday (&(tm[i]), NULL);
}

for (i = 1; i < ELEMS; i++)
{
if ((long) tm[i].tv_sec > (long) tm[i - 1].tv_sec)
continue;

if ((long) tm[i].tv_sec < (long) tm[i - 1].tv_sec ||
(long) tm[i].tv_usec < (long) tm[i - 1].tv_usec)
{
printf ("\nRun: %li; Timer went backwards!; dump:\n", j);

for (i = 0; i < ELEMS; i++)
{
printf ("%li %li\n",
(long) tm[i].tv_sec, (long) tm[i].tv_usec);
}
}
}
}
}

exit (result);
}



On Tue, Jun 25, 2002 at 12:00:57PM +0200, Jan Hudec wrote:
> On Tue, Jun 25, 2002 at 05:17:52AM -0400, Christian Robert wrote:
> > John Alvord wrote:
> > > Maybe this is the result of floating point rounding errors. Floating
> > > point is notorious for occaisional strange results. I suggest redoing
> > > the test program to keep all results in integer and seeing what
> > > happens...
> > You were close.
> > Programming error on my part.
>
> What about comparing the struct timeval things directly? There is even a
> timercmp macro for that (well I noticed that in the manpage when I
> have olrady had the test written; the macro can only do sharp comparsions).
>
> Something like this:
> (I am now running it on three machines - Athlon 850, Pentium 1500 and dual
> Pentium III 500 - all seem to be OK so far)
>
> #include<stdio.h>
> #include<errno.h>
> #include<sys/time.h>
> #include<signal.h>
>
> volatile int loop = 1;
>
> void sigint(int foo) {
> loop = 0;
> }
>
> int main(void) {
> unsigned long long cnt = 0, bcnt = 0, ecnt = 0;
> struct timeval old, new = {0, 0};
>
> signal(SIGINT, sigint);
> while(loop && cnt < (1LLU<<54)) {
> cnt++;
> old = new;
> if(gettimeofday(&new, NULL)) {
> ecnt++;
> printf("Error #%llu: count=%llu"
> " error/count=0.%04llu errno=%i (%s)\n",
> ecnt, cnt, (10000*ecnt)/cnt, errno,
> sys_errlist[errno]);
> continue;
> }
> if((new.tv_sec < old.tv_sec) || ((new.tv_sec ==
> old.tv_sec) && (new.tv_usec < old.tv_usec))) { bcnt++;
> printf("Skew #%llu: count=%llu errors=%llu"
> " skew/good count=0.%04llu, new=(%li,"
> " %li) old=(%li, %li)\n", bcnt, cnt,
> ecnt, (10000*bcnt)/(cnt-ecnt),
> new.tv_sec, new.tv_usec, old.tv_sec,
> old.tv_usec);
> }
> }
>
> printf("Counted %llu, errors %llu (0.%04llu), skews %llu"
> " (0.%04llu)\n", cnt, ecnt, (10000*ecnt)/cnt, bcnt,
> (10000*bcnt)/(cnt-ecnt));
> return 0;
> }