2015-02-19 18:35:38

by Nishanth Aravamudan

[permalink] [raw]
Subject: time / gtod seconds value out of sync?

Hi John,

We're seeing an interesting issue with the openposix testcase
difftime/1-1, which basically calls gtod/time, sleeps, calls time/gtod,
then difftime and sees if they disagree. The issue occurs with either
vDSO implementations or direct syscalls.

We are seeing failures on ppc64le and x86_64 (probably other places too,
just not tested yet), because (I'm pretty sure), the time() syscalls
granularity is not accounting for the nsecs value at all. Instead, it
just returns get_seconds().

In one case, I see, in sys_time():

[ 313.001823] NACC: timekeeping_get_ns = 1000121642
[ 314.001889] NACC: timekeeping_get_ns = 188401

gtod correctly accumulates those nsecs into the secs value:

ts->tv_sec = tk->xtime_sec;
nsecs = timekeeping_get_ns(&tk->tkr);
ts->tv_nsec = 0;
timespec64_add_ns(ts, nsecs);

but time() does:

return tk->xtime_sec;

It seems like overkill to do the full timekeeping_get_ns() in time(),
but maybe it's also necessary to account for leap seconds? That is, we
need to ensure that accumulate_nsecs_to_secs() has been called before
return tk->xtime_sec?

Thoughts?

Thanks,
Nish


2015-02-19 19:03:28

by John Stultz

[permalink] [raw]
Subject: Re: time / gtod seconds value out of sync?

Hey Nish! Long time!

On Thu, Feb 19, 2015 at 10:35 AM, Nishanth Aravamudan
<[email protected]> wrote:
> Hi John,
>
> We're seeing an interesting issue with the openposix testcase
> difftime/1-1, which basically calls gtod/time, sleeps, calls time/gtod,

I'm not familiar with the test... Do you have a link?

> then difftime and sees if they disagree. The issue occurs with either
> vDSO implementations or direct syscalls.
>
> We are seeing failures on ppc64le and x86_64 (probably other places too,
> just not tested yet), because (I'm pretty sure), the time() syscalls
> granularity is not accounting for the nsecs value at all. Instead, it
> just returns get_seconds().


Right, so there is always a problem mixing calls of different
granularity (similar issues crop up with gettimeofday() and filesystem
timestamps), so the basis of the test worries me a little bit from the
description, but I'd have to look at it to really get a sense.


> In one case, I see, in sys_time():
>
> [ 313.001823] NACC: timekeeping_get_ns = 1000121642
> [ 314.001889] NACC: timekeeping_get_ns = 188401
>
> gtod correctly accumulates those nsecs into the secs value:
>
> ts->tv_sec = tk->xtime_sec;
> nsecs = timekeeping_get_ns(&tk->tkr);
> ts->tv_nsec = 0;
> timespec64_add_ns(ts, nsecs);
>
> but time() does:
>
> return tk->xtime_sec;
>
> It seems like overkill to do the full timekeeping_get_ns() in time(),

Right, so looking into the git history,
f20bf6125605acbbc7eb8c9420d7221c91aa83eb (time: introduce
xtime_seconds) changed this specifically for performance reasons
(cc'ed Ingo here, in case he remembers more context).

The idea that time() would be ok as being HZ granular, and its been
this way since 2.6.23. Thus you have a < HZ sized window where
gettimeofday() will return the next second before time() gets updated
by the tick.

> but maybe it's also necessary to account for leap seconds? That is, we
> need to ensure that accumulate_nsecs_to_secs() has been called before
> return tk->xtime_sec?

So leapseconds are also applied at tick time, so I don't think you'd
see any different behavior with them.

There was a thread on this quite awhile back and I if I recall I think
the general consensus was to keep time() tick granular (so it aligns
with filesystem timestamps) and gettimeofday() hardware granular.
Though we also introduced the CLOCK_REALTIME_COARSE to match
sub-second filesystem timestamps as well.

So yea... I don't think we want to make a change here, but maybe I'm
not understanding the underlying issue... so feel free to push back
here. :)

thanks
-john

2015-02-19 19:16:07

by Ingo Molnar

[permalink] [raw]
Subject: Re: time / gtod seconds value out of sync?


* John Stultz <[email protected]> wrote:

> > [ 313.001823] NACC: timekeeping_get_ns = 1000121642
> > [ 314.001889] NACC: timekeeping_get_ns = 188401
> >
> > gtod correctly accumulates those nsecs into the secs value:
> >
> > ts->tv_sec = tk->xtime_sec;
> > nsecs = timekeeping_get_ns(&tk->tkr);
> > ts->tv_nsec = 0;
> > timespec64_add_ns(ts, nsecs);
> >
> > but time() does:
> >
> > return tk->xtime_sec;
> >
> > It seems like overkill to do the full timekeeping_get_ns() in time(),
>
> Right, so looking into the git history,
> f20bf6125605acbbc7eb8c9420d7221c91aa83eb (time: introduce
> xtime_seconds) changed this specifically for performance
> reasons (cc'ed Ingo here, in case he remembers more
> context).
>
> The idea that time() would be ok as being HZ granular,
> and its been this way since 2.6.23. Thus you have a < HZ
> sized window where gettimeofday() will return the next
> second before time() gets updated by the tick.

Yes, and the scalability advantage is significant if you
have an app that calls time() often. Undoing that would
certainly make me sad.

Thanks,

Ingo

2015-02-19 19:29:41

by Nishanth Aravamudan

[permalink] [raw]
Subject: Re: time / gtod seconds value out of sync?

Hi John!

On 19.02.2015 [11:03:26 -0800], John Stultz wrote:
> Hey Nish! Long time!

yep :)

> On Thu, Feb 19, 2015 at 10:35 AM, Nishanth Aravamudan
> <[email protected]> wrote:
> > Hi John,
> >
> > We're seeing an interesting issue with the openposix testcase
> > difftime/1-1, which basically calls gtod/time, sleeps, calls time/gtod,
>
> I'm not familiar with the test... Do you have a link?

Sorry about that:

http://sourceforge.net/projects/posixtest/files/posixtest/posixtestsuite-1.5.2/posixtestsuite-1.5.2.tar.gz/download

conformance/interfaces/difftime/1-1.c

It takes quite a few iterations in a loop, but it eventually fails with
mainline on both x86-64 and ppc64le.

> > then difftime and sees if they disagree. The issue occurs with either
> > vDSO implementations or direct syscalls.
> >
> > We are seeing failures on ppc64le and x86_64 (probably other places too,
> > just not tested yet), because (I'm pretty sure), the time() syscalls
> > granularity is not accounting for the nsecs value at all. Instead, it
> > just returns get_seconds().
>
>
> Right, so there is always a problem mixing calls of different
> granularity (similar issues crop up with gettimeofday() and filesystem
> timestamps), so the basis of the test worries me a little bit from the
> description, but I'd have to look at it to really get a sense.

Yep, that makes sense to me too -- the only concern I had was that the
point where time() is returning X seconds, a simultaneous (theoretical)
call to gtod() would return the correct X+1 seconds (presuming nsecs had
exceeded 1000000000).

> > In one case, I see, in sys_time():
> >
> > [ 313.001823] NACC: timekeeping_get_ns = 1000121642
> > [ 314.001889] NACC: timekeeping_get_ns = 188401
> >
> > gtod correctly accumulates those nsecs into the secs value:
> >
> > ts->tv_sec = tk->xtime_sec;
> > nsecs = timekeeping_get_ns(&tk->tkr);
> > ts->tv_nsec = 0;
> > timespec64_add_ns(ts, nsecs);
> >
> > but time() does:
> >
> > return tk->xtime_sec;
> >
> > It seems like overkill to do the full timekeeping_get_ns() in time(),
>
> Right, so looking into the git history,
> f20bf6125605acbbc7eb8c9420d7221c91aa83eb (time: introduce
> xtime_seconds) changed this specifically for performance reasons
> (cc'ed Ingo here, in case he remembers more context).

Ah I see. I can see the performance impact of calling into gtod being
high.

> The idea that time() would be ok as being HZ granular, and its been
> this way since 2.6.23. Thus you have a < HZ sized window where
> gettimeofday() will return the next second before time() gets updated
> by the tick.

Right.

> > but maybe it's also necessary to account for leap seconds? That is, we
> > need to ensure that accumulate_nsecs_to_secs() has been called before
> > return tk->xtime_sec?
>
> So leapseconds are also applied at tick time, so I don't think you'd
> see any different behavior with them.

Yep, ok.

> There was a thread on this quite awhile back and I if I recall I think
> the general consensus was to keep time() tick granular (so it aligns
> with filesystem timestamps) and gettimeofday() hardware granular.
> Though we also introduced the CLOCK_REALTIME_COARSE to match
> sub-second filesystem timestamps as well.
>
> So yea... I don't think we want to make a change here, but maybe I'm
> not understanding the underlying issue... so feel free to push back
> here. :)

Oh that's fine. I mostly wanted the subsystem experts to chime in on if
the the testcase was valid, etc.

Jan, do you have any other concerns?

Thanks,
Nish

2015-02-20 13:32:03

by Jan Stancek

[permalink] [raw]
Subject: Re: time / gtod seconds value out of sync?





----- Original Message -----
> From: "Nishanth Aravamudan" <[email protected]>
> To: "John Stultz" <[email protected]>
> Cc: "Thomas Gleixner" <[email protected]>, "lkml" <[email protected]>, [email protected], "Ingo Molnar"
> <[email protected]>
> Sent: Thursday, 19 February, 2015 8:28:40 PM
> Subject: Re: time / gtod seconds value out of sync?
>
> Hi John!
>
> On 19.02.2015 [11:03:26 -0800], John Stultz wrote:
> > Hey Nish! Long time!
>
> yep :)
>
> > On Thu, Feb 19, 2015 at 10:35 AM, Nishanth Aravamudan
> > <[email protected]> wrote:
> > > Hi John,
> > >
> > > We're seeing an interesting issue with the openposix testcase
> > > difftime/1-1, which basically calls gtod/time, sleeps, calls time/gtod,
> >
> > I'm not familiar with the test... Do you have a link?
>
> Sorry about that:
>
> http://sourceforge.net/projects/posixtest/files/posixtest/posixtestsuite-1.5.2/posixtestsuite-1.5.2.tar.gz/download
>
> conformance/interfaces/difftime/1-1.c
>
> It takes quite a few iterations in a loop, but it eventually fails with
> mainline on both x86-64 and ppc64le.
>
> > > then difftime and sees if they disagree. The issue occurs with either
> > > vDSO implementations or direct syscalls.
> > >
> > > We are seeing failures on ppc64le and x86_64 (probably other places too,
> > > just not tested yet), because (I'm pretty sure), the time() syscalls
> > > granularity is not accounting for the nsecs value at all. Instead, it
> > > just returns get_seconds().
> >
> >
> > Right, so there is always a problem mixing calls of different
> > granularity (similar issues crop up with gettimeofday() and filesystem
> > timestamps), so the basis of the test worries me a little bit from the
> > description, but I'd have to look at it to really get a sense.
>
> Yep, that makes sense to me too -- the only concern I had was that the
> point where time() is returning X seconds, a simultaneous (theoretical)
> call to gtod() would return the correct X+1 seconds (presuming nsecs had
> exceeded 1000000000).
>
> > > In one case, I see, in sys_time():
> > >
> > > [ 313.001823] NACC: timekeeping_get_ns = 1000121642
> > > [ 314.001889] NACC: timekeeping_get_ns = 188401
> > >
> > > gtod correctly accumulates those nsecs into the secs value:
> > >
> > > ts->tv_sec = tk->xtime_sec;
> > > nsecs = timekeeping_get_ns(&tk->tkr);
> > > ts->tv_nsec = 0;
> > > timespec64_add_ns(ts, nsecs);
> > >
> > > but time() does:
> > >
> > > return tk->xtime_sec;
> > >
> > > It seems like overkill to do the full timekeeping_get_ns() in time(),
> >
> > Right, so looking into the git history,
> > f20bf6125605acbbc7eb8c9420d7221c91aa83eb (time: introduce
> > xtime_seconds) changed this specifically for performance reasons
> > (cc'ed Ingo here, in case he remembers more context).
>
> Ah I see. I can see the performance impact of calling into gtod being
> high.
>
> > The idea that time() would be ok as being HZ granular, and its been
> > this way since 2.6.23. Thus you have a < HZ sized window where
> > gettimeofday() will return the next second before time() gets updated
> > by the tick.

Thanks John for the explanation, this made it clear for me.
I wonder if this wouldn't be helpful NOTE in time(2) man page.

>
> Right.
>
> > > but maybe it's also necessary to account for leap seconds? That is, we
> > > need to ensure that accumulate_nsecs_to_secs() has been called before
> > > return tk->xtime_sec?
> >
> > So leapseconds are also applied at tick time, so I don't think you'd
> > see any different behavior with them.
>
> Yep, ok.
>
> > There was a thread on this quite awhile back and I if I recall I think
> > the general consensus was to keep time() tick granular (so it aligns
> > with filesystem timestamps) and gettimeofday() hardware granular.
> > Though we also introduced the CLOCK_REALTIME_COARSE to match
> > sub-second filesystem timestamps as well.
> >
> > So yea... I don't think we want to make a change here, but maybe I'm
> > not understanding the underlying issue... so feel free to push back
> > here. :)
>
> Oh that's fine. I mostly wanted the subsystem experts to chime in on if
> the the testcase was valid, etc.
>
> Jan, do you have any other concerns?

No, I don't.

Regards,
Jan

>
> Thanks,
> Nish
>
>