2007-06-25 20:06:39

by Ingo Molnar

[permalink] [raw]
Subject: [patch, v2.6.22-rc6] sys_time() speedup

Subject: [patch] sys_time() speedup
From: Ingo Molnar <[email protected]>

improve performance of sys_time(). sys_time() returns time in seconds,
but it does so by calling do_gettimeofday() and then returning the
tv_sec portion of the GTOD time. But the data structure "xtime", which
is updated by every timer/scheduler tick, already offers HZ granularity
time.

the patch improves the sysbench OLTP macrobenchmark significantly:

2.6.22-rc6:

#threads
1: transactions: 3733 (373.21 per sec.)
2: transactions: 6676 (667.46 per sec.)
3: transactions: 6957 (695.50 per sec.)
4: transactions: 7055 (705.48 per sec.)
5: transactions: 6596 (659.33 per sec.)

2.6.22-rc6 + sys_time.patch:

1: transactions: 4005 (400.47 per sec.)
2: transactions: 7379 (737.77 per sec.)
3: transactions: 7347 (734.49 per sec.)
4: transactions: 7468 (746.65 per sec.)
5: transactions: 7428 (742.47 per sec.)

mixed API uses of gettimeofday() and time() are guaranteed to be
coherent via the use of a at-most-once-per-second slowpath that updates
xtime.

Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/time.c | 27 ++++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)

Index: linux/kernel/time.c
===================================================================
--- linux.orig/kernel/time.c
+++ linux/kernel/time.c
@@ -57,14 +57,17 @@ EXPORT_SYMBOL(sys_tz);
*/
asmlinkage long sys_time(time_t __user * tloc)
{
- time_t i;
- struct timeval tv;
+ /*
+ * We read xtime.tv_sec atomically - it's updated
+ * atomically by update_wall_time(), so no need to
+ * even read-lock the xtime seqlock:
+ */
+ time_t i = xtime.tv_sec;

- do_gettimeofday(&tv);
- i = tv.tv_sec;
+ smp_rmb(); /* sys_time() results are coherent */

if (tloc) {
- if (put_user(i,tloc))
+ if (put_user(i, tloc))
i = -EFAULT;
}
return i;
@@ -373,6 +376,20 @@ void do_gettimeofday (struct timeval *tv

tv->tv_sec = sec;
tv->tv_usec = usec;
+
+ /*
+ * Make sure xtime.tv_sec [returned by sys_time()] always
+ * follows the gettimeofday() result precisely. This
+ * condition is extremely unlikely, it can hit at most
+ * once per second:
+ */
+ if (unlikely(xtime.tv_sec != tv->tv_sec)) {
+ unsigned long flags;
+
+ write_seqlock_irqsave(&xtime_lock);
+ update_wall_time();
+ write_seqlock_irqrestore(&xtime_lock);
+ }
}

EXPORT_SYMBOL(do_gettimeofday);


2007-06-25 21:10:32

by Roman Zippel

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

Hi,

On Monday 25 June 2007, Ingo Molnar wrote:

> the patch improves the sysbench OLTP macrobenchmark significantly:

Has that any real practical relevance?

> @@ -373,6 +376,20 @@ void do_gettimeofday (struct timeval *tv
>
> tv->tv_sec = sec;
> tv->tv_usec = usec;
> +
> + /*
> + * Make sure xtime.tv_sec [returned by sys_time()] always
> + * follows the gettimeofday() result precisely. This
> + * condition is extremely unlikely, it can hit at most
> + * once per second:
> + */
> + if (unlikely(xtime.tv_sec != tv->tv_sec)) {
> + unsigned long flags;
> +
> + write_seqlock_irqsave(&xtime_lock);
> + update_wall_time();
> + write_seqlock_irqrestore(&xtime_lock);
> + }
> }
>
> EXPORT_SYMBOL(do_gettimeofday);

Is this the do_gettimeofday() inside CONFIG_TIME_INTERPOLATION?
What did you test?
There can be many ways to read the clock, do you want to put this hook
everywhere? Wouldn't it be better to improve the clock performance?

bye, Roman

2007-06-25 21:18:16

by Jesper Juhl

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On 25/06/07, Roman Zippel <[email protected]> wrote:
> Hi,
>
> On Monday 25 June 2007, Ingo Molnar wrote:
>
> > the patch improves the sysbench OLTP macrobenchmark significantly:
>
> Has that any real practical relevance?
>
It seems to me that Ingo's patch offers slightly improved performance
for any program using the time() system call, with no real drawbacks,
so why wouldn't we want to use it?

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2007-06-25 22:00:27

by Roman Zippel

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

Hi,

On Mon, 25 Jun 2007, Jesper Juhl wrote:

> > On Monday 25 June 2007, Ingo Molnar wrote:
> >
> > > the patch improves the sysbench OLTP macrobenchmark significantly:
> >
> > Has that any real practical relevance?
> >
> It seems to me that Ingo's patch offers slightly improved performance
> for any program using the time() system call, with no real drawbacks,
> so why wouldn't we want to use it?

How do you come to the conclusion it has no real drawbacks?
Ingo provided no information about his test setup and his patch was a
little strange, so I can't say that yet.

bye, Roman

2007-06-25 22:03:40

by Eric Dumazet

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

Ingo Molnar a ?crit :
> Subject: [patch] sys_time() speedup
> From: Ingo Molnar <[email protected]>
>
> improve performance of sys_time(). sys_time() returns time in seconds,
> but it does so by calling do_gettimeofday() and then returning the
> tv_sec portion of the GTOD time. But the data structure "xtime", which
> is updated by every timer/scheduler tick, already offers HZ granularity
> time.
>
> the patch improves the sysbench OLTP macrobenchmark significantly:
>
> 2.6.22-rc6:
>
> #threads
> 1: transactions: 3733 (373.21 per sec.)
> 2: transactions: 6676 (667.46 per sec.)
> 3: transactions: 6957 (695.50 per sec.)
> 4: transactions: 7055 (705.48 per sec.)
> 5: transactions: 6596 (659.33 per sec.)
>
> 2.6.22-rc6 + sys_time.patch:
>
> 1: transactions: 4005 (400.47 per sec.)
> 2: transactions: 7379 (737.77 per sec.)
> 3: transactions: 7347 (734.49 per sec.)
> 4: transactions: 7468 (746.65 per sec.)
> 5: transactions: 7428 (742.47 per sec.)
>
> mixed API uses of gettimeofday() and time() are guaranteed to be
> coherent via the use of a at-most-once-per-second slowpath that updates
> xtime.
>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> kernel/time.c | 27 ++++++++++++++++++++++-----
> 1 file changed, 22 insertions(+), 5 deletions(-)
>
> Index: linux/kernel/time.c
> ===================================================================
> --- linux.orig/kernel/time.c
> +++ linux/kernel/time.c
> @@ -57,14 +57,17 @@ EXPORT_SYMBOL(sys_tz);
> */
> asmlinkage long sys_time(time_t __user * tloc)
> {
> - time_t i;
> - struct timeval tv;
> + /*
> + * We read xtime.tv_sec atomically - it's updated
> + * atomically by update_wall_time(), so no need to
> + * even read-lock the xtime seqlock:
> + */
> + time_t i = xtime.tv_sec;
>
> - do_gettimeofday(&tv);
> - i = tv.tv_sec;
> + smp_rmb(); /* sys_time() results are coherent */
>
> if (tloc) {
> - if (put_user(i,tloc))
> + if (put_user(i, tloc))
> i = -EFAULT;
> }
> return i;
> @@ -373,6 +376,20 @@ void do_gettimeofday (struct timeval *tv
>
> tv->tv_sec = sec;
> tv->tv_usec = usec;
> +
> + /*
> + * Make sure xtime.tv_sec [returned by sys_time()] always
> + * follows the gettimeofday() result precisely. This
> + * condition is extremely unlikely, it can hit at most
> + * once per second:
> + */

Unfortunatly, some arches (x86_64) can call both sys_time() and vgettimeofday().

And vgettimeofday() cannot update xtime (its mapped readonly in vsyscall
page), so the coherency wont be guaranted.

Also, I thought glibc time(0) was calling gettimeofday() on x86_64, so I
wonder on which machine you got your bench results.

Are you still using a 32 bits platform ? :)

2007-06-25 22:16:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On Mon, 25 Jun 2007 23:09:46 +0200
Roman Zippel <[email protected]> wrote:

> Hi,
>
> On Monday 25 June 2007, Ingo Molnar wrote:
>
> > the patch improves the sysbench OLTP macrobenchmark significantly:
>
> Has that any real practical relevance?

Interesting question. The patch adds a new test-n-branch to gettimeofday()
so if gettimeofday() is used much more frequently than time(), we lose.

> > @@ -373,6 +376,20 @@ void do_gettimeofday (struct timeval *tv
> >
> > tv->tv_sec = sec;
> > tv->tv_usec = usec;
> > +
> > + /*
> > + * Make sure xtime.tv_sec [returned by sys_time()] always
> > + * follows the gettimeofday() result precisely. This
> > + * condition is extremely unlikely, it can hit at most
> > + * once per second:
> > + */
> > + if (unlikely(xtime.tv_sec != tv->tv_sec)) {
> > + unsigned long flags;
> > +
> > + write_seqlock_irqsave(&xtime_lock);
> > + update_wall_time();
> > + write_seqlock_irqrestore(&xtime_lock);
> > + }
> > }
> >
> > EXPORT_SYMBOL(do_gettimeofday);
>
> Is this the do_gettimeofday() inside CONFIG_TIME_INTERPOLATION?

Yes.

> What did you test?
> There can be many ways to read the clock, do you want to put this hook
> everywhere?

Yeah, it isn't immediately obvious (to this little black duck) why similar
fixups weren't needed in timekeeping.c.


2007-06-25 22:20:44

by Jesper Juhl

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On 26/06/07, Roman Zippel <[email protected]> wrote:
> Hi,
>
> On Mon, 25 Jun 2007, Jesper Juhl wrote:
>
> > > On Monday 25 June 2007, Ingo Molnar wrote:
> > >
> > > > the patch improves the sysbench OLTP macrobenchmark significantly:
> > >
> > > Has that any real practical relevance?
> > >
> > It seems to me that Ingo's patch offers slightly improved performance
> > for any program using the time() system call, with no real drawbacks,
> > so why wouldn't we want to use it?
>
> How do you come to the conclusion it has no real drawbacks?

His change to do_gettimeofday() will of course slow that path down a
tiny bit since he's adding an extra 'if', but since it's wrapped in
unlikely() and should hit at most one time pr second I would guess the
performance impact of that to be negligible.
The change to sys_time() does away with some local variables and
replaces the call to do_gettimeofday() with a memory barrier and a
simple read of xtime.tv_se. I find it hard to believe (although I have
not tested it) that that wouldn't be faster than the original.

So that's how I came to that conclusion; just reading the patch, going
over what it does in my head and thinking about it a bit. Not the
most scientific of things I admit.

Even if it is not faster, what would make it slower? Have you spotted
something I have not?

> Ingo provided no information about his test setup and his patch was a
> little strange, so I can't say that yet.
>
What did you find strange about it? I'm currious.
Sure it needs testing and of course it would be nice with some more
details on Ingo's test setup.


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2007-06-25 22:49:40

by Roman Zippel

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

Hi,

On Tue, 26 Jun 2007, Jesper Juhl wrote:

> Even if it is not faster, what would make it slower? Have you spotted
> something I have not?

There are other ways to read the clock and would require similiar
synchronization hooks.
Some archs can implement sys_time() in userspace, so there this change
would be useless.
I don't know what clock was used in the test, so maybe it can be replaced
with a faster clock.

AFAICT OLTP is not really a common application for most users, so there
may be other ways to optimize this special case. Just reading the patch
isn't enough here, you have to look at the whole picture and some pieces
are still missing...

bye, Roman

2007-06-26 00:22:50

by Mark Lord

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

Ingo Molnar wrote:
> Subject: [patch] sys_time() speedup
> From: Ingo Molnar <[email protected]>
>
> improve performance of sys_time(). sys_time() returns time in seconds,
> but it does so by calling do_gettimeofday() and then returning the
> tv_sec portion of the GTOD time. But the data structure "xtime", which
> is updated by every timer/scheduler tick, already offers HZ granularity
> time.

How well synchronized is xtime with real-time ?

Programs invoking sys_time() do expect it to be as accurate
as gettimeofday(), even if only at 1-second boundaries.

Cheers

2007-06-26 02:20:45

by Stephen Rothwell

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On Mon, 25 Jun 2007 15:15:08 -0700 Andrew Morton <[email protected]> wrote:
>
> On Mon, 25 Jun 2007 23:09:46 +0200
> Roman Zippel <[email protected]> wrote:
>
> > On Monday 25 June 2007, Ingo Molnar wrote:
> >
> > > the patch improves the sysbench OLTP macrobenchmark significantly:
> >
> > Has that any real practical relevance?
>
> Interesting question. The patch adds a new test-n-branch to gettimeofday()
> so if gettimeofday() is used much more frequently than time(), we lose.

Isn't gettimeofday() called *lots* by the X server and programs - one of
the reasons we bother putting it in the VDSO.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/


Attachments:
(No filename) (724.00 B)
(No filename) (189.00 B)
Download all attachments

2007-06-26 14:59:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup


* Mark Lord <[email protected]> wrote:

> Ingo Molnar wrote:
> >Subject: [patch] sys_time() speedup
> >From: Ingo Molnar <[email protected]>
> >
> >improve performance of sys_time(). sys_time() returns time in seconds,
> >but it does so by calling do_gettimeofday() and then returning the
> >tv_sec portion of the GTOD time. But the data structure "xtime", which
> >is updated by every timer/scheduler tick, already offers HZ granularity
> >time.
>
> How well synchronized is xtime with real-time ?

it's updated by every jiffy.

> Programs invoking sys_time() do expect it to be as accurate as
> gettimeofday(), even if only at 1-second boundaries.

yes.

Ingo

2007-06-26 15:27:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup


* Andrew Morton <[email protected]> wrote:

> > > the patch improves the sysbench OLTP macrobenchmark significantly:
> >
> > Has that any real practical relevance?
>
> Interesting question. [...]

i'm missing the <sarcastic> tag i guess ;-)

<sarcastic> Oh my, does database macro-performance have any relevance to
Linux bread and butter markets in general. Boggle, it is a really
difficult question i suspect. </sarcastic>

If we ignore those few million database and web server Linux boxes on
the market and concentrate purely on the few m68k boxes that are still
in existance, _then_ we might be doubtful about this question ;-)

> [...] The patch adds a new test-n-branch to gettimeofday() so if
> gettimeofday() is used much more frequently than time(), we lose.

given that the cost to sys_gettimeofday() is less than a cycle (we test
a value already in a register, with an unlikely hint), and the benefit
to sys_time() is around 6000 cycles (or more), sys_gettimeofday() would
have to be used thousands of times more frequently than sys_time() -
which it clearly isnt. As a test i just triggered a really X-intense
workload and for that gettimeofday-dominated landscape there was still 1
sys_time() call for every 50 gettimeofday calls - so it's a small win
even for this X workload.

Ingo

2007-06-26 15:43:42

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On Mon, Jun 25, 2007 at 03:15:08PM -0700, Andrew Morton wrote:
> Interesting question. The patch adds a new test-n-branch to gettimeofday()
> so if gettimeofday() is used much more frequently than time(), we lose.

I think gettimeofday is generally used much more frequently than
time. Real db calls gettimeofday not time, infact some real db related
app even go as far as calling rdtsc directly (on hardware where the
tsc is synchronized). What's the point of calling time so many times
per second when it'll always return the same value anyway?

I think this is a case of the simulator not simulating the real
workload and hence that should be fixed instead of optimizing for the
erratic simulator.

Just place a systemtap for time and gettimeofday, run a real db or a
videogame and then show the number of time vs gettimeofday calls.

2007-06-26 16:19:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup


* Jesper Juhl <[email protected]> wrote:

> > Ingo provided no information about his test setup and his patch was
> > a little strange, so I can't say that yet.
>
> What did you find strange about it? I'm currious. Sure it needs
> testing and of course it would be nice with some more details on
> Ingo's test setup.

My patch improves MySQL wall-clock performance by ~10% on a dual-core
box [see the numbers i cited in my initial mail, they were ran on a T60
with a 1.83 GHz Core2Duo] and by 7% on an 8-way box:

2.6.22-rc6:

#threads
9: transactions: 8440 (843.29 per sec.)
9: transactions: 8423 (841.18 per sec.)
9: transactions: 8511 (849.98 per sec.)
9: transactions: 8473 (846.23 per sec.)

2.6.22-rc6 + sys_time.patch:

#threads
9: transactions: 9043 (903.36 per sec.)
9: transactions: 9020 (900.78 per sec.)
9: transactions: 8974 (896.61 per sec.)
9: transactions: 9007 (899.97 per sec.)

[ to reproduce it, run sysbench 0.4.8 with --test=oltp --num-threads=9.
The other tests show similar speedup, so this is in no way limited to
OLTP. ]

in other words, if you are using MySQL in a serious way then this patch
provides you a real-world speedup equivalent to upgrading a 1.66 GHz
Core2Duo to a 1.83 GHz Core2Duo. I'd call that anything but "slightly
improved performance" ;-)

if you are curious why Roman's reaction to this patch was so negative:
i'm extremely curious myself too! ;-) That man, with his eternal
negativism (i dare anyone to point me to a _single_ lkml posting of
Roman where he gives any positive feedback to anyone) is a pure walking
mystery to me ;)

( whether there is any correlation between a decade long fundamental
suckage and stagnation of the Linux time and NTP subsystem and Roman's
decade long negative feedback presence in that area of code is left up
to the reader. :)

This current ... interesting piece of Roman about a _single_ trivial
unlikely() branch in do_gettimeofday() borders on the ridiculous. My
patch might be wrong for various reasons, but that single
'if (unlikely())' statement is not one of those reasons =B-)

Ingo

2007-06-26 16:39:34

by Roman Zippel

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

Hi,

On Tue, 26 Jun 2007, Ingo Molnar wrote:

> if you are curious why Roman's reaction to this patch was so negative:

Instead of answering all the open questions, pretty much the second thing
you do is to discredit me personally. :-(
BTW there is a difference between critical and negative...

bye, Roman

2007-06-26 16:49:33

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On Tue, Jun 26, 2007 at 06:18:57PM +0200, Ingo Molnar wrote:
> My patch improves MySQL wall-clock performance by ~10% on a dual-core
> box [see the numbers i cited in my initial mail, they were ran on a T60
> with a 1.83 GHz Core2Duo] and by 7% on an 8-way box:

mysql isn't froznen in stone like some commercial db, you can look into its
source, the testsuite isn't either I guess. Did you ask yourself at
least once why it's calling time() so many times per second? The
timestamp sql type should always trigger gettimeofday calls AFIK.

If there's a good reason to call time so frequently in an important
app like mysql, then your patch sure is a good idea, but at first
glance it looks fishy that time is so performance critical.

I'm not objecting the patch itself, if there's a legitimate reason to
call time so frequently that's sure a fine optimization despite the
branch in gettimeofday, but I'm asking why time is called so many
times on this specific workload, because I'm fairly certain that in
average (desktop and server) gettimeofday is called much more
frequently and if it was up to me to tell, I would expect time
microoptimizations to result in irrelevant performance
differences. Infact I seem to recall that even other commercial dbs
calls tends to call floods of gettimeofday while I can't recall any
time in the strace output. So unless there is a legitimate reason to
call time() dozen thousand times per second, if it was my choice, I
would prefer to have gettimeofday as fast as it can be. Not that a
branch will make any measurable difference, but still that would be my
choice.

But perhaps you already know why time is called so frequently, I
certainly don't, nor I would expect it.

Thanks.

2007-06-26 16:59:34

by john stultz

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On Tue, 2007-06-26 at 16:58 +0200, Ingo Molnar wrote:
> * Mark Lord <[email protected]> wrote:
>
> > Ingo Molnar wrote:
> > >Subject: [patch] sys_time() speedup
> > >From: Ingo Molnar <[email protected]>
> > >
> > >improve performance of sys_time(). sys_time() returns time in seconds,
> > >but it does so by calling do_gettimeofday() and then returning the
> > >tv_sec portion of the GTOD time. But the data structure "xtime", which
> > >is updated by every timer/scheduler tick, already offers HZ granularity
> > >time.
> >
> > How well synchronized is xtime with real-time ?
>
> it's updated by every jiffy.

Well, NTP_INTERVAL_LENGTH to be specific which is every jiffie with one
exception: With dynticks this is a bit more complicated, and xtime is
updated only twice a second.

thanks
-john

2007-06-26 17:08:20

by Roman Zippel

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

Hi,

On Tue, 26 Jun 2007, Ingo Molnar wrote:

Another BTW before someone takes this seriously:

> ( whether there is any correlation between a decade long fundamental
> suckage and stagnation of the Linux time and NTP subsystem and Roman's
> decade long negative feedback presence in that area of code is left up
> to the reader. :)

That's complete bullshit.

> This current ... interesting piece of Roman about a _single_ trivial
> unlikely() branch in do_gettimeofday() borders on the ridiculous. My
> patch might be wrong for various reasons, but that single
> 'if (unlikely())' statement is not one of those reasons =B-)

That's even more nonsense, that wasn't what my mail was about and Andrew
understood me correctly, so you could have too.

Ingo, I at least know that I'm difficult to deal with and try to take this
into account, which is hard for me, but you don't even seem to know what
kind of ass you are towards people who don't suck up to you (I guess you
simply get away with it far too often).

bye, Roman

2007-06-26 17:13:40

by Ray Lee

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On 6/26/07, Andrea Arcangeli <[email protected]> wrote:
> On Tue, Jun 26, 2007 at 06:18:57PM +0200, Ingo Molnar wrote:
> > My patch improves MySQL wall-clock performance by ~10% on a dual-core
> > box [see the numbers i cited in my initial mail, they were ran on a T60
> > with a 1.83 GHz Core2Duo] and by 7% on an 8-way box:
>
> mysql isn't froznen in stone like some commercial db, you can look into its
> source, the testsuite isn't either I guess. Did you ask yourself at
> least once why it's calling time() so many times per second? The
> timestamp sql type should always trigger gettimeofday calls AFIK.
>
> If there's a good reason to call time so frequently in an important
> app like mysql, then your patch sure is a good idea, but at first
> glance it looks fishy that time is so performance critical.

It's not an unreasonable expectation (from userspace's point of view)
that the simpler sys_time() system call would run faster than
gettimeofday. I suspect that this may be the case on other platforms.

However, your entire argument seems to be "It helps userspace run
faster? Weird. It shouldn't. They must be doing something wrong,
therefore the patch is stupid."

If that is your argument, then it's bogus. The bottom line is that
code, deployed today, significantly and impressively benefits from the
patch. And it's not a microbenchmark, that ignores the effect on the
rest of the system. Ingo said that both MySQL and X are improved by
this, so between the two of them that covers most every desktop system
and many servers.

> I'm not objecting the patch itself, if there's a legitimate reason to
> call time so frequently that's sure a fine optimization despite the
> branch in gettimeofday, but I'm asking why time is called so many
> times on this specific workload, because I'm fairly certain that in
> average (desktop and server) gettimeofday is called much more
> frequently

You're absolutely correct, gettimeofday *is* called much more
frequently. But not enough to outweigh the gains from sys_time
improvements.

> and if it was up to me to tell, I would expect time
> microoptimizations to result in irrelevant performance
> differences. Infact I seem to recall that even other commercial dbs
> calls tends to call floods of gettimeofday while I can't recall any
> time in the strace output. So unless there is a legitimate reason to
> call time() dozen thousand times per second,

(Why does this argument always rear its head on topics in timekeeping?)

Userspace does, whether you or I think it is legitimate or not, and
that's the reality we need to deal with. Further, I'd argue that it
*does* make sense, as sys_time() should be as cheap or cheaper than
gettimeofday(), and bottom line, userspace code needs to know the time
for lots of reasonable reasons.

> if it was my choice, I
> would prefer to have gettimeofday as fast as it can be. Not that a
> branch will make any measurable difference, but still that would be my
> choice.

Then you'd be penny-wise and pound-foolish, as the saying goes. Said
un-idiomatically, you're optimizing from a microbenchmark point of
view, which is not guaranteed to always be a win.

Ray

2007-06-26 17:16:10

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On Tue, 26 Jun 2007 17:26:29 +0200 Ingo Molnar <[email protected]> wrote:

>
> * Andrew Morton <[email protected]> wrote:
>
> > > > the patch improves the sysbench OLTP macrobenchmark significantly:
> > >
> > > Has that any real practical relevance?
> >
> > Interesting question. [...]
>
> i'm missing the <sarcastic> tag i guess ;-)
>
> <sarcastic> Oh my, does database macro-performance have any relevance to
> Linux bread and butter markets in general. Boggle, it is a really
> difficult question i suspect. </sarcastic>
>
> If we ignore those few million database and web server Linux boxes on
> the market and concentrate purely on the few m68k boxes that are still
> in existance, _then_ we might be doubtful about this question ;-)

On my machine, time(2) doesn't do any syscall at all - it uses the vsyscall
page. I'd be surprised if a database uses sys_time() either.

> > [...] The patch adds a new test-n-branch to gettimeofday() so if
> > gettimeofday() is used much more frequently than time(), we lose.
>
> given that the cost to sys_gettimeofday() is less than a cycle (we test
> a value already in a register, with an unlikely hint), and the benefit
> to sys_time() is around 6000 cycles (or more), sys_gettimeofday() would
> have to be used thousands of times more frequently than sys_time() -
> which it clearly isnt. As a test i just triggered a really X-intense
> workload and for that gettimeofday-dominated landscape there was still 1
> sys_time() call for every 50 gettimeofday calls - so it's a small win
> even for this X workload.

So something in X is somehow calling sys_time()? How come, and is that an
outlier? How generalisable is this observation?

2007-06-26 17:36:14

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On Tue, 26 Jun 2007 19:08:27 +0200 (CEST) Roman Zippel <[email protected]> wrote:

> > This current ... interesting piece of Roman about a _single_ trivial
> > unlikely() branch in do_gettimeofday() borders on the ridiculous. My
> > patch might be wrong for various reasons, but that single
> > 'if (unlikely())' statement is not one of those reasons =B-)
>
> That's even more nonsense, that wasn't what my mail was about and Andrew
> understood me correctly, so you could have too.

umm, yeah. Ingo went a bit over the top there, IMO.

It boils down to: is sys_time() called at more or less than 1/2000th the
frequency of gettimeofday(), across the expected lifetime of 2.6.23 and
later? Ingo has a couple of (surprising) examples where the sys_time()
call frequency _is_ high, but whether that will remain true across 2.6.23
and later is an open question.

How does mysql call sys_time() at all, if time(2) uses the vsyscall page??

Will contemporary-to-2.6.23-and-later mysqls do this?

All this isn't super-trivial silliness, either. gettimeofday() is, for
many workloads, the kernel's most time-critical codepath bar none, I
believe.

2007-06-26 17:36:57

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On Mon, 25 Jun 2007 15:15:08 -0700 Andrew Morton <[email protected]> wrote:

> > What did you test?
> > There can be many ways to read the clock, do you want to put this hook
> > everywhere?
>
> Yeah, it isn't immediately obvious (to this little black duck) why similar
> fixups weren't needed in timekeeping.c.

did this get addressed?

2007-06-27 00:16:27

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On Tue, Jun 26, 2007 at 10:13:31AM -0700, Ray Lee wrote:
> faster? Weird. It shouldn't. They must be doing something wrong,
> therefore the patch is stupid."

Just in case it's not obvious the above are Ray Lee words, mine not.

-----------
#!/usr/bin/env stap

# edited top.stp from systemtap

global syscalls

function print_top () {
printf ("SYSCALL\t\t\t\tCOUNT\n")
foreach ([name] in syscalls- limit 20)
printf("%-20s\t\t%5d\n",name, syscalls[name])
printf("--------------------------------------\n")
}

probe syscall.time {
syscalls[probefunc()]++
}
probe syscall.gettimeofday {
syscalls[probefunc()]++
}

# print top syscalls every 5 seconds
probe timer.ms(5000) {
print_top ()
}
-----------

The above while running various huge sql operations with real life
postgresql app running sql in loop for a minute or so (sorry no mysql
setup but the world isn't mysql, I'd rather want to see oracle if
something):

SYSCALL COUNT
sys_gettimeofday 4998
sys_time 120
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 9989
sys_time 185
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 15219
sys_time 335
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 21215
sys_time 428
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 26194
sys_time 629
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 30752
sys_time 734
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 37379
sys_time 976
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 42381
sys_time 1125
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 47722
sys_time 1391
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 53138
sys_time 1520
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 57499
sys_time 1651
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 62314
sys_time 1712
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 66874
sys_time 1827
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 71757
sys_time 2007
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 76335
sys_time 2240
SYSCALL COUNT
sys_gettimeofday 80469
sys_time 2354
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 85420
sys_time 2519
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 90662
sys_time 2648
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 95513
sys_time 2909
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 100767
sys_time 3111
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 106553
sys_time 3427
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 112300
sys_time 3673
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 115706
sys_time 3793
SYSCALL COUNT
sys_gettimeofday 119842
sys_time 3893
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 123054
sys_time 4113
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 126286
sys_time 4250
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 129077
sys_time 4396
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 132002
sys_time 4506
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 138518
sys_time 4800
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 143572
sys_time 4901
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 148621
sys_time 5069

Now if I play some music with projectm in the background:

SYSCALL COUNT
sys_gettimeofday 6337
sys_time 128
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 11462
sys_time 249
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 21905
sys_time 332
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 36205
sys_time 494
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 53792
sys_time 569
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 69699
sys_time 709
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 85663
sys_time 791
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 101427
sys_time 908
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 117199
sys_time 985
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 132963
sys_time 1100
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 148974
sys_time 1162
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 164677
sys_time 1288
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 180697
sys_time 1356
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 196517
sys_time 1479
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 209618
sys_time 1546
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 214914
sys_time 1726
--------------------------------------
SYSCALL COUNT
sys_gettimeofday 221257
sys_time 1798

If I listen some music on youtube.com:

SYSCALL COUNT
sys_gettimeofday 5151
sys32_gettimeofday 3951
sys_time 202
compat_sys_time 1
--------------------------------------
SYSCALL COUNT
sys32_gettimeofday 16427
sys_gettimeofday 10247
sys_time 306
compat_sys_time 1
--------------------------------------
SYSCALL COUNT
sys32_gettimeofday 42436
sys_gettimeofday 15633
sys_time 438
compat_sys_time 1
--------------------------------------
SYSCALL COUNT
sys32_gettimeofday 64931
sys_gettimeofday 21121
sys_time 509
compat_sys_time 1
--------------------------------------
SYSCALL COUNT
sys32_gettimeofday 94366
sys_gettimeofday 26399
sys_time 638
compat_sys_time 1
--------------------------------------
SYSCALL COUNT
sys32_gettimeofday 117573
sys_gettimeofday 33100
sys_time 722
compat_sys_time 1
--------------------------------------
SYSCALL COUNT
sys32_gettimeofday 137415
sys_gettimeofday 38357
sys_time 863
compat_sys_time 1
--------------------------------------
SYSCALL COUNT
sys32_gettimeofday 154831
sys_gettimeofday 43459
sys_time 933
compat_sys_time 1

Those aren't gettimeofday heavy users at all, they're the normal
"light" kernel users of the most common workloads. The sql wasn't
gettimeofday intensive either. The very heavy users will run hundred
thousands gettimeofday or more per second so they're not even
interesting to measure because the sys_time system time will be close
to zero compared to the gettimeofday one.

BTW, to run the above I had to disable vsyscall64 sysctl, my x2 would
never waste any time running sys_time or sys_gettimeofday in the first
place ;).

This patch may be a good tradeoff anyway, but IMHO it's not possible
to judje the idea of adding a branch to sys_gettimeofday to make
sys_time faster without knowing why mysql calls time() so
frequently. Even if you care only about the present and you don't care
about the future, mysql isn't the only db in the marketplace and like
I wrote in the other email the others I've seen were running floods of
gettimeofday (I repeat, some even went as far as giving an option to
call rdtsc directly on the few servers with tsc synchronized in smp).

2007-06-27 00:22:42

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [patch, v2.6.22-rc6] sys_time() speedup

On Tue, Jun 26, 2007 at 10:14:40AM -0700, Andrew Morton wrote:
> On my machine, time(2) doesn't do any syscall at all - it uses the vsyscall
> page. I'd be surprised if a database uses sys_time() either.

Large boxes unfortunately can't always use vsyscalls... that's a real
pity. I also had to disable the vsyscalls64 to generate some number.

I think there shall be a perfectly accurate but not monotone mode for
gettimeofday so we can enable rdtscp (via sysctl or/and prctl). Aware
apps can enable the prctl, aware or brave admins can turn on the
sysctl. Vojtech and others should have proper patches to merge for
this.