LinuxLists.cc - [git pull] scheduler updates

2008-11-08 17:02:47

Subject: [git pull] scheduler updates

Linus,

Please pull the latest scheduler updates git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git sched-fixes-for-linus

Thanks,

Ingo

------------------>
Ingo Molnar (2):
sched: improve sched_clock() performance
sched: optimize sched_clock() a bit

arch/x86/include/asm/msr.h | 2 --
arch/x86/include/asm/tsc.h | 8 +++++++-
arch/x86/kernel/tsc.c | 2 +-
3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 46be2fa..c2a812e 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -108,9 +108,7 @@ static __always_inline unsigned long long __native_read_tsc(void)
{
DECLARE_ARGS(val, low, high);

- rdtsc_barrier();
asm volatile("rdtsc" : EAX_EDX_RET(val, low, high));
- rdtsc_barrier();

return EAX_EDX_VAL(val, low, high);
}
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 38ae163..9cd83a8 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -34,6 +34,8 @@ static inline cycles_t get_cycles(void)

static __always_inline cycles_t vget_cycles(void)
{
+ cycles_t cycles;
+
/*
* We only do VDSOs on TSC capable CPUs, so this shouldnt
* access boot_cpu_data (which is not VDSO-safe):
@@ -42,7 +44,11 @@ static __always_inline cycles_t vget_cycles(void)
if (!cpu_has_tsc)
return 0;
#endif
- return (cycles_t)__native_read_tsc();
+ rdtsc_barrier();
+ cycles = (cycles_t)__native_read_tsc();
+ rdtsc_barrier();
+
+ return cycles;
}

extern void tsc_init(void);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2ef80e3..424093b 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -55,7 +55,7 @@ u64 native_sched_clock(void)
rdtscll(this_offset);

/* return the value in ns */
- return cycles_2_ns(this_offset);
+ return __cycles_2_ns(this_offset);
}

/* We need to define a real function for sched_clock, to override the

2008-11-08 18:29:01

by Linus Torvalds

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

On Sat, 8 Nov 2008, Ingo Molnar wrote:
>
> Ingo Molnar (2):
> sched: improve sched_clock() performance
> sched: optimize sched_clock() a bit

Btw, why do we do that _idiotic_ rdtsc_barrier() AT ALL?

No sane user can possibly want it. If you do 'rdtsc', there's nothing you
can do about a few cycles difference due to OoO _anyway_. Adding barriers
is entirely meaningless - it's not going to make the return value mean
anything else.

Can we please just remove that idiocy? Or can somebody give a _sane_
argument for it?

Linus

2008-11-08 18:39:16

by Linus Torvalds

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

On Sat, 8 Nov 2008, Linus Torvalds wrote:
>
> Can we please just remove that idiocy? Or can somebody give a _sane_
> argument for it?

Btw, the only _possible_ sane argument I see is

- two consecutive rdtsc calls
- timing the code in between
- the code in between is not self-serializing

and quite frankly, if that's the case, then it's _that_ code that should
have the barriers, not some generic "[v]get_cycles()".

IOW, the rdtsc_barrier may make sense when you're synchronizing the TSC to
some other hardware event (eg the "tie the TSC to the HPET" kind of
code), but then the barriers are about the code, not about the TSC access
itself.

Linus

2008-11-08 18:41:18

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

On Sat, 8 Nov 2008 10:28:21 -0800 (PST)
Linus Torvalds <[email protected]> wrote:

>
>
> On Sat, 8 Nov 2008, Ingo Molnar wrote:
> >
> > Ingo Molnar (2):
> > sched: improve sched_clock() performance
> > sched: optimize sched_clock() a bit
>
> Btw, why do we do that _idiotic_ rdtsc_barrier() AT ALL?
>
> No sane user can possibly want it. If you do 'rdtsc', there's nothing
> you can do about a few cycles difference due to OoO _anyway_. Adding
> barriers is entirely meaningless - it's not going to make the return
> value mean anything else.
>
> Can we please just remove that idiocy? Or can somebody give a _sane_
> argument for it?

historically it was for early AMD cpus (K7, not sure if early K8 did
this) where 2 consecutive rdtsc's in the same codestream would get
reordered compared to eachother, so you could observe the tsc go
backwards...

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-11-08 18:52:43

by Ingo Molnar

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

* Linus Torvalds <[email protected]> wrote:

> On Sat, 8 Nov 2008, Ingo Molnar wrote:
> >
> > Ingo Molnar (2):
> > sched: improve sched_clock() performance
> > sched: optimize sched_clock() a bit
>
> Btw, why do we do that _idiotic_ rdtsc_barrier() AT ALL?
>
> No sane user can possibly want it. If you do 'rdtsc', there's
> nothing you can do about a few cycles difference due to OoO
> _anyway_. Adding barriers is entirely meaningless - it's not going
> to make the return value mean anything else.
>
> Can we please just remove that idiocy? Or can somebody give a _sane_
> argument for it?

yeah, i had the same thinking, so i zapped it for everything but the
vdso driven vgettimeofday GTOD method.

For that one, i chickened out, because we have this use in
arch/x86/kernel/vsyscall_64.c:

now = vread();
base = __vsyscall_gtod_data.clock.cycle_last;
mask = __vsyscall_gtod_data.clock.mask;
mult = __vsyscall_gtod_data.clock.mult;
shift = __vsyscall_gtod_data.clock.shift;

which can be triggered by gettimeofday() on certain systems.

And i couldnt convince myself that this sequence couldnt result in
userspace-observable GTOD time warps there, so i went for the obvious
fix first.

If the "now = vread()"'s RDTSC instruction is speculated to after it
reads cycle_last, and another vdso call shortly after this does
another RDTSC in this same sequence, the two RDTSC's could be mixed up
in theory, resulting in negative time?

I _think_ i heard some noises in the past that this could indeed
happen (and have vague memories that this was the justification for
the barrier's introduction), but have to check the old emails to
figure out what exactly the issue was and on what CPUs.

It's not completely impossible for this to happen, as the vdso calls
are really just simple function calls, so not nearly as strongly
serialized as say a real syscall based gettimeofday() call.

In any case, it failed my "it must be obvious within 1 minute for it
to be eligible for sched/urgent" threshold and i didnt want to
introduce a time warp.

But you are right, and i've queued up the full fix below as well, as a
reminder.

Ingo

----------------->
>From 2efe2c42e008a80ebe1992db63749386778f7df8 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Sat, 8 Nov 2008 19:46:48 +0100
Subject: [PATCH] x86, time: remove rdtsc_barrier()

Linus pointed out that even for vread() rdtsc_barrier() is pointless
overhead - as due to speculative instructions there's no such thing
as reliable cycle count anyway.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/system.h | 13 -------------
arch/x86/include/asm/tsc.h | 6 +-----
2 files changed, 1 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index 2ed3f0f..1a1d45e 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -409,17 +409,4 @@ void default_idle(void);
#define set_mb(var, value) do { var = value; barrier(); } while (0)
#endif

-/*
- * Stop RDTSC speculation. This is needed when you need to use RDTSC
- * (or get_cycles or vread that possibly accesses the TSC) in a defined
- * code region.
- *
- * (Could use an alternative three way for this if there was one.)
- */
-static inline void rdtsc_barrier(void)
-{
- alternative(ASM_NOP3, "mfence", X86_FEATURE_MFENCE_RDTSC);
- alternative(ASM_NOP3, "lfence", X86_FEATURE_LFENCE_RDTSC);
-}
-
#endif /* _ASM_X86_SYSTEM_H */
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 9cd83a8..700aeb8 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -44,11 +44,7 @@ static __always_inline cycles_t vget_cycles(void)
if (!cpu_has_tsc)
return 0;
#endif
- rdtsc_barrier();
- cycles = (cycles_t)__native_read_tsc();
- rdtsc_barrier();
-
- return cycles;
+ return (cycles_t)__native_read_tsc();
}

extern void tsc_init(void);

2008-11-08 18:58:24

by Ingo Molnar

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

* Ingo Molnar <[email protected]> wrote:

> For that one, i chickened out, because we have this use in
> arch/x86/kernel/vsyscall_64.c:
>
> now = vread();
> base = __vsyscall_gtod_data.clock.cycle_last;
> mask = __vsyscall_gtod_data.clock.mask;
> mult = __vsyscall_gtod_data.clock.mult;
> shift = __vsyscall_gtod_data.clock.shift;
>
> which can be triggered by gettimeofday() on certain systems.
>
> And i couldnt convince myself that this sequence couldnt result in
> userspace-observable GTOD time warps there, so i went for the
> obvious fix first.
>
> If the "now = vread()"'s RDTSC instruction is speculated to after it
> reads cycle_last, and another vdso call shortly after this does
> another RDTSC in this same sequence, the two RDTSC's could be mixed
> up in theory, resulting in negative time?

the fuller sequence is:

now = vread();
base = __vsyscall_gtod_data.clock.cycle_last;
mask = __vsyscall_gtod_data.clock.mask;
mult = __vsyscall_gtod_data.clock.mult;
shift = __vsyscall_gtod_data.clock.shift;

tv->tv_sec = __vsyscall_gtod_data.wall_time_sec;
nsec = __vsyscall_gtod_data.wall_time_nsec;
} while (read_seqretry(&__vsyscall_gtod_data.lock, seq));

now here we could have another race as well: on another CPU we have a
timer IRQ running, which updates
__vsyscall_gtod_data.wall_time_[n]sec.

now __vsyscall_gtod_data updates are protected via the
__vsyscall_gtod_data.lock seqlock, but that assumes that all
instructions within that sequence listen to the barriers.

Except for RDTSC, which can be speculated to outside that region of
code.

RDTSC has no 'explicit' data dependency - there's no MESI-alike
coherency guarantee for stuffing a cycle counter into a register and
then putting that into __vsyscall_gtod_data.clock.cycle_last. So we
create one, by using the combination of LFENCE and SFENCE. (because
RDTSC implementations on Intel and AMD CPUs listen to different
sequences.)

all in one, i think it's still needed to avoid negative GTOD jumps.

Ingo

2008-11-08 19:01:25

by Linus Torvalds

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

On Sat, 8 Nov 2008, Arjan van de Ven wrote:
>
> historically it was for early AMD cpus (K7, not sure if early K8 did
> this) where 2 consecutive rdtsc's in the same codestream would get
> reordered compared to eachother, so you could observe the tsc go
> backwards...

.. but this only happens with two _consecutive_ ones.

The thing is, nobody sane does that in generic code. The scheduler wants
to have cycles, yes, but two consecutive scheduler invocations will have
spinlocks etc in between. That's true of _all_ sane uses of a TSC.

I don't see that there is ever any reason to do the barriers for any
normal case. And the cases where it does matter would actually be worth
pointing out (ie making the barriers explicit in those cases, and those
cases only).

Doing it in get_cycles() and "forgetting about it" may sound like a simple
solution, but it's likely wrong. For example, one of the few cases where
we realy care about time going backwards is gettimeofday() - which uses
tsc, but which also has tons of serializing instructions on its own.
EXCEPT WHEN IT IS a vsyscall!

But in that case, we don't even have the barrier, because we put it in the
wrong function and 'forgot about it'. Of course, we may not need it
(rdtscp maybe always serializes, I didn't check), but the point is, an
explicit barrier is actually better than one that is hidden.

So who _really_ needs it? And why not just do it there?

Linus

2008-11-08 19:05:35

by Ingo Molnar

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

* Linus Torvalds <[email protected]> wrote:

> Doing it in get_cycles() and "forgetting about it" may sound like a
> simple solution, but it's likely wrong. For example, one of the few
> cases where we realy care about time going backwards is
> gettimeofday() - which uses tsc, but which also has tons of
> serializing instructions on its own. EXCEPT WHEN IT IS a vsyscall!
>
> But in that case, we don't even have the barrier, because we put it
> in the wrong function and 'forgot about it'. Of course, we may not
> need it (rdtscp maybe always serializes, I didn't check), but the
> point is, an explicit barrier is actually better than one that is
> hidden.

no, we really had it in the vsyscall case: which uses vread, which
uses __native_read_tsc(), which had the barriers.

And i think that's the _only_ valid place to have it.

So that's why my change moves it from the __native_read_tsc() over to
_only_ the vget_cycles().

am i missing something on such a nice Saturday evening? :)

Ingo

2008-11-08 19:10:27

by Ingo Molnar

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

* Linus Torvalds <[email protected]> wrote:

> On Sat, 8 Nov 2008, Arjan van de Ven wrote:
> >
> > historically it was for early AMD cpus (K7, not sure if early K8 did
> > this) where 2 consecutive rdtsc's in the same codestream would get
> > reordered compared to eachother, so you could observe the tsc go
> > backwards...
>
> .. but this only happens with two _consecutive_ ones.
>
> The thing is, nobody sane does that in generic code. The scheduler wants
> to have cycles, yes, but two consecutive scheduler invocations will have
> spinlocks etc in between. That's true of _all_ sane uses of a TSC.
>
> I don't see that there is ever any reason to do the barriers for any
> normal case. And the cases where it does matter would actually be worth
> pointing out (ie making the barriers explicit in those cases, and those
> cases only).
>
> Doing it in get_cycles() and "forgetting about it" may sound like a simple
> solution, but it's likely wrong. For example, one of the few cases where
> we realy care about time going backwards is gettimeofday() - which uses
> tsc, but which also has tons of serializing instructions on its own.
> EXCEPT WHEN IT IS a vsyscall!
>
> But in that case, we don't even have the barrier, because we put it in the
> wrong function and 'forgot about it'. Of course, we may not need it
> (rdtscp maybe always serializes, I didn't check), but the point is, an
> explicit barrier is actually better than one that is hidden.
>
> So who _really_ needs it? And why not just do it there?

i think, the tree as offered to you, intends to do just that, unless i
made some grave (and unintended) mistake somewhere.

The barrier is only present in the vread function: which is the
vsyscall-read function, to be used from user-space.

Even in the past, no was actually forgotten or put in the wrong
function as far as i can see because previously _everything_
(including the vread method) had the barrier.

The change from me simply removes the barrier from the places that
dont need it - exactly for the reason you outlined: the scheduler is
both imprecise and has a ton of natural serialization anyway, so it's
a non-issue there.

Hm?

Ingo

2008-11-08 19:21:49

by Linus Torvalds

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

On Sat, 8 Nov 2008, Ingo Molnar wrote:
>
> So that's why my change moves it from the __native_read_tsc() over to
> _only_ the vget_cycles().

Ahh. I was looking at native_read_tscp(). Which has no barriers. But then
we don't actually save the actual TSC, we only end up using the "p" part,
so we don't care..

Anyway, even for the vget_cycles(), is there really any reason to have
_two_ barriers? Also, I still think it would be a hell of a lot more
readable and logical to put the barriers in the _caller_, so that people
actually see what the barriers are there for.

When they are hidden, they make no sense. The helper function just has two
insane barriers without explanation, and the caller doesn't know that the
code is serialized wrt something random.

Linus

2008-11-08 19:30:24

by Ingo Molnar

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

* Linus Torvalds <[email protected]> wrote:

> On Sat, 8 Nov 2008, Ingo Molnar wrote:
> >
> > So that's why my change moves it from the __native_read_tsc() over to
> > _only_ the vget_cycles().
>
> Ahh. I was looking at native_read_tscp(). Which has no barriers. But then
> we don't actually save the actual TSC, we only end up using the "p" part,
> so we don't care..
>
> Anyway, even for the vget_cycles(), is there really any reason to
> have _two_ barriers? Also, I still think it would be a hell of a lot
> more readable and logical to put the barriers in the _caller_, so
> that people actually see what the barriers are there for.
>
> When they are hidden, they make no sense. The helper function just
> has two insane barriers without explanation, and the caller doesn't
> know that the code is serialized wrt something random.

ok, fully agreed, i've queued up the cleanup for that, see it below.

sidenote: i still kept the get_cycles() versus vget_cycles()
distinction, to preserve the explicit marker that vget_cycles() is
used in user-space mode code. We periodically forgot about that in the
past. But otherwise, the two inline functions are now identical.
(except for the assymetry of its inlining, and the comment about the
boot_cpu_data use of the has_tsc check)

Ingo

--------------->
>From cb9e35dce94a1b9c59d46224e8a94377d673e204 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Sat, 8 Nov 2008 20:27:00 +0100
Subject: [PATCH] x86: clean up rdtsc_barrier() use

Impact: cleanup

Move rdtsc_barrier() use to vsyscall_64.c where it's relied on,
and point out its role in the context of its use.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/tsc.h | 6 +-----
arch/x86/kernel/vsyscall_64.c | 9 +++++++++
2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 9cd83a8..700aeb8 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -44,11 +44,7 @@ static __always_inline cycles_t vget_cycles(void)
if (!cpu_has_tsc)
return 0;
#endif
- rdtsc_barrier();
- cycles = (cycles_t)__native_read_tsc();
- rdtsc_barrier();
-
- return cycles;
+ return (cycles_t)__native_read_tsc();
}

extern void tsc_init(void);
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 0b8b669..ebf2f12 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -128,7 +128,16 @@ static __always_inline void do_vgettimeofday(struct timeval * tv)
gettimeofday(tv,NULL);
return;
}
+
+ /*
+ * Surround the RDTSC by barriers, to make sure it's not
+ * speculated to outside the seqlock critical section and
+ * does not cause time warps:
+ */
+ rdtsc_barrier();
now = vread();
+ rdtsc_barrier();
+
base = __vsyscall_gtod_data.clock.cycle_last;
mask = __vsyscall_gtod_data.clock.mask;
mult = __vsyscall_gtod_data.clock.mult;

2008-11-08 19:33:18

by Ingo Molnar

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

* Ingo Molnar <[email protected]> wrote:

> But you are right, and i've queued up the full fix below as well, as
> a reminder.

... i zapped this, as it's wrong to remove the barrier use from
vsyscall_64.c.

Ingo

2008-11-08 19:41:17

by Ingo Molnar

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

* Linus Torvalds <[email protected]> wrote:

> On Sat, 8 Nov 2008, Ingo Molnar wrote:
> >
> > So that's why my change moves it from the __native_read_tsc() over
> > to _only_ the vget_cycles().
>
> Ahh. I was looking at native_read_tscp(). Which has no barriers. But
> then we don't actually save the actual TSC, we only end up using the
> "p" part, so we don't care..

sidenote #3: RDTSCP is a relatively new instruction, and has implicit
barrier properties: it is a serializing instruction.

Ingo

2008-11-17 22:44:17

by Pallipadi, Venkatesh

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

On Sat, Nov 08, 2008 at 11:29:57AM -0800, Ingo Molnar wrote:
>
> * Linus Torvalds <[email protected]> wrote:
>
> > On Sat, 8 Nov 2008, Ingo Molnar wrote:
> > >
> > > So that's why my change moves it from the __native_read_tsc() over to
> > > _only_ the vget_cycles().
> >
> > Ahh. I was looking at native_read_tscp(). Which has no barriers. But then
> > we don't actually save the actual TSC, we only end up using the "p" part,
> > so we don't care..
> >
> > Anyway, even for the vget_cycles(), is there really any reason to
> > have _two_ barriers? Also, I still think it would be a hell of a lot
> > more readable and logical to put the barriers in the _caller_, so
> > that people actually see what the barriers are there for.
> >
> > When they are hidden, they make no sense. The helper function just
> > has two insane barriers without explanation, and the caller doesn't
> > know that the code is serialized wrt something random.
>
> ok, fully agreed, i've queued up the cleanup for that, see it below.
>
> sidenote: i still kept the get_cycles() versus vget_cycles()
> distinction, to preserve the explicit marker that vget_cycles() is
> used in user-space mode code. We periodically forgot about that in the
> past. But otherwise, the two inline functions are now identical.
> (except for the assymetry of its inlining, and the comment about the
> boot_cpu_data use of the has_tsc check)
>

Patch being discussed on this thread (commit 0d12cdd) has a regression on
one of the test systems here.

With the patch, I see

checking TSC synchronization [CPU#0 -> CPU#1]:
Measured 28 cycles TSC warp between CPUs, turning off TSC clock.
Marking TSC unstable due to check_tsc_sync_source failed

Whereas, without the patch syncs pass fine on all CPUs

checking TSC synchronization [CPU#0 -> CPU#1]: passed.

Due to this, TSC is marke unstable, when it is not actually unstable.
This is because syncs in check_tsc_wrap() goes away due to this commit.

As per the discussion on this thread, correct way to fix this is to add
explicit syncs as below?

Signed-off-by: Venkatesh Pallipadi <[email protected]>

---
arch/x86/kernel/tsc_sync.c | 4 ++++
1 file changed, 4 insertions(+)

Index: linux-2.6/arch/x86/kernel/tsc_sync.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/tsc_sync.c 2008-11-10 15:27:12.000000000 -0800
+++ linux-2.6/arch/x86/kernel/tsc_sync.c 2008-11-17 14:13:17.000000000 -0800
@@ -46,7 +46,9 @@ static __cpuinit void check_tsc_warp(voi
cycles_t start, now, prev, end;
int i;

+ rdtsc_barrier();
start = get_cycles();
+ rdtsc_barrier();
/*
* The measurement runs for 20 msecs:
*/
@@ -61,7 +63,9 @@ static __cpuinit void check_tsc_warp(voi
*/
__raw_spin_lock(&sync_lock);
prev = last_tsc;
+ rdtsc_barrier();
now = get_cycles();
+ rdtsc_barrier();
last_tsc = now;
__raw_spin_unlock(&sync_lock);

2008-11-17 22:50:45

by Ingo Molnar

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

* Venki Pallipadi <[email protected]> wrote:

> Patch being discussed on this thread (commit 0d12cdd) has a
> regression on one of the test systems here.
>
> With the patch, I see
>
> checking TSC synchronization [CPU#0 -> CPU#1]:
> Measured 28 cycles TSC warp between CPUs, turning off TSC clock.
> Marking TSC unstable due to check_tsc_sync_source failed
>
> Whereas, without the patch syncs pass fine on all CPUs
>
> checking TSC synchronization [CPU#0 -> CPU#1]: passed.
>
> Due to this, TSC is marke unstable, when it is not actually unstable.
> This is because syncs in check_tsc_wrap() goes away due to this commit.
>
> As per the discussion on this thread, correct way to fix this is to add
> explicit syncs as below?

ah. Yes.

Could you please check whether:

> + rdtsc_barrier();
> start = get_cycles();
> + rdtsc_barrier();
> /*
> * The measurement runs for 20 msecs:
> */
> @@ -61,7 +63,9 @@ static __cpuinit void check_tsc_warp(voi
> */
> __raw_spin_lock(&sync_lock);
> prev = last_tsc;
> + rdtsc_barrier();
> now = get_cycles();
> + rdtsc_barrier();

adding the barrier just _after_ the get_cycles() call (but not before
it) does the trick too? That should be enough in this case.

Ingo

2008-11-17 23:04:15

by Pallipadi, Venkatesh

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

On Mon, Nov 17, 2008 at 02:50:18PM -0800, Ingo Molnar wrote:
>
> * Venki Pallipadi <[email protected]> wrote:
>
> > Patch being discussed on this thread (commit 0d12cdd) has a
> > regression on one of the test systems here.
> >
> > With the patch, I see
> >
> > checking TSC synchronization [CPU#0 -> CPU#1]:
> > Measured 28 cycles TSC warp between CPUs, turning off TSC clock.
> > Marking TSC unstable due to check_tsc_sync_source failed
> >
> > Whereas, without the patch syncs pass fine on all CPUs
> >
> > checking TSC synchronization [CPU#0 -> CPU#1]: passed.
> >
> > Due to this, TSC is marke unstable, when it is not actually unstable.
> > This is because syncs in check_tsc_wrap() goes away due to this commit.
> >
> > As per the discussion on this thread, correct way to fix this is to add
> > explicit syncs as below?
>
> ah. Yes.
>
> Could you please check whether:
>
> > + rdtsc_barrier();
> > start = get_cycles();
> > + rdtsc_barrier();
> > /*
> > * The measurement runs for 20 msecs:
> > */
> > @@ -61,7 +63,9 @@ static __cpuinit void check_tsc_warp(voi
> > */
> > __raw_spin_lock(&sync_lock);
> > prev = last_tsc;
> > + rdtsc_barrier();
> > now = get_cycles();
> > + rdtsc_barrier();
>
> adding the barrier just _after_ the get_cycles() call (but not before
> it) does the trick too? That should be enough in this case.
>

With barrier only after get_cycles, I do see syncs across first few CPUs
passing. But later I see:

checking TSC synchronization [CPU#0 -> CPU#13]:
Measured 4 cycles TSC warp between CPUs, turning off TSC clock.
Marking TSC unstable due to check_tsc_sync_source failed

Thanks,
Venki

2008-11-17 23:13:25

by Ingo Molnar

[permalink] [raw]

Subject: Re: [git pull] scheduler updates

* Venki Pallipadi <[email protected]> wrote:

> On Mon, Nov 17, 2008 at 02:50:18PM -0800, Ingo Molnar wrote:
> >
> > * Venki Pallipadi <[email protected]> wrote:
> >
> > > Patch being discussed on this thread (commit 0d12cdd) has a
> > > regression on one of the test systems here.
> > >
> > > With the patch, I see
> > >
> > > checking TSC synchronization [CPU#0 -> CPU#1]:
> > > Measured 28 cycles TSC warp between CPUs, turning off TSC clock.
> > > Marking TSC unstable due to check_tsc_sync_source failed
> > >
> > > Whereas, without the patch syncs pass fine on all CPUs
> > >
> > > checking TSC synchronization [CPU#0 -> CPU#1]: passed.
> > >
> > > Due to this, TSC is marke unstable, when it is not actually unstable.
> > > This is because syncs in check_tsc_wrap() goes away due to this commit.
> > >
> > > As per the discussion on this thread, correct way to fix this is to add
> > > explicit syncs as below?
> >
> > ah. Yes.
> >
> > Could you please check whether:
> >
> > > + rdtsc_barrier();
> > > start = get_cycles();
> > > + rdtsc_barrier();
> > > /*
> > > * The measurement runs for 20 msecs:
> > > */
> > > @@ -61,7 +63,9 @@ static __cpuinit void check_tsc_warp(voi
> > > */
> > > __raw_spin_lock(&sync_lock);
> > > prev = last_tsc;
> > > + rdtsc_barrier();
> > > now = get_cycles();
> > > + rdtsc_barrier();
> >
> > adding the barrier just _after_ the get_cycles() call (but not before
> > it) does the trick too? That should be enough in this case.
> >
>
> With barrier only after get_cycles, I do see syncs across first few
> CPUs passing. But later I see:
>
> checking TSC synchronization [CPU#0 -> CPU#13]: Measured 4 cycles
> TSC warp between CPUs, turning off TSC clock. Marking TSC unstable
> due to check_tsc_sync_source failed

yeah - has to be surrounded, to make sure our last_tsc observation
does not happen after the RDTSC.

I have applied your patch to tip/x86/urgent, thanks!

Ingo