2008-01-12 19:48:34

by dean gaudet

Subject: nosmp/maxcpus=0 or 1 -> TSC unstable

if i boot an x86 64-bit 2.6.24-rc7 kernel with nosmp, maxcpus=0 or 1 it
still disables TSC :)

Marking TSC unstable due to TSCs unsynchronized

this is an opteron 2xx box which does have two cpus and no clock-divide in
halt or cpufreq enabled so TSC should be fine with only one cpu.

pretty sure the culprit is that num_possible_cpus() > 1, which
would mean cpu_possible_map contains the second cpu... but i'm not quite
sure what the right fix is... or perhaps this is all intended.
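The fallback Dean suspects comes down to comparing a CPU count against one, so what matters is which count is used. The following is a simplified user-space model, not kernel code, and its names (`struct cpu_counts`, `tsc_marked_unstable()`) are hypothetical. It shows why a count that includes the never-booted second socket trips the check even under maxcpus=1:

```c
#include <stdbool.h>

/* Hypothetical model of the three CPU counts the kernel tracks.
 * possible >= present >= online always holds. */
struct cpu_counts {
    unsigned possible; /* CPUs that could ever be brought up */
    unsigned present;  /* CPUs physically populated in the box */
    unsigned online;   /* CPUs actually running */
};

/* Mirrors the "assume multi socket systems are not synchronized"
 * fallback: any count above one marks the TSC unstable, so the
 * choice of counter decides whether maxcpus=1 helps. */
static bool tsc_marked_unstable(unsigned cpus_counted)
{
    return cpus_counted > 1;
}
```

On a dual-socket box booted with maxcpus=1, `present` and `possible` are both 2 while `online` is 1. Only an online-based count therefore avoids marking the TSC unstable.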

-dean


2008-01-15 22:50:49

by Pete Wyckoff

Subject: Re: nosmp/maxcpus=0 or 1 -> TSC unstable

[email protected] wrote on Sat, 12 Jan 2008 11:48 -0800:
> if i boot an x86 64-bit 2.6.24-rc7 kernel with nosmp, maxcpus=0 or 1 it
> still disables TSC :)
>
> Marking TSC unstable due to TSCs unsynchronized
>
> this is an opteron 2xx box which does have two cpus and no clock-divide in
> halt or cpufreq enabled so TSC should be fine with only one cpu.
>
> pretty sure the culprit is that num_possible_cpus() > 1, which
> would mean cpu_possible_map contains the second cpu... but i'm not quite
> sure what the right fix is... or perhaps this is all intended.

We've seen the same problem. We use gettimeofday() for timing of
network-ish operations on the order of 10-50 us. But not having
the TSC makes gettimeofday() itself very slow, on the order of 30 us.

Here's what we've been using for quite a few kernel versions. I've
not tried to submit it for fear that it could break some other
scenario, as you suggest. That said, in hotplug scenarios the
function unsynchronized_tsc() should get rerun and disable the TSC if
more processors arrive.
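Pete's hotplug argument can be modeled as a check that reruns whenever a CPU comes online. This is a hypothetical sketch: `struct tsc_state` and both functions are illustrative, not kernel APIs. The model also assumes, as its own design choice, that an "unstable" verdict is never reversed:

```c
#include <stdbool.h>

/* Hypothetical model: the TSC decision is re-evaluated on every
 * CPU-online event. */
struct tsc_state {
    unsigned online_cpus;
    bool unstable;
};

static void cpu_came_online(struct tsc_state *s)
{
    s->online_cpus++;
    /* Same heuristic as the patched check: more than one running
     * CPU means the TSCs may not be synchronized. */
    if (s->online_cpus > 1)
        s->unstable = true;
}

static void cpu_went_offline(struct tsc_state *s)
{
    if (s->online_cpus > 0)
        s->online_cpus--;
    /* Deliberately leaves s->unstable set: in this model the
     * decision is one-way (an assumption, not taken from the thread). */
}
```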

At least count this as a "me too".

-- Pete


From 0cdcd494bc0e27f49438bc2fc72fd3823629802b Mon Sep 17 00:00:00 2001
From: Pete Wyckoff <[email protected]>
Date: Tue, 15 Jan 2008 17:42:28 -0500
Subject: [PATCH] use tsc on 1 cpu smp

Use num_online_cpus() instead of num_present_cpus() as the
parameter to check when deciding if TSC is good enough. Thus
explicitly booting with maxcpus=1 will let us use the TSC even on
a dual-processor machine. This helps reduce gettimeofday
overheads on our dual Opteron nodes immensely (30 us vs 0.5 us).

Signed-off-by: Pete Wyckoff <[email protected]>
---
arch/x86/kernel/tsc_64.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
index 9c70af4..5f2e91f 100644
--- a/arch/x86/kernel/tsc_64.c
+++ b/arch/x86/kernel/tsc_64.c
@@ -235,7 +235,7 @@ __cpuinit int unsynchronized_tsc(void)
}

/* Assume multi socket systems are not synchronized */
- return num_present_cpus() > 1;
+ return num_online_cpus() > 1;
}

int __init notsc_setup(char *s)
--
1.5.3.7

2008-01-16 04:13:52

by Andi Kleen

Subject: Re: nosmp/maxcpus=0 or 1 -> TSC unstable

Pete Wyckoff <[email protected]> writes:

> We've seen the same problem. We use gettimeofday() for timing of
> network-ish operations on the order of 10-50 us. But not having
> the TSC makes gettimeofday() itself very slow, on the order of 30 us.
>
> Here's what we've been using for quite a few kernel versions. I've
> not tried to submit it for fear that it could break some other
> scenario, as you suggest. That said, in hotplug scenarios the
> function unsynchronized_tsc() should get rerun and disable the TSC if
> more processors arrive.
>
> At least count this as a "me too".

The patch is wrong, of course, because when this is checked not
all CPUs have been booted yet. So it will always use the TSC, even
when multiple CPUs are going to be booted.
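Andi's ordering objection shows up in a toy boot sequence where the check runs on the boot CPU before any secondary CPU is started. All names are hypothetical, and the sketch only illustrates the timing problem:

```c
#include <stdbool.h>

/* The check as patched: trust the TSC unless more than one CPU is
 * online (num_online_cpus() > 1 in the kernel). */
static bool patched_tsc_check(unsigned online_at_check_time)
{
    return online_at_check_time > 1;
}

/* Toy boot sequence (hypothetical): the boot CPU runs the TSC check
 * first, then the secondary CPUs are brought up. Returns whether the
 * TSC was marked unstable and stores the final online count. */
static bool boot_and_check(unsigned cpus_to_boot, unsigned *online_after_boot)
{
    unsigned online = 1;                        /* only the boot CPU */
    bool unstable = patched_tsc_check(online);  /* check runs now */

    while (online < cpus_to_boot)
        online++;                               /* secondaries come up later */

    *online_after_boot = online;
    return unstable;
}
```

boot_and_check(2, &n) returns false with n == 2. The TSC stays in use even though two CPUs end up running.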

The right fix for Dean's problem would probably be to add a new
parameter that disables CPU hotplug and forces smp_possible_map
to max_cpus, which could then be set with maxcpus=1 (or similar).
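Here is a rough model of the proposed parameter. It assumes, hypothetically, that the parameter simply clamps the possible-CPU count to the maxcpus= value and that the boot CPU always counts. The function names are illustrative only:

```c
#include <stdbool.h>

/* Hypothetical: with hotplug disabled, cap the possible-CPU count at
 * the maxcpus= limit. The boot CPU always runs, so a limit of 0 is
 * treated as 1. */
static unsigned clamp_possible_cpus(unsigned possible, unsigned max_cpus)
{
    unsigned limit = max_cpus ? max_cpus : 1;
    return possible < limit ? possible : limit;
}

/* The existing "more than one CPU" heuristic, applied to the
 * clamped count. */
static bool tsc_unstable_with_clamp(unsigned possible, unsigned max_cpus)
{
    return clamp_possible_cpus(possible, max_cpus) > 1;
}
```

Because the clamp applies at boot and hotplug is disabled, the check no longer depends on when the secondary CPUs come up.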

I would not recommend using nosmp or maxcpus=0 either, because it will
disable the APIC, and that is typically a bad thing (especially if you
need network performance).

-Andi

2008-01-16 15:31:59

by Ingo Molnar

Subject: Re: nosmp/maxcpus=0 or 1 -> TSC unstable


* Pete Wyckoff <[email protected]> wrote:

> > pretty sure the culprit is that num_possible_cpus() > 1,
> > which would mean cpu_possible_map contains the second cpu... but i'm
> > not quite sure what the right fix is... or perhaps this is all
> > intended.
>
> We've seen the same problem. We use gettimeofday() for timing of
> network-ish operations on the order of 10-50 us. But not having the
> TSC makes gettimeofday() itself very slow, on the order of 30 us.

30 usecs is too much - even with pmtimer it's typically below 5 usecs.
Could you run this on your box:

http://people.redhat.com/mingo/time-warp-test/time-warp-test.c

and send back what it reports? (run it for a few minutes)

Ingo

2008-01-16 16:24:20

by Pete Wyckoff

Subject: Re: nosmp/maxcpus=0 or 1 -> TSC unstable

[email protected] wrote on Wed, 16 Jan 2008 16:31 +0100:
>
> * Pete Wyckoff <[email protected]> wrote:
>
> > > pretty sure the culprit is that num_possible_cpus() > 1,
> > > which would mean cpu_possible_map contains the second cpu... but i'm
> > > not quite sure what the right fix is... or perhaps this is all
> > > intended.
> >
> > We've seen the same problem. We use gettimeofday() for timing of
> > network-ish operations on the order of 10-50 us. But not having the
> > TSC makes gettimeofday() itself very slow, on the order of 30 us.
>
> 30 usecs is too much - even with pmtimer it's typically below 5 usecs.
> Could you run this on your box:
>
> http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
>
> and send back what it reports? (run it for a few minutes)

You're right. That 30 us figure came from an old comment. Testing with
your code shows under 5 us, as you expected. I had to hack out the
#ifndef i386 error, since I'm compiling on x86-64. The box is a
dual-socket 2.4 GHz Opteron 250. The TSC-warps count grows continually
in all situations.

On 2.6.24-rc6 + scsi-misc + random stuff, 2 processors:

| 0.39 us, TSC-warps:983002775 | 4.81 us, TOD-warps:0 | 4.81 us, CLOCK-warps:0

With "maxcpus=1", without the broken patch that forces the TSC:

| 0.33 us, TSC-warps:679679972 | 3.30 us, TOD-warps:0 | 3.30 us, CLOCK-warps:0

With "maxcpus=1", including my broken patch to force use of TSC in
this situation:

| 0.05 us, TSC-warps:2884019968 | 0.45 us, TOD-warps:0 | 0.45 us, CLOCK-warps:0

For giggles, an older fedora kernel (2.6.23.1-42.fc8) gives:

| 0.87 us, TSC-warps:575054334 | 8.67 us, TOD-warps:0 | 8.67 us, CLOCK-warps:0

-- Pete