2003-11-04 23:23:23

by Joel Becker

Subject: get_cycles() on i386

Folks,
Certain distributions are building all of their SMP kernels
NUMA-aware. This is great, as the kernels support boxes like the x440
with no trouble. However, this implicitly disables CONFIG_X86_TSC.
While that is good for NUMA systems, and fine from a kernel timing
standpoint, it also eliminates any generic access to the TSC via
get_cycles(). With CONFIG_X86_TSC not defined, get_cycles() always
returns 0.
Given that >95% of machines will not be x440s, this means that a
user of that kernel cannot access a high resolution timer via
get_cycles(). I don't want to have to litter my code with rdtscll()
when I managed to remove it!
The proposed patch is trivial. If the system has a TSC, it is
available via get_cycles(). This makes no change to the other parts of the
kernel protected by CONFIG_X86_TSC.

Joel

diff -uNr ../kernel-2.4.21-4.0.1.EL/linux-2.4.21/include/asm-i386/timex.h linux-2.4.21/include/asm-i386/timex.h
--- ../kernel-2.4.21-4.0.1.EL/linux-2.4.21/include/asm-i386/timex.h 2002-11-28 15:53:15.000000000 -0800
+++ linux-2.4.21/include/asm-i386/timex.h 2003-11-04 11:33:08.000000000 -0800
@@ -40,7 +40,7 @@

 static inline cycles_t get_cycles (void)
 {
-#ifndef CONFIG_X86_TSC
+#ifndef CONFIG_X86_HAS_TSC
 	return 0;
 #else
 	unsigned long long ret;


--

"Hey mister if you're gonna walk on water,
Could you drop a line my way?"

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: [email protected]
Phone: (650) 506-8127


2003-11-04 23:29:35

by john stultz

Subject: Re: get_cycles() on i386

On Tue, 2003-11-04 at 15:22, Joel Becker wrote:
> Folks,
> Certain distributions are building all of their SMP kernels
> NUMA-aware. This is great, as the kernels support boxes like the x440
> with no trouble. However, this implicitly disables CONFIG_X86_TSC.
> While that is good for NUMA systems, and fine from a kernel timing
> standpoint, it also eliminates any generic access to the TSC via
> get_cycles(). With CONFIG_X86_TSC not defined, get_cycles() always
> returns 0.
> Given that >95% of machines will not be x440s, this means that a
> user of that kernel cannot access a high resolution timer via
> get_cycles(). I don't want to have to litter my code with rdtscll()
> when I managed to remove it!
> The proposed patch is trivial. If the system has a TSC, it is
> available via get_cycles(). This makes no change to the other parts of the
> kernel protected by CONFIG_X86_TSC.

CONFIG_X86_TSC be the devil. Personally, I'd much prefer dropping the
compile time option and using dynamic detection. Something like (not
recently tested and I believe against 2.5.something, but you get the
idea):


diff -Nru a/include/asm-i386/timex.h b/include/asm-i386/timex.h
--- a/include/asm-i386/timex.h Mon Feb 24 21:09:32 2003
+++ b/include/asm-i386/timex.h Mon Feb 24 21:09:32 2003
@@ -40,14 +40,10 @@

 static inline cycles_t get_cycles (void)
 {
-#ifndef CONFIG_X86_TSC
-	return 0;
-#else
-	unsigned long long ret;
-
-	rdtscll(ret);
+	unsigned long long ret = 0;
+	if(cpu_has_tsc)
+		rdtscll(ret);
 	return ret;
-#endif
 }
 
 extern unsigned long cpu_khz;


thanks
-john
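
For readers who want to try the runtime-detection idea outside the kernel, here is a minimal user-space sketch. It is not the kernel code: the demo_* names are hypothetical, it assumes GCC or Clang (for <cpuid.h> and <x86intrin.h>), and it queries CPUID on every call, whereas the kernel's cpu_has_tsc reads a flag cached at boot. On non-x86 builds it falls back to returning 0, the same value get_cycles() returns when no TSC is available.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>      /* __get_cpuid(), provided by GCC and Clang */
#include <x86intrin.h>  /* __rdtsc() */

/* CPUID leaf 1 reports TSC support in EDX bit 4. */
static int demo_cpu_has_tsc(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return (edx >> 4) & 1;
}

static uint64_t demo_rdtsc(void)
{
	return __rdtsc();
}
#else
/* Not an x86 build: report "no TSC" so callers get 0. */
static int demo_cpu_has_tsc(void) { return 0; }
static uint64_t demo_rdtsc(void) { return 0; }
#endif

/* Same shape as the patched get_cycles(): 0 when there is no TSC. */
static uint64_t demo_get_cycles(void)
{
	uint64_t ret = 0;

	if (demo_cpu_has_tsc())
		ret = demo_rdtsc();
	return ret;
}
```

A caller that needs a real timestamp should treat a return value of 0 as "no cycle counter" rather than as a valid reading.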


2003-11-04 23:54:44

by Linus Torvalds

Subject: Re: get_cycles() on i386


On 4 Nov 2003, john stultz wrote:
>
> CONFIG_X86_TSC be the devil. Personally, I'd much prefer dropping the
> compile time option and using dynamic detection. Something like (not
> recently tested and i believe against 2.5.something, but you get the
> idea):

Some of the users are really timing-critical (eg scheduler).

How about just using the "alternative()" infrastructure that we already
have in 2.6.x for this? See <asm-i386/system.h> for details.

We don't have an "alternative_output()" available yet, but using that it
would look something like:

	static inline unsigned long long get_cycle(void)
	{
		unsigned long long tsc;

		alternative_output(
			"xorl %%eax,%%eax ; xorl %%edx,%%edx",
			"rdtsc",
			X86_FEATURE_TSC,
			"=A" (tsc));
		return tsc;
	}

which should allow for "perfect" code (well, gcc tends to mess up 64-bit
stuff, but you get the idea).

We use the "alternative_input()" thing for prefetch() handling (see
<asm-i386/processor.h>).

Linus
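
The point of alternative() is that the feature test happens once and the hot path carries no branch. User space cannot patch instructions the way the kernel does, so the sketch below is only a rough analogue: it picks an implementation once and stores it in a function pointer. All demo_* names are hypothetical, GCC or Clang is assumed for <x86intrin.h>, and an indirect call costs more than the patched inline instruction Linus describes.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>  /* __rdtsc() */
static uint64_t demo_cycles_tsc(void) { return __rdtsc(); }
#else
/* Not an x86 build: there is no rdtsc to call. */
static uint64_t demo_cycles_tsc(void) { return 0; }
#endif

/* Counterpart of the "xorl %eax,%eax ; xorl %edx,%edx" alternative. */
static uint64_t demo_cycles_none(void) { return 0; }

/* Hot-path entry point; defaults to the safe variant. */
static uint64_t (*demo_cycles)(void) = demo_cycles_none;

/* Call once at startup with the result of a feature check. */
static void demo_cycles_init(int has_tsc)
{
	demo_cycles = has_tsc ? demo_cycles_tsc : demo_cycles_none;
}
```

After demo_cycles_init() has run, callers invoke demo_cycles() with no per-call feature test.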

2003-11-05 03:02:54

by Nick Piggin

Subject: Re: get_cycles() on i386



Nick Piggin wrote:

>
>
> Linus Torvalds wrote:
>
>> On 4 Nov 2003, john stultz wrote:
>>
>>> CONFIG_X86_TSC be the devil. Personally, I'd much prefer dropping the
>>> compile time option and using dynamic detection. Something like (not
>>> recently tested and i believe against 2.5.something, but you get the
>>> idea):
>>>
>>
>> Some of the users are really timing-critical (eg scheduler).
>>
>
> The scheduler uses its own sched_clock which only gives jiffies
> resolution if CONFIG_NUMA is defined. Unfortunate because I think
> its interactive behaviour isn't so good with ms resolution.
>
> The scheduler does not need to have synchronised TSCs though, I think.
> It just means 2 more calls to sched_clock in a slow path (smp migration).
>
Well no, it's much trickier than that, I think :(



2003-11-05 02:57:51

by Nick Piggin

Subject: Re: get_cycles() on i386



Linus Torvalds wrote:

>On 4 Nov 2003, john stultz wrote:
>
>>CONFIG_X86_TSC be the devil. Personally, I'd much prefer dropping the
>>compile time option and using dynamic detection. Something like (not
>>recently tested and i believe against 2.5.something, but you get the
>>idea):
>>
>
>Some of the users are really timing-critical (eg scheduler).
>

The scheduler uses its own sched_clock which only gives jiffies
resolution if CONFIG_NUMA is defined. Unfortunate because I think
its interactive behaviour isn't so good with ms resolution.

The scheduler does not need to have synchronised TSCs though, I think.
It just means 2 more calls to sched_clock in a slow path (smp migration).
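
To put numbers on the resolution gap, here is a small arithmetic sketch. It assumes HZ=1000 (which gives the "ms resolution" mentioned above) and, purely for illustration, a 2 GHz CPU; the demo_* helpers are hypothetical and are not the kernel's sched_clock().

```c
#include <stdint.h>

/* Nanoseconds covered by one timer tick at a given tick rate (HZ). */
static uint64_t demo_ns_per_jiffy(unsigned int hz)
{
	return 1000000000ULL / hz;
}

/*
 * Convert a cycle count to nanoseconds given the CPU frequency in kHz:
 *   ns = cycles * 1,000,000 / khz
 * The multiplication overflows 64 bits for very large cycle counts,
 * so this is only suitable for short intervals.
 */
static uint64_t demo_cycles_to_ns(uint64_t cycles, unsigned long cpu_khz)
{
	return cycles * 1000000ULL / cpu_khz;
}
```

With these assumptions, demo_ns_per_jiffy(1000) is 1,000,000 ns per tick, while demo_cycles_to_ns(2000, 2000000) is 1000 ns, so a TSC-based clock can resolve intervals about three orders of magnitude shorter than a jiffies-based one.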


2003-11-05 13:38:17

by Marcelo Tosatti

Subject: Re: get_cycles() on i386



On Tue, 4 Nov 2003, Linus Torvalds wrote:

>
> On 4 Nov 2003, john stultz wrote:
> >
> > CONFIG_X86_TSC be the devil. Personally, I'd much prefer dropping the
> > compile time option and using dynamic detection. Something like (not
> > recently tested and i believe against 2.5.something, but you get the
> > idea):
>
> Some of the users are really timing-critical (eg scheduler).
>
> How about just using the "alternative()" infrastructure that we already
> have in 2.6.x for this? See <asm-i386/system.h> for details.
>
> We don't have an "alternative_output()" available yet, but using that it
> would look something like:
>
> 	static inline unsigned long long get_cycle(void)
> 	{
> 		unsigned long long tsc;
>
> 		alternative_output(
> 			"xorl %%eax,%%eax ; xorl %%edx,%%edx",
> 			"rdtsc",
> 			X86_FEATURE_TSC,
> 			"=A" (tsc));
> 		return tsc;
> 	}
>
> which should allow for "perfect" code (well, gcc tends to mess up 64-bit
> stuff, but you get the idea).
>
> We use the "alternative_input()" thing for prefetch() handling (see
> <asm-i386/processor.h>).

I'm not confident this is something for 2.4.

The "if (cpu_has_tsc)" fix from John sounds fine.