2007-12-07 01:23:39

by Stefano Brivio

Subject: [PATCH] scheduler: fix x86 regression in native_sched_clock

This patch fixes a regression introduced by:

commit bb29ab26863c022743143f27956cc0ca362f258c
Author: Ingo Molnar <[email protected]>
Date: Mon Jul 9 18:51:59 2007 +0200

This caused the jiffies counter to leap back and forth on cpufreq changes
on my x86 box. I'd say that we can't always assume that TSC does "small
errors" only, when marked unstable. On cpufreq changes these errors can be
huge.
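
To put a rough number on "huge" (the frequencies here are hypothetical, purely
for illustration): sched_clock() converts the raw TSC count with
cyc2ns_scale = (1000000 << 10) / cpu_khz, i.e. a scale of 512 at 2 GHz and
1280 at 800 MHz. Because that conversion is applied to the absolute cycle
count since boot, recomputing the scale on a frequency switch rescales all of
the elapsed time at once: a reading of 10^12 cycles maps to ~500 s before the
switch and ~1250 s after it, so the clock appears to leap by minutes rather
than drift by microseconds.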

The original bug report can be found here:
http://bugzilla.kernel.org/show_bug.cgi?id=9475


Signed-off-by: Stefano Brivio <[email protected]>

---

diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
index 9ebc0da..d29cd9c 100644
--- a/arch/x86/kernel/tsc_32.c
+++ b/arch/x86/kernel/tsc_32.c
@@ -98,13 +98,8 @@ unsigned long long native_sched_clock(void)

/*
* Fall back to jiffies if there's no TSC available:
- * ( But note that we still use it if the TSC is marked
- * unstable. We do this because unlike Time Of Day,
- * the scheduler clock tolerates small errors and it's
- * very important for it to be as fast as the platform
- * can achive it. )
*/
- if (unlikely(!tsc_enabled && !tsc_unstable))
+ if (unlikely(!tsc_enabled))
/* No locking but a rare wrong value is not a big deal: */
return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);


--
Ciao
Stefano


2007-12-07 05:29:50

by Nick Piggin

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Friday 07 December 2007 12:19, Stefano Brivio wrote:
> This patch fixes a regression introduced by:
>
> commit bb29ab26863c022743143f27956cc0ca362f258c
> Author: Ingo Molnar <[email protected]>
> Date: Mon Jul 9 18:51:59 2007 +0200
>
> This caused the jiffies counter to leap back and forth on cpufreq changes
> on my x86 box. I'd say that we can't always assume that TSC does "small
> errors" only, when marked unstable. On cpufreq changes these errors can be
> huge.
>
> The original bug report can be found here:
> http://bugzilla.kernel.org/show_bug.cgi?id=9475
>
>
> Signed-off-by: Stefano Brivio <[email protected]>

While your fix should probably go into 2.6.24...

This particular issue has aggravated me enough times. Let's
fix the damn thing properly already... I think what would work best
is a relatively simple change to the API along these lines:


Attachments:
(No filename) (869.00 B)
sched-clock.patch (7.64 kB)

2007-12-07 05:53:25

by Thomas Gleixner

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Fri, 7 Dec 2007, Stefano Brivio wrote:

> This patch fixes a regression introduced by:
>
> commit bb29ab26863c022743143f27956cc0ca362f258c
> Author: Ingo Molnar <[email protected]>
> Date: Mon Jul 9 18:51:59 2007 +0200
>
> This caused the jiffies counter to leap back and forth on cpufreq changes
> on my x86 box. I'd say that we can't always assume that TSC does "small
> errors" only, when marked unstable. On cpufreq changes these errors can be
> huge.

Hmrpf. sched_clock() is used for the time stamp of the printks. We
need to find some better solution other than killing off the tsc
access completely.

Ingo ???

Thanks,

tglx

> The original bug report can be found here:
> http://bugzilla.kernel.org/show_bug.cgi?id=9475
>
>
> Signed-off-by: Stefano Brivio <[email protected]>
>
> ---
>
> diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
> index 9ebc0da..d29cd9c 100644
> --- a/arch/x86/kernel/tsc_32.c
> +++ b/arch/x86/kernel/tsc_32.c
> @@ -98,13 +98,8 @@ unsigned long long native_sched_clock(void)
>
> /*
> * Fall back to jiffies if there's no TSC available:
> - * ( But note that we still use it if the TSC is marked
> - * unstable. We do this because unlike Time Of Day,
> - * the scheduler clock tolerates small errors and it's
> - * very important for it to be as fast as the platform
> - * can achive it. )
> */
> - if (unlikely(!tsc_enabled && !tsc_unstable))
> + if (unlikely(!tsc_enabled))
> /* No locking but a rare wrong value is not a big deal: */
> return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
>
>
> --
> Ciao
> Stefano
>

2007-12-07 07:18:37

by Guillaume Chazarain

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Dec 7, 2007 6:51 AM, Thomas Gleixner <[email protected]> wrote:
> Hmrpf. sched_clock() is used for the time stamp of the printks. We
> need to find some better solution other than killing off the tsc
> access completely.

Something like http://lkml.org/lkml/2007/3/16/291 that would need some refresh?

--
Guillaume

2007-12-07 08:09:47

by Guillaume Chazarain

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

"Guillaume Chazarain" <[email protected]> wrote:

> On Dec 7, 2007 6:51 AM, Thomas Gleixner <[email protected]> wrote:
> > Hmrpf. sched_clock() is used for the time stamp of the printks. We
> > need to find some better solution other than killing off the tsc
> > access completely.
>
> Something like http://lkml.org/lkml/2007/3/16/291 that would need some refresh?

And here is a refreshed one just for testing with 2.6-git. The 64 bit
part is a shamelessly untested copy/paste as I cannot test it.

diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
index 9ebc0da..d561b2f 100644
--- a/arch/x86/kernel/tsc_32.c
+++ b/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
#include <linux/jiffies.h>
#include <linux/init.h>
#include <linux/dmi.h>
+#include <linux/percpu.h>

#include <asm/delay.h>
#include <asm/tsc.h>
@@ -78,15 +79,32 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* ([email protected])
*
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
* [email protected] "math is hard, lets go shopping!"
*/
-unsigned long cyc2ns_scale __read_mostly;

#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */

-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly;
+
+static void set_cyc2ns_scale(unsigned long cpu_khz)
{
- cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+ struct cyc2ns_params *params;
+ unsigned long flags;
+ unsigned long long tsc_now, ns_now;
+
+ rdtscll(tsc_now);
+ params = &get_cpu_var(cyc2ns);
+
+ local_irq_save(flags);
+ ns_now = __cycles_2_ns(params, tsc_now);
+
+ params->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+ params->offset += ns_now - __cycles_2_ns(params, tsc_now);
+ local_irq_restore(flags);
+
+ put_cpu_var(cyc2ns);
}

/*
diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
index 9c70af4..93e7a06 100644
--- a/arch/x86/kernel/tsc_64.c
+++ b/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@

#include <asm/hpet.h>
#include <asm/timex.h>
+#include <asm/timer.h>

static int notsc __initdata = 0;

@@ -18,16 +19,25 @@ EXPORT_SYMBOL(cpu_khz);
unsigned int tsc_khz;
EXPORT_SYMBOL(tsc_khz);

-static unsigned int cyc2ns_scale __read_mostly;
+DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly;

-static inline void set_cyc2ns_scale(unsigned long khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz)
{
- cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
+ struct cyc2ns_params *params;
+ unsigned long flags;
+ unsigned long long tsc_now, ns_now;

-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
- return (cyc * cyc2ns_scale) >> NS_SCALE;
+ rdtscll(tsc_now);
+ params = &get_cpu_var(cyc2ns);
+
+ local_irq_save(flags);
+ ns_now = __cycles_2_ns(params, tsc_now);
+
+ params->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+ params->offset += ns_now - __cycles_2_ns(params, tsc_now);
+ local_irq_restore(flags);
+
+ put_cpu_var(cyc2ns);
}

unsigned long long sched_clock(void)
diff --git a/include/asm-x86/timer.h b/include/asm-x86/timer.h
index 0db7e99..ff4f2a3 100644
--- a/include/asm-x86/timer.h
+++ b/include/asm-x86/timer.h
@@ -2,6 +2,7 @@
#define _ASMi386_TIMER_H
#include <linux/init.h>
#include <linux/pm.h>
+#include <linux/percpu.h>

#define TICK_SIZE (tick_nsec / 1000)

@@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void);
#define calculate_cpu_khz() native_calculate_cpu_khz()
#endif

-/* Accellerators for sched_clock()
+/* Accelerators for sched_clock()
* convert from cycles(64bits) => nanoseconds (64bits)
* basic equation:
* ns = cycles / (freq / ns_per_sec)
@@ -31,20 +32,44 @@ extern int recalibrate_cpu_khz(void);
* And since SC is a constant power of two, we can convert the div
* into a shift.
*
- * We can use khz divisor instead of mhz to keep a better percision, since
+ * We can use khz divisor instead of mhz to keep a better precision, since
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* ([email protected])
*
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
* [email protected] "math is hard, lets go shopping!"
*/
-extern unsigned long cyc2ns_scale __read_mostly;
+
+struct cyc2ns_params {
+ unsigned long scale;
+ unsigned long long offset;
+};
+
+DECLARE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly;

#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */

-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static inline unsigned long long __cycles_2_ns(struct cyc2ns_params *params,
+ unsigned long long cyc)
{
- return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+ return ((cyc * params->scale) >> CYC2NS_SCALE_FACTOR) + params->offset;
}

+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+ struct cyc2ns_params *params;
+ unsigned long flags;
+ unsigned long long ns;
+
+ params = &get_cpu_var(cyc2ns);
+
+ local_irq_save(flags);
+ ns = __cycles_2_ns(params, cyc);
+ local_irq_restore(flags);
+
+ put_cpu_var(cyc2ns);
+ return ns;
+}

#endif


--
Guillaume
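
The core of the approach, for anyone skimming the diff: set_cyc2ns_scale()
samples the TSC, computes the current time with the old (scale, offset) pair,
installs the new scale, and then grows the offset by exactly the difference,
so the very same TSC value still converts to the same nanosecond value after
the frequency switch. A minimal user-space sketch of that invariant follows -
names mirror the patch, the frequencies are made up (2 GHz dropping to
800 MHz), and the per-CPU/irq handling of the real code is omitted:

/*
 * Standalone sketch of the scale+offset rebasing above; hypothetical
 * frequencies, no per-CPU variables, no locking.
 */
#include <stdio.h>

#define CYC2NS_SCALE_FACTOR 10	/* 2^10, as in the patch */

struct cyc2ns_params {
	unsigned long scale;
	unsigned long long offset;
};

static unsigned long long cycles_2_ns(struct cyc2ns_params *p,
				      unsigned long long cyc)
{
	return ((cyc * p->scale) >> CYC2NS_SCALE_FACTOR) + p->offset;
}

/* Install a new scale without letting the converted time jump. */
static void set_cyc2ns_scale(struct cyc2ns_params *p, unsigned long cpu_khz,
			     unsigned long long tsc_now)
{
	unsigned long long ns_now = cycles_2_ns(p, tsc_now);

	p->scale = (1000000UL << CYC2NS_SCALE_FACTOR) / cpu_khz;
	/* the offset absorbs the discontinuity the new scale would cause */
	p->offset += ns_now - cycles_2_ns(p, tsc_now);
}

int main(void)
{
	struct cyc2ns_params p = { 0, 0 };
	unsigned long long tsc = 0;

	set_cyc2ns_scale(&p, 2000000, tsc);	/* boot at 2 GHz */
	tsc += 2000000000ULL;			/* ~1 s of cycles */
	printf("before switch:      %llu ns\n", cycles_2_ns(&p, tsc));

	set_cyc2ns_scale(&p, 800000, tsc);	/* cpufreq drops to 800 MHz */
	printf("right after switch: %llu ns\n", cycles_2_ns(&p, tsc));

	tsc += 800000000ULL;			/* ~1 s more at 800 MHz */
	printf("one second later:   %llu ns\n", cycles_2_ns(&p, tsc));

	return 0;
}

Built as a normal user-space program, the first two printfs print the same
value (no jump across the switch) and the third advances by one more second,
which is exactly what the params->offset line in the diff is there to
guarantee.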

2007-12-07 08:47:11

by Ingo Molnar

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Stefano Brivio <[email protected]> wrote:

> This patch fixes a regression introduced by:
>
> commit bb29ab26863c022743143f27956cc0ca362f258c
> Author: Ingo Molnar <[email protected]>
> Date: Mon Jul 9 18:51:59 2007 +0200
>
> This caused the jiffies counter to leap back and forth on cpufreq
> changes on my x86 box. I'd say that we can't always assume that TSC
> does "small errors" only, when marked unstable. On cpufreq changes
> these errors can be huge.

ah, printk_clock() still uses sched_clock(), not jiffies. So it's not
the jiffies counter that goes back and forth, it's sched_clock() - so
this is a printk timestamps anomaly, not related to jiffies. I thought
we have fixed this bug in the printk code already: sched_clock() is a
'raw' interface that should not be used directly - the proper interface
is cpu_clock(cpu). Does the patch below help?

Ingo

----------------------->
Subject: sched: fix CONFIG_PRINT_TIME's reliance on sched_clock()
From: Ingo Molnar <[email protected]>

Stefano Brivio reported weird printk timestamp behavior during
CPU frequency changes:

http://bugzilla.kernel.org/show_bug.cgi?id=9475

fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock()
instead.

Reported-and-bisected-by: Stefano Brivio <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/printk.c | 2 +-
kernel/sched.c | 7 ++++++-
2 files changed, 7 insertions(+), 2 deletions(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -680,7 +680,7 @@ asmlinkage int vprintk(const char *fmt,
loglev_char = default_message_loglevel
+ '0';
}
- t = printk_clock();
+ t = cpu_clock(printk_cpu);
nanosec_rem = do_div(t, 1000000000);
tlen = sprintf(tbuf,
"<%c>[%5lu.%06lu] ",
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -599,7 +599,12 @@ unsigned long long cpu_clock(int cpu)

local_irq_save(flags);
rq = cpu_rq(cpu);
- update_rq_clock(rq);
+ /*
+ * Only call sched_clock() if the scheduler has already been
+ * initialized (some code might call cpu_clock() very early):
+ */
+ if (rq->idle)
+ update_rq_clock(rq);
now = rq->clock;
local_irq_restore(flags);

2007-12-07 08:52:30

by Ingo Molnar

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Guillaume Chazarain <[email protected]> wrote:

> > Something like http://lkml.org/lkml/2007/3/16/291 that would need
> > some refresh?
>
> And here is a refreshed one just for testing with 2.6-git. The 64 bit
> part is a shamelessly untested copy/paste as I cannot test it.

yeah, we can do something like this in 2.6.25 - this will improve the
quality of sched_clock(). The other patch i sent should solve the
problem for 2.6.24 - printk should not be using raw sched_clock() calls.
(as the name says it's for the scheduler's internal use.) I've also
queued up the patch below - it removes the now unnecessary printk clock
code.

Ingo

--------------------->
Subject: sched: remove printk_clock()
From: Ingo Molnar <[email protected]>

printk_clock() is obsolete - it has been replaced with cpu_clock().

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/arm/kernel/time.c | 11 -----------
arch/ia64/kernel/time.c | 27 ---------------------------
kernel/printk.c | 5 -----
3 files changed, 43 deletions(-)

Index: linux/arch/arm/kernel/time.c
===================================================================
--- linux.orig/arch/arm/kernel/time.c
+++ linux/arch/arm/kernel/time.c
@@ -79,17 +79,6 @@ static unsigned long dummy_gettimeoffset
}
#endif

-/*
- * An implementation of printk_clock() independent from
- * sched_clock(). This avoids non-bootable kernels when
- * printk_clock is enabled.
- */
-unsigned long long printk_clock(void)
-{
- return (unsigned long long)(jiffies - INITIAL_JIFFIES) *
- (1000000000 / HZ);
-}
-
static unsigned long next_rtc_update;

/*
Index: linux/arch/ia64/kernel/time.c
===================================================================
--- linux.orig/arch/ia64/kernel/time.c
+++ linux/arch/ia64/kernel/time.c
@@ -344,33 +344,6 @@ udelay (unsigned long usecs)
}
EXPORT_SYMBOL(udelay);

-static unsigned long long ia64_itc_printk_clock(void)
-{
- if (ia64_get_kr(IA64_KR_PER_CPU_DATA))
- return sched_clock();
- return 0;
-}
-
-static unsigned long long ia64_default_printk_clock(void)
-{
- return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) *
- (1000000000/HZ);
-}
-
-unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock;
-
-unsigned long long printk_clock(void)
-{
- return ia64_printk_clock();
-}
-
-void __init
-ia64_setup_printk_clock(void)
-{
- if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT))
- ia64_printk_clock = ia64_itc_printk_clock;
-}
-
/* IA64 doesn't cache the timezone */
void update_vsyscall_tz(void)
{
Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -573,11 +573,6 @@ static int __init printk_time_setup(char

__setup("time", printk_time_setup);

-__attribute__((weak)) unsigned long long printk_clock(void)
-{
- return sched_clock();
-}
-
/* Check if we have any console registered that can be called early in boot. */
static int have_callable_console(void)
{

2007-12-07 09:36:32

by Guillaume Chazarain

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Fri, 7 Dec 2007 09:51:21 +0100,
Ingo Molnar <[email protected]> wrote:

> yeah, we can do something like this in 2.6.25 - this will improve the
> quality of sched_clock().

Thanks a lot for your interest!

I'll clean it up and resend it later. As I don't have the necessary
knowledge to do the tsc_{32,64}.c unification, should I copy paste
common functions into tsc_32.c and tsc_64.c to ease later unification
or should I start a common .c file?

Thanks again for showing interest.

--
Guillaume

2007-12-07 10:00:22

by Ingo Molnar

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Guillaume Chazarain <[email protected]> wrote:

> I'll clean it up and resend it later. As I don't have the necessary
> knowledge to do the tsc_{32,64}.c unification, should I copy paste
> common functions into tsc_32.c and tsc_64.c to ease later unification
> or should I start a common .c file?

note that there are a couple of existing patches in this area. One is
the fix below. There's also older frequency-scaling TSC patches - i'll
try to dig them out.

Ingo

---------------->
Subject: x86: idle wakeup event in the HLT loop
From: Ingo Molnar <[email protected]>

do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/process_32.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

Index: linux/arch/x86/kernel/process_32.c
===================================================================
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
smp_mb();

local_irq_disable();
- if (!need_resched())
+ if (!need_resched()) {
+ ktime_t t0, t1;
+ u64 t0n, t1n;
+
+ t0 = ktime_get();
+ t0n = ktime_to_ns(t0);
safe_halt(); /* enables interrupts racelessly */
- else
- local_irq_enable();
+ local_irq_disable();
+ t1 = ktime_get();
+ t1n = ktime_to_ns(t1);
+ sched_clock_idle_wakeup_event(t1n - t0n);
+ }
+ local_irq_enable();
current_thread_info()->status |= TS_POLLING;
} else {
/* loop is done by the caller */

2007-12-07 10:34:41

by Andrew Morton

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Fri, 7 Dec 2007 09:45:59 +0100 Ingo Molnar <[email protected]> wrote:

>
> * Stefano Brivio <[email protected]> wrote:
>
> > This patch fixes a regression introduced by:
> >
> > commit bb29ab26863c022743143f27956cc0ca362f258c
> > Author: Ingo Molnar <[email protected]>
> > Date: Mon Jul 9 18:51:59 2007 +0200
> >
> > This caused the jiffies counter to leap back and forth on cpufreq
> > changes on my x86 box. I'd say that we can't always assume that TSC
> > does "small errors" only, when marked unstable. On cpufreq changes
> > these errors can be huge.
>
> ah, printk_clock() still uses sched_clock(), not jiffies. So it's not
> the jiffies counter that goes back and forth, it's sched_clock() - so
> this is a printk timestamps anomaly, not related to jiffies. I thought
> we have fixed this bug in the printk code already: sched_clock() is a
> 'raw' interface that should not be used directly - the proper interface
> is cpu_clock(cpu). Does the patch below help?
>
> Ingo
>
> ----------------------->
> Subject: sched: fix CONFIG_PRINT_TIME's reliance on sched_clock()
> From: Ingo Molnar <[email protected]>
>
> Stefano Brivio reported weird printk timestamp behavior during
> CPU frequency changes:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=9475
>
> fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock()
> instead.
>
> Reported-and-bisected-by: Stefano Brivio <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> kernel/printk.c | 2 +-
> kernel/sched.c | 7 ++++++-
> 2 files changed, 7 insertions(+), 2 deletions(-)
>
> Index: linux/kernel/printk.c
> ===================================================================
> --- linux.orig/kernel/printk.c
> +++ linux/kernel/printk.c
> @@ -680,7 +680,7 @@ asmlinkage int vprintk(const char *fmt,
> loglev_char = default_message_loglevel
> + '0';
> }
> - t = printk_clock();
> + t = cpu_clock(printk_cpu);
> nanosec_rem = do_div(t, 1000000000);
> tlen = sprintf(tbuf,
> "<%c>[%5lu.%06lu] ",

A bit risky - it's quite an expansion of code which no longer can call printk.

You might want to take that WARN_ON out of __update_rq_clock() ;)

2007-12-07 10:38:01

by Andi Kleen

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

Thomas Gleixner <[email protected]> writes:
>
> Hmrpf. sched_clock() is used for the time stamp of the printks. We
> need to find some better solution other than killing off the tsc
> access completely.

Doing it properly requires pretty much most of my old sched-clock ff patch.
Complicated and not pretty, but ..
Unfortunately that version still had some jumps on cpufreq, but they
are fixable there.

-Andi

2007-12-07 10:41:12

by Ingo Molnar

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Andrew Morton <[email protected]> wrote:

> > - t = printk_clock();
> > + t = cpu_clock(printk_cpu);
> > nanosec_rem = do_div(t, 1000000000);
> > tlen = sprintf(tbuf,
> > "<%c>[%5lu.%06lu] ",
>
> A bit risky - it's quite an expansion of code which no longer can call
> printk.
>
> You might want to take that WARN_ON out of __update_rq_clock() ;)

hm, dont we already detect printk recursions and turn them into a silent
return instead of a hang/crash?

Ingo

2007-12-07 11:08:27

by Ingo Molnar

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Ingo Molnar <[email protected]> wrote:

> > > - t = printk_clock();
> > > + t = cpu_clock(printk_cpu);
> > > nanosec_rem = do_div(t, 1000000000);
> > > tlen = sprintf(tbuf,
> > > "<%c>[%5lu.%06lu] ",
> >
> > A bit risky - it's quite an expansion of code which no longer can call
> > printk.
> >
> > You might want to take that WARN_ON out of __update_rq_clock() ;)
>
> hm, dont we already detect printk recursions and turn them into a
> silent return instead of a hang/crash?

ugh, we dont. So i guess the (tested) patch below is highly needed. (If
such incidents become frequent then we could save the stackdump of the
recursion via save_stack_trace() too - but i wanted to keep the initial
code simple.)

Ingo

---------------->
Subject: printk: make printk more robust by not allowing recursion
From: Ingo Molnar <[email protected]>

make printk more robust by allowing recursion only if there's a crash
going on. Also add recursion detection.

I've tested it with an artificially injected printk recursion - instead
of a lockup or spontaneous reboot or other crash, the output was a well
controlled:

[ 41.057335] SysRq : <2>BUG: recent printk recursion!
[ 41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks

also do all this printk logic with irqs disabled.

Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/printk.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 42 insertions(+), 10 deletions(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -623,30 +623,57 @@ asmlinkage int printk(const char *fmt, .
/* cpu currently holding logbuf_lock */
static volatile unsigned int printk_cpu = UINT_MAX;

+const char printk_recursion_bug_msg [] =
+ KERN_CRIT "BUG: recent printk recursion!\n";
+static int printk_recursion_bug;
+
asmlinkage int vprintk(const char *fmt, va_list args)
{
+ static int log_level_unknown = 1;
+ static char printk_buf[1024];
+
unsigned long flags;
- int printed_len;
+ int printed_len = 0;
+ int this_cpu;
char *p;
- static char printk_buf[1024];
- static int log_level_unknown = 1;

boot_delay_msec();

preempt_disable();
- if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
- /* If a crash is occurring during printk() on this CPU,
- * make sure we can't deadlock */
- zap_locks();
-
/* This stops the holder of console_sem just where we want him */
raw_local_irq_save(flags);
+ this_cpu = smp_processor_id();
+
+ /*
+ * Ouch, printk recursed into itself!
+ */
+ if (unlikely(printk_cpu == this_cpu)) {
+ /*
+ * If a crash is occurring during printk() on this CPU,
+ * then try to get the crash message out but make sure
+ * we can't deadlock. Otherwise just return to avoid the
+ * recursion and return - but flag the recursion so that
+ * it can be printed at the next appropriate moment:
+ */
+ if (!oops_in_progress) {
+ printk_recursion_bug = 1;
+ goto out_restore_irqs;
+ }
+ zap_locks();
+ }
+
lockdep_off();
spin_lock(&logbuf_lock);
- printk_cpu = smp_processor_id();
+ printk_cpu = this_cpu;

+ if (printk_recursion_bug) {
+ printk_recursion_bug = 0;
+ strcpy(printk_buf, printk_recursion_bug_msg);
+ printed_len = sizeof(printk_recursion_bug_msg);
+ }
/* Emit the output into the temporary buffer */
- printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+ printed_len += vscnprintf(printk_buf + printed_len,
+ sizeof(printk_buf), fmt, args);

/*
* Copy the output into log_buf. If the caller didn't provide
@@ -675,6 +702,10 @@ asmlinkage int vprintk(const char *fmt,
loglev_char = default_message_loglevel
+ '0';
}
+ if (panic_timeout) {
+ panic_timeout = 0;
+ printk("recurse!\n");
+ }
t = cpu_clock(printk_cpu);
nanosec_rem = do_div(t, 1000000000);
tlen = sprintf(tbuf,
@@ -739,6 +770,7 @@ asmlinkage int vprintk(const char *fmt,
printk_cpu = UINT_MAX;
spin_unlock(&logbuf_lock);
lockdep_on();
+out_restore_irqs:
raw_local_irq_restore(flags);
}

2007-12-07 11:11:44

by Andrew Morton

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Fri, 7 Dec 2007 11:40:13 +0100 Ingo Molnar <[email protected]> wrote:

>
> * Andrew Morton <[email protected]> wrote:
>
> > > - t = printk_clock();
> > > + t = cpu_clock(printk_cpu);
> > > nanosec_rem = do_div(t, 1000000000);
> > > tlen = sprintf(tbuf,
> > > "<%c>[%5lu.%06lu] ",
> >
> > A bit risky - it's quite an expansion of code which no longer can call
> > printk.
> >
> > You might want to take that WARN_ON out of __update_rq_clock() ;)
>
> hm, dont we already detect printk recursions and turn them into a silent
> return instead of a hang/crash?
>

We'll pop the locks and will proceed to do the nested printk. So
__update_rq_clock() will need rather a lot of stack ;)
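
Spelled out, the recursion in question: with the change, vprintk() takes its
timestamp via cpu_clock(), cpu_clock() calls update_rq_clock(), and if the
WARN_ON() in __update_rq_clock() (or anything else on that path) ever fires,
it printk()s, which re-enters vprintk() for yet another timestamp - nothing
bounds the depth, hence the remark about stack.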

2007-12-07 11:13:14

by Ingo Molnar

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Andrew Morton <[email protected]> wrote:

> > > A bit risky - it's quite an expansion of code which no longer can
> > > call printk.
> > >
> > > You might want to take that WARN_ON out of __update_rq_clock() ;)
> >
> > hm, dont we already detect printk recursions and turn them into a
> > silent return instead of a hang/crash?
>
> We'll pop the locks and will proceed to do the nested printk. So
> __update_rq_clock() will need rather a lot of stack ;)

yeah. That behavior of printk is rather fragile. I think my previous
patch should handle all such incidents.

Ingo

2007-12-07 11:13:29

by Nick Piggin

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Friday 07 December 2007 19:45, Ingo Molnar wrote:
> * Stefano Brivio <[email protected]> wrote:
> > This patch fixes a regression introduced by:
> >
> > commit bb29ab26863c022743143f27956cc0ca362f258c
> > Author: Ingo Molnar <[email protected]>
> > Date: Mon Jul 9 18:51:59 2007 +0200
> >
> > This caused the jiffies counter to leap back and forth on cpufreq
> > changes on my x86 box. I'd say that we can't always assume that TSC
> > does "small errors" only, when marked unstable. On cpufreq changes
> > these errors can be huge.
>
> ah, printk_clock() still uses sched_clock(), not jiffies. So it's not
> the jiffies counter that goes back and forth, it's sched_clock() - so
> this is a printk timestamps anomaly, not related to jiffies. I thought
> we have fixed this bug in the printk code already: sched_clock() is a
> 'raw' interface that should not be used directly - the proper interface
> is cpu_clock(cpu).

It's a single CPU box, so sched_clock() jumping would still be
problematic, no?

My patch should fix the worst cpufreq sched_clock jumping issue
I think.

2007-12-07 11:18:16

by Ingo Molnar

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Nick Piggin <[email protected]> wrote:

> > ah, printk_clock() still uses sched_clock(), not jiffies. So it's
> > not the jiffies counter that goes back and forth, it's sched_clock()
> > - so this is a printk timestamps anomaly, not related to jiffies. I
> > thought we have fixed this bug in the printk code already:
> > sched_clock() is a 'raw' interface that should not be used directly
> > - the proper interface is cpu_clock(cpu).
>
> It's a single CPU box, so sched_clock() jumping would still be
> problematic, no?

sched_clock() is an internal API - the non-jumping API to be used by
printk is cpu_clock().

Ingo

2007-12-07 11:19:13

by Guillaume Chazarain

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Dec 7, 2007 12:13 PM, Nick Piggin <[email protected]> wrote:
> My patch should fix the worst cpufreq sched_clock jumping issue
> I think.

Any pointer to it?

Thanks.

--
Guillaume

2007-12-07 11:25:11

by Stefano Brivio

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

Quoting Nick Piggin <[email protected]>:

> On Friday 07 December 2007 19:45, Ingo Molnar wrote:
>>
>> ah, printk_clock() still uses sched_clock(), not jiffies. So it's not
>> the jiffies counter that goes back and forth, it's sched_clock() - so
>> this is a printk timestamps anomaly, not related to jiffies. I thought
>> we have fixed this bug in the printk code already: sched_clock() is a
>> 'raw' interface that should not be used directly - the proper interface
>> is cpu_clock(cpu).
>
> It's a single CPU box, so sched_clock() jumping would still be
> problematic, no?

I guess so. Definitely, it didn't look like a printk issue. Drivers
don't read logs, usually. But they got confused anyway (it seems that
udelays get scaled or fail or somesuch - I can't test it right now,
will provide more feedback in a few hours).


--
Ciao
Stefano


2007-12-07 11:25:36

by Ingo Molnar

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Nick Piggin <[email protected]> wrote:

> My patch should fix the worst cpufreq sched_clock jumping issue I
> think.

but it degrades the precision of sched_clock() and has other problems as
well. cpu_clock() is the right interface to use for such things.

Ingo

2007-12-07 11:57:57

by Guillaume Chazarain

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Dec 7, 2007 12:18 PM, Guillaume Chazarain <[email protected]> wrote:
> Any pointer to it?

Nevermind, I found it ... in this same thread :-(

--
Guillaume

2007-12-07 12:12:00

by Ingo Molnar

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* [email protected] <[email protected]> wrote:

>> It's a single CPU box, so sched_clock() jumping would still be
>> problematic, no?
>
> I guess so. Definitely, it didn't look like a printk issue. Drivers
> don't read logs, usually. But they got confused anyway (it seems that
> udelays get scaled or fail or somesuch - I can't test it right now,
> will provide more feedback in a few hours).

no, i think it's just another aspect of the broken TSC on that hardware.
Does the patch below improve things?

Ingo

------------------->
Subject: x86: cpu_clock() based udelay
From: Ingo Molnar <[email protected]>

use cpu_clock() for TSC based udelay - it's more reliable than raw
TSC based delay loops.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/lib/delay_32.c | 20 ++++++++++++--------
arch/x86/lib/delay_64.c | 27 ++++++++++++++++++---------
2 files changed, 30 insertions(+), 17 deletions(-)

Index: linux/arch/x86/lib/delay_32.c
===================================================================
--- linux.orig/arch/x86/lib/delay_32.c
+++ linux/arch/x86/lib/delay_32.c
@@ -38,17 +38,21 @@ static void delay_loop(unsigned long loo
:"0" (loops));
}

-/* TSC based delay: */
+/* cpu_clock() [TSC] based delay: */
static void delay_tsc(unsigned long loops)
{
- unsigned long bclock, now;
+ unsigned long long start, stop, now;
+ int this_cpu;
+
+ preempt_disable();
+
+ this_cpu = smp_processor_id();
+ start = now = cpu_clock(this_cpu);
+ stop = start + loops;
+
+ while ((long long)(stop - now) > 0)
+ now = cpu_clock(this_cpu);

- preempt_disable(); /* TSC's are per-cpu */
- rdtscl(bclock);
- do {
- rep_nop();
- rdtscl(now);
- } while ((now-bclock) < loops);
preempt_enable();
}

Index: linux/arch/x86/lib/delay_64.c
===================================================================
--- linux.orig/arch/x86/lib/delay_64.c
+++ linux/arch/x86/lib/delay_64.c
@@ -26,19 +26,28 @@ int read_current_timer(unsigned long *ti
return 0;
}

-void __delay(unsigned long loops)
+/* cpu_clock() [TSC] based delay: */
+static void delay_tsc(unsigned long loops)
{
- unsigned bclock, now;
+ unsigned long long start, stop, now;
+ int this_cpu;
+
+ preempt_disable();
+
+ this_cpu = smp_processor_id();
+ start = now = cpu_clock(this_cpu);
+ stop = start + loops;
+
+ while ((long long)(stop - now) > 0)
+ now = cpu_clock(this_cpu);

- preempt_disable(); /* TSC's are pre-cpu */
- rdtscl(bclock);
- do {
- rep_nop();
- rdtscl(now);
- }
- while ((now-bclock) < loops);
preempt_enable();
}
+
+void __delay(unsigned long loops)
+{
+ delay_tsc(loops);
+}
EXPORT_SYMBOL(__delay);

inline void __const_udelay(unsigned long xloops)

2007-12-07 12:26:22

by Ingo Molnar

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


ok, here's a rollup of 11 patches that relate to this. I hoped we could
wait with this for 2.6.25, but it seems more urgent as per Stefano's
testing, as udelay() and drivers are affected as well.

Stefano, could you try this ontop of a recent-ish Linus tree - does this
resolve all issues? (without introducing new ones ;-)

Ingo

Index: linux/arch/arm/kernel/time.c
===================================================================
--- linux.orig/arch/arm/kernel/time.c
+++ linux/arch/arm/kernel/time.c
@@ -79,17 +79,6 @@ static unsigned long dummy_gettimeoffset
}
#endif

-/*
- * An implementation of printk_clock() independent from
- * sched_clock(). This avoids non-bootable kernels when
- * printk_clock is enabled.
- */
-unsigned long long printk_clock(void)
-{
- return (unsigned long long)(jiffies - INITIAL_JIFFIES) *
- (1000000000 / HZ);
-}
-
static unsigned long next_rtc_update;

/*
Index: linux/arch/ia64/kernel/time.c
===================================================================
--- linux.orig/arch/ia64/kernel/time.c
+++ linux/arch/ia64/kernel/time.c
@@ -344,33 +344,6 @@ udelay (unsigned long usecs)
}
EXPORT_SYMBOL(udelay);

-static unsigned long long ia64_itc_printk_clock(void)
-{
- if (ia64_get_kr(IA64_KR_PER_CPU_DATA))
- return sched_clock();
- return 0;
-}
-
-static unsigned long long ia64_default_printk_clock(void)
-{
- return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) *
- (1000000000/HZ);
-}
-
-unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock;
-
-unsigned long long printk_clock(void)
-{
- return ia64_printk_clock();
-}
-
-void __init
-ia64_setup_printk_clock(void)
-{
- if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT))
- ia64_printk_clock = ia64_itc_printk_clock;
-}
-
/* IA64 doesn't cache the timezone */
void update_vsyscall_tz(void)
{
Index: linux/arch/x86/kernel/process_32.c
===================================================================
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
smp_mb();

local_irq_disable();
- if (!need_resched())
+ if (!need_resched()) {
+ ktime_t t0, t1;
+ u64 t0n, t1n;
+
+ t0 = ktime_get();
+ t0n = ktime_to_ns(t0);
safe_halt(); /* enables interrupts racelessly */
- else
- local_irq_enable();
+ local_irq_disable();
+ t1 = ktime_get();
+ t1n = ktime_to_ns(t1);
+ sched_clock_idle_wakeup_event(t1n - t0n);
+ }
+ local_irq_enable();
current_thread_info()->status |= TS_POLLING;
} else {
/* loop is done by the caller */
Index: linux/arch/x86/kernel/tsc_32.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_32.c
+++ linux/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
#include <linux/jiffies.h>
#include <linux/init.h>
#include <linux/dmi.h>
+#include <linux/percpu.h>

#include <asm/delay.h>
#include <asm/tsc.h>
@@ -78,15 +79,32 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* ([email protected])
*
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
* [email protected] "math is hard, lets go shopping!"
*/
-unsigned long cyc2ns_scale __read_mostly;

#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */

-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly;
+
+static void set_cyc2ns_scale(unsigned long cpu_khz)
{
- cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+ struct cyc2ns_params *params;
+ unsigned long flags;
+ unsigned long long tsc_now, ns_now;
+
+ rdtscll(tsc_now);
+ params = &get_cpu_var(cyc2ns);
+
+ local_irq_save(flags);
+ ns_now = __cycles_2_ns(params, tsc_now);
+
+ params->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+ params->offset += ns_now - __cycles_2_ns(params, tsc_now);
+ local_irq_restore(flags);
+
+ put_cpu_var(cyc2ns);
}

/*
Index: linux/arch/x86/kernel/tsc_64.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_64.c
+++ linux/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@

#include <asm/hpet.h>
#include <asm/timex.h>
+#include <asm/timer.h>

static int notsc __initdata = 0;

@@ -18,16 +19,25 @@ EXPORT_SYMBOL(cpu_khz);
unsigned int tsc_khz;
EXPORT_SYMBOL(tsc_khz);

-static unsigned int cyc2ns_scale __read_mostly;
+DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly;

-static inline void set_cyc2ns_scale(unsigned long khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz)
{
- cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
+ struct cyc2ns_params *params;
+ unsigned long flags;
+ unsigned long long tsc_now, ns_now;

-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
- return (cyc * cyc2ns_scale) >> NS_SCALE;
+ rdtscll(tsc_now);
+ params = &get_cpu_var(cyc2ns);
+
+ local_irq_save(flags);
+ ns_now = __cycles_2_ns(params, tsc_now);
+
+ params->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+ params->offset += ns_now - __cycles_2_ns(params, tsc_now);
+ local_irq_restore(flags);
+
+ put_cpu_var(cyc2ns);
}

unsigned long long sched_clock(void)
Index: linux/arch/x86/lib/delay_32.c
===================================================================
--- linux.orig/arch/x86/lib/delay_32.c
+++ linux/arch/x86/lib/delay_32.c
@@ -38,17 +38,21 @@ static void delay_loop(unsigned long loo
:"0" (loops));
}

-/* TSC based delay: */
+/* cpu_clock() [TSC] based delay: */
static void delay_tsc(unsigned long loops)
{
- unsigned long bclock, now;
+ unsigned long long start, stop, now;
+ int this_cpu;
+
+ preempt_disable();
+
+ this_cpu = smp_processor_id();
+ start = now = cpu_clock(this_cpu);
+ stop = start + loops;
+
+ while ((long long)(stop - now) > 0)
+ now = cpu_clock(this_cpu);

- preempt_disable(); /* TSC's are per-cpu */
- rdtscl(bclock);
- do {
- rep_nop();
- rdtscl(now);
- } while ((now-bclock) < loops);
preempt_enable();
}

Index: linux/arch/x86/lib/delay_64.c
===================================================================
--- linux.orig/arch/x86/lib/delay_64.c
+++ linux/arch/x86/lib/delay_64.c
@@ -26,19 +26,28 @@ int read_current_timer(unsigned long *ti
return 0;
}

-void __delay(unsigned long loops)
+/* cpu_clock() [TSC] based delay: */
+static void delay_tsc(unsigned long loops)
{
- unsigned bclock, now;
+ unsigned long long start, stop, now;
+ int this_cpu;
+
+ preempt_disable();
+
+ this_cpu = smp_processor_id();
+ start = now = cpu_clock(this_cpu);
+ stop = start + loops;
+
+ while ((long long)(stop - now) > 0)
+ now = cpu_clock(this_cpu);

- preempt_disable(); /* TSC's are pre-cpu */
- rdtscl(bclock);
- do {
- rep_nop();
- rdtscl(now);
- }
- while ((now-bclock) < loops);
preempt_enable();
}
+
+void __delay(unsigned long loops)
+{
+ delay_tsc(loops);
+}
EXPORT_SYMBOL(__delay);

inline void __const_udelay(unsigned long xloops)
Index: linux/drivers/acpi/processor_idle.c
===================================================================
--- linux.orig/drivers/acpi/processor_idle.c
+++ linux/drivers/acpi/processor_idle.c
@@ -531,6 +531,11 @@ static void acpi_processor_idle(void)

case ACPI_STATE_C3:
/*
+ * Must be done before busmaster disable as we might
+ * need to access HPET !
+ */
+ acpi_state_timer_broadcast(pr, cx, 1);
+ /*
* disable bus master
* bm_check implies we need ARB_DIS
* !bm_check implies we need cache flush
@@ -557,7 +562,6 @@ static void acpi_processor_idle(void)
/* Get start time (ticks) */
t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
/* Invoke C3 */
- acpi_state_timer_broadcast(pr, cx, 1);
/* Tell the scheduler that we are going deep-idle: */
sched_clock_idle_sleep_event();
acpi_cstate_enter(cx);
@@ -1401,9 +1405,6 @@ static int acpi_idle_enter_simple(struct
if (acpi_idle_suspend)
return(acpi_idle_enter_c1(dev, state));

- if (pr->flags.bm_check)
- acpi_idle_update_bm_rld(pr, cx);
-
local_irq_disable();
current_thread_info()->status &= ~TS_POLLING;
/*
@@ -1418,13 +1419,21 @@ static int acpi_idle_enter_simple(struct
return 0;
}

+ /*
+ * Must be done before busmaster disable as we might need to
+ * access HPET !
+ */
+ acpi_state_timer_broadcast(pr, cx, 1);
+
+ if (pr->flags.bm_check)
+ acpi_idle_update_bm_rld(pr, cx);
+
if (cx->type == ACPI_STATE_C3)
ACPI_FLUSH_CPU_CACHE();

t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
/* Tell the scheduler that we are going deep-idle: */
sched_clock_idle_sleep_event();
- acpi_state_timer_broadcast(pr, cx, 1);
acpi_idle_do_entry(cx);
t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);

Index: linux/include/asm-x86/timer.h
===================================================================
--- linux.orig/include/asm-x86/timer.h
+++ linux/include/asm-x86/timer.h
@@ -2,6 +2,7 @@
#define _ASMi386_TIMER_H
#include <linux/init.h>
#include <linux/pm.h>
+#include <linux/percpu.h>

#define TICK_SIZE (tick_nsec / 1000)

@@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void);
#define calculate_cpu_khz() native_calculate_cpu_khz()
#endif

-/* Accellerators for sched_clock()
+/* Accelerators for sched_clock()
* convert from cycles(64bits) => nanoseconds (64bits)
* basic equation:
* ns = cycles / (freq / ns_per_sec)
@@ -31,20 +32,44 @@ extern int recalibrate_cpu_khz(void);
* And since SC is a constant power of two, we can convert the div
* into a shift.
*
- * We can use khz divisor instead of mhz to keep a better percision, since
+ * We can use khz divisor instead of mhz to keep a better precision, since
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* ([email protected])
*
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
* [email protected] "math is hard, lets go shopping!"
*/
-extern unsigned long cyc2ns_scale __read_mostly;
+
+struct cyc2ns_params {
+ unsigned long scale;
+ unsigned long long offset;
+};
+
+DECLARE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly;

#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */

-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static inline unsigned long long __cycles_2_ns(struct cyc2ns_params *params,
+ unsigned long long cyc)
{
- return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+ return ((cyc * params->scale) >> CYC2NS_SCALE_FACTOR) + params->offset;
}

+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+ struct cyc2ns_params *params;
+ unsigned long flags;
+ unsigned long long ns;
+
+ params = &get_cpu_var(cyc2ns);
+
+ local_irq_save(flags);
+ ns = __cycles_2_ns(params, cyc);
+ local_irq_restore(flags);
+
+ put_cpu_var(cyc2ns);
+ return ns;
+}

#endif
Index: linux/kernel/hrtimer.c
===================================================================
--- linux.orig/kernel/hrtimer.c
+++ linux/kernel/hrtimer.c
@@ -850,6 +850,14 @@ hrtimer_start(struct hrtimer *timer, kti
#ifdef CONFIG_TIME_LOW_RES
tim = ktime_add(tim, base->resolution);
#endif
+ /*
+ * Careful here: User space might have asked for a
+ * very long sleep, so the add above might result in a
+ * negative number, which enqueues the timer in front
+ * of the queue.
+ */
+ if (tim.tv64 < 0)
+ tim.tv64 = KTIME_MAX;
}
timer->expires = tim;

Index: linux/kernel/lockdep.c
===================================================================
--- linux.orig/kernel/lockdep.c
+++ linux/kernel/lockdep.c
@@ -2654,10 +2654,15 @@ static void check_flags(unsigned long fl
if (!debug_locks)
return;

- if (irqs_disabled_flags(flags))
- DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled);
- else
- DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled);
+ if (irqs_disabled_flags(flags)) {
+ if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)) {
+ printk("possible reason: unannotated irqs-off.\n");
+ }
+ } else {
+ if (DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled)) {
+ printk("possible reason: unannotated irqs-on.\n");
+ }
+ }

/*
* We dont accurately track softirq state in e.g.
Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -573,11 +573,6 @@ static int __init printk_time_setup(char

__setup("time", printk_time_setup);

-__attribute__((weak)) unsigned long long printk_clock(void)
-{
- return sched_clock();
-}
-
/* Check if we have any console registered that can be called early in boot. */
static int have_callable_console(void)
{
@@ -628,30 +623,57 @@ asmlinkage int printk(const char *fmt, .
/* cpu currently holding logbuf_lock */
static volatile unsigned int printk_cpu = UINT_MAX;

+const char printk_recursion_bug_msg [] =
+ KERN_CRIT "BUG: recent printk recursion!\n";
+static int printk_recursion_bug;
+
asmlinkage int vprintk(const char *fmt, va_list args)
{
+ static int log_level_unknown = 1;
+ static char printk_buf[1024];
+
unsigned long flags;
- int printed_len;
+ int printed_len = 0;
+ int this_cpu;
char *p;
- static char printk_buf[1024];
- static int log_level_unknown = 1;

boot_delay_msec();

preempt_disable();
- if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
- /* If a crash is occurring during printk() on this CPU,
- * make sure we can't deadlock */
- zap_locks();
-
/* This stops the holder of console_sem just where we want him */
raw_local_irq_save(flags);
+ this_cpu = smp_processor_id();
+
+ /*
+ * Ouch, printk recursed into itself!
+ */
+ if (unlikely(printk_cpu == this_cpu)) {
+ /*
+ * If a crash is occurring during printk() on this CPU,
+ * then try to get the crash message out but make sure
+ * we can't deadlock. Otherwise just return to avoid the
+ * recursion and return - but flag the recursion so that
+ * it can be printed at the next appropriate moment:
+ */
+ if (!oops_in_progress) {
+ printk_recursion_bug = 1;
+ goto out_restore_irqs;
+ }
+ zap_locks();
+ }
+
lockdep_off();
spin_lock(&logbuf_lock);
- printk_cpu = smp_processor_id();
+ printk_cpu = this_cpu;

+ if (printk_recursion_bug) {
+ printk_recursion_bug = 0;
+ strcpy(printk_buf, printk_recursion_bug_msg);
+ printed_len = sizeof(printk_recursion_bug_msg);
+ }
/* Emit the output into the temporary buffer */
- printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+ printed_len += vscnprintf(printk_buf + printed_len,
+ sizeof(printk_buf), fmt, args);

/*
* Copy the output into log_buf. If the caller didn't provide
@@ -680,7 +702,11 @@ asmlinkage int vprintk(const char *fmt,
loglev_char = default_message_loglevel
+ '0';
}
- t = printk_clock();
+ if (panic_timeout) {
+ panic_timeout = 0;
+ printk("recurse!\n");
+ }
+ t = cpu_clock(printk_cpu);
nanosec_rem = do_div(t, 1000000000);
tlen = sprintf(tbuf,
"<%c>[%5lu.%06lu] ",
@@ -744,6 +770,7 @@ asmlinkage int vprintk(const char *fmt,
printk_cpu = UINT_MAX;
spin_unlock(&logbuf_lock);
lockdep_on();
+out_restore_irqs:
raw_local_irq_restore(flags);
}

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -488,7 +488,12 @@ unsigned long long cpu_clock(int cpu)

local_irq_save(flags);
rq = cpu_rq(cpu);
- update_rq_clock(rq);
+ /*
+ * Only call sched_clock() if the scheduler has already been
+ * initialized (some code might call cpu_clock() very early):
+ */
+ if (rq->idle)
+ update_rq_clock(rq);
now = rq->clock;
local_irq_restore(flags);

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -511,8 +511,7 @@ place_entity(struct cfs_rq *cfs_rq, stru

if (!initial) {
/* sleeps upto a single latency don't count. */
- if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se) &&
- task_of(se)->policy != SCHED_BATCH)
+ if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se))
vruntime -= sysctl_sched_latency;

/* ensure we never gain time by being placed backwards. */
Index: linux/kernel/time/clockevents.c
===================================================================
--- linux.orig/kernel/time/clockevents.c
+++ linux/kernel/time/clockevents.c
@@ -78,6 +78,11 @@ int clockevents_program_event(struct clo
unsigned long long clc;
int64_t delta;

+ if (unlikely(expires.tv64 < 0)) {
+ WARN_ON_ONCE(1);
+ return -ETIME;
+ }
+
delta = ktime_to_ns(ktime_sub(expires, now));

if (delta <= 0)

2007-12-07 12:36:18

by Ingo Molnar

Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Ingo Molnar <[email protected]> wrote:

> ok, here's a rollup of 11 patches that relate to this. I hoped we
> could wait with this for 2.6.25, but it seems more urgent as per
> Stefano's testing, as udelay() and drivers are affected as well.
>
> Stefano, could you try this ontop of a recent-ish Linus tree - does
> this resolve all issues? (without introducing new ones ;-)

updated version attached below.

> +DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly;

__read_mostly is not a good idea for PER_CPU variables.

Ingo

Index: linux/arch/arm/kernel/time.c
===================================================================
--- linux.orig/arch/arm/kernel/time.c
+++ linux/arch/arm/kernel/time.c
@@ -79,17 +79,6 @@ static unsigned long dummy_gettimeoffset
}
#endif

-/*
- * An implementation of printk_clock() independent from
- * sched_clock(). This avoids non-bootable kernels when
- * printk_clock is enabled.
- */
-unsigned long long printk_clock(void)
-{
- return (unsigned long long)(jiffies - INITIAL_JIFFIES) *
- (1000000000 / HZ);
-}
-
static unsigned long next_rtc_update;

/*
Index: linux/arch/ia64/kernel/time.c
===================================================================
--- linux.orig/arch/ia64/kernel/time.c
+++ linux/arch/ia64/kernel/time.c
@@ -344,33 +344,6 @@ udelay (unsigned long usecs)
}
EXPORT_SYMBOL(udelay);

-static unsigned long long ia64_itc_printk_clock(void)
-{
- if (ia64_get_kr(IA64_KR_PER_CPU_DATA))
- return sched_clock();
- return 0;
-}
-
-static unsigned long long ia64_default_printk_clock(void)
-{
- return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) *
- (1000000000/HZ);
-}
-
-unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock;
-
-unsigned long long printk_clock(void)
-{
- return ia64_printk_clock();
-}
-
-void __init
-ia64_setup_printk_clock(void)
-{
- if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT))
- ia64_printk_clock = ia64_itc_printk_clock;
-}
-
/* IA64 doesn't cache the timezone */
void update_vsyscall_tz(void)
{
Index: linux/arch/x86/kernel/process_32.c
===================================================================
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
smp_mb();

local_irq_disable();
- if (!need_resched())
+ if (!need_resched()) {
+ ktime_t t0, t1;
+ u64 t0n, t1n;
+
+ t0 = ktime_get();
+ t0n = ktime_to_ns(t0);
safe_halt(); /* enables interrupts racelessly */
- else
- local_irq_enable();
+ local_irq_disable();
+ t1 = ktime_get();
+ t1n = ktime_to_ns(t1);
+ sched_clock_idle_wakeup_event(t1n - t0n);
+ }
+ local_irq_enable();
current_thread_info()->status |= TS_POLLING;
} else {
/* loop is done by the caller */
Index: linux/arch/x86/kernel/tsc_32.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_32.c
+++ linux/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
#include <linux/jiffies.h>
#include <linux/init.h>
#include <linux/dmi.h>
+#include <linux/percpu.h>

#include <asm/delay.h>
#include <asm/tsc.h>
@@ -78,15 +79,32 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* ([email protected])
*
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
* [email protected] "math is hard, lets go shopping!"
*/
-unsigned long cyc2ns_scale __read_mostly;

#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */

-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns);
+
+static void set_cyc2ns_scale(unsigned long cpu_khz)
{
- cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+ struct cyc2ns_params *params;
+ unsigned long flags;
+ unsigned long long tsc_now, ns_now;
+
+ rdtscll(tsc_now);
+ params = &get_cpu_var(cyc2ns);
+
+ local_irq_save(flags);
+ ns_now = __cycles_2_ns(params, tsc_now);
+
+ params->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+ params->offset += ns_now - __cycles_2_ns(params, tsc_now);
+ local_irq_restore(flags);
+
+ put_cpu_var(cyc2ns);
}

/*
Index: linux/arch/x86/kernel/tsc_64.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_64.c
+++ linux/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@

#include <asm/hpet.h>
#include <asm/timex.h>
+#include <asm/timer.h>

static int notsc __initdata = 0;

@@ -18,16 +19,25 @@ EXPORT_SYMBOL(cpu_khz);
unsigned int tsc_khz;
EXPORT_SYMBOL(tsc_khz);

-static unsigned int cyc2ns_scale __read_mostly;
+DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly;

-static inline void set_cyc2ns_scale(unsigned long khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz)
{
- cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
+ struct cyc2ns_params *params;
+ unsigned long flags;
+ unsigned long long tsc_now, ns_now;

-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
- return (cyc * cyc2ns_scale) >> NS_SCALE;
+ rdtscll(tsc_now);
+ params = &get_cpu_var(cyc2ns);
+
+ local_irq_save(flags);
+ ns_now = __cycles_2_ns(params, tsc_now);
+
+ params->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+ params->offset += ns_now - __cycles_2_ns(params, tsc_now);
+ local_irq_restore(flags);
+
+ put_cpu_var(cyc2ns);
}

unsigned long long sched_clock(void)
Index: linux/arch/x86/lib/delay_32.c
===================================================================
--- linux.orig/arch/x86/lib/delay_32.c
+++ linux/arch/x86/lib/delay_32.c
@@ -38,17 +38,21 @@ static void delay_loop(unsigned long loo
:"0" (loops));
}

-/* TSC based delay: */
+/* cpu_clock() [TSC] based delay: */
static void delay_tsc(unsigned long loops)
{
- unsigned long bclock, now;
+ unsigned long long start, stop, now;
+ int this_cpu;
+
+ preempt_disable();
+
+ this_cpu = smp_processor_id();
+ start = now = cpu_clock(this_cpu);
+ stop = start + loops;
+
+ while ((long long)(stop - now) > 0)
+ now = cpu_clock(this_cpu);

- preempt_disable(); /* TSC's are per-cpu */
- rdtscl(bclock);
- do {
- rep_nop();
- rdtscl(now);
- } while ((now-bclock) < loops);
preempt_enable();
}

Index: linux/arch/x86/lib/delay_64.c
===================================================================
--- linux.orig/arch/x86/lib/delay_64.c
+++ linux/arch/x86/lib/delay_64.c
@@ -26,19 +26,28 @@ int read_current_timer(unsigned long *ti
return 0;
}

-void __delay(unsigned long loops)
+/* cpu_clock() [TSC] based delay: */
+static void delay_tsc(unsigned long loops)
{
- unsigned bclock, now;
+ unsigned long long start, stop, now;
+ int this_cpu;
+
+ preempt_disable();
+
+ this_cpu = smp_processor_id();
+ start = now = cpu_clock(this_cpu);
+ stop = start + loops;
+
+ while ((long long)(stop - now) > 0)
+ now = cpu_clock(this_cpu);

- preempt_disable(); /* TSC's are pre-cpu */
- rdtscl(bclock);
- do {
- rep_nop();
- rdtscl(now);
- }
- while ((now-bclock) < loops);
preempt_enable();
}
+
+void __delay(unsigned long loops)
+{
+ delay_tsc(loops);
+}
EXPORT_SYMBOL(__delay);

inline void __const_udelay(unsigned long xloops)
Index: linux/drivers/acpi/processor_idle.c
===================================================================
--- linux.orig/drivers/acpi/processor_idle.c
+++ linux/drivers/acpi/processor_idle.c
@@ -531,6 +531,11 @@ static void acpi_processor_idle(void)

case ACPI_STATE_C3:
/*
+ * Must be done before busmaster disable as we might
+ * need to access HPET !
+ */
+ acpi_state_timer_broadcast(pr, cx, 1);
+ /*
* disable bus master
* bm_check implies we need ARB_DIS
* !bm_check implies we need cache flush
@@ -557,7 +562,6 @@ static void acpi_processor_idle(void)
/* Get start time (ticks) */
t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
/* Invoke C3 */
- acpi_state_timer_broadcast(pr, cx, 1);
/* Tell the scheduler that we are going deep-idle: */
sched_clock_idle_sleep_event();
acpi_cstate_enter(cx);
@@ -1401,9 +1405,6 @@ static int acpi_idle_enter_simple(struct
if (acpi_idle_suspend)
return(acpi_idle_enter_c1(dev, state));

- if (pr->flags.bm_check)
- acpi_idle_update_bm_rld(pr, cx);
-
local_irq_disable();
current_thread_info()->status &= ~TS_POLLING;
/*
@@ -1418,13 +1419,21 @@ static int acpi_idle_enter_simple(struct
return 0;
}

+ /*
+ * Must be done before busmaster disable as we might need to
+ * access HPET !
+ */
+ acpi_state_timer_broadcast(pr, cx, 1);
+
+ if (pr->flags.bm_check)
+ acpi_idle_update_bm_rld(pr, cx);
+
if (cx->type == ACPI_STATE_C3)
ACPI_FLUSH_CPU_CACHE();

t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
/* Tell the scheduler that we are going deep-idle: */
sched_clock_idle_sleep_event();
- acpi_state_timer_broadcast(pr, cx, 1);
acpi_idle_do_entry(cx);
t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);

Index: linux/include/asm-x86/timer.h
===================================================================
--- linux.orig/include/asm-x86/timer.h
+++ linux/include/asm-x86/timer.h
@@ -2,6 +2,7 @@
#define _ASMi386_TIMER_H
#include <linux/init.h>
#include <linux/pm.h>
+#include <linux/percpu.h>

#define TICK_SIZE (tick_nsec / 1000)

@@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void);
#define calculate_cpu_khz() native_calculate_cpu_khz()
#endif

-/* Accellerators for sched_clock()
+/* Accelerators for sched_clock()
* convert from cycles(64bits) => nanoseconds (64bits)
* basic equation:
* ns = cycles / (freq / ns_per_sec)
@@ -31,20 +32,44 @@ extern int recalibrate_cpu_khz(void);
* And since SC is a constant power of two, we can convert the div
* into a shift.
*
- * We can use khz divisor instead of mhz to keep a better percision, since
+ * We can use khz divisor instead of mhz to keep a better precision, since
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* ([email protected])
*
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
* [email protected] "math is hard, lets go shopping!"
*/
-extern unsigned long cyc2ns_scale __read_mostly;
+
+struct cyc2ns_params {
+ unsigned long scale;
+ unsigned long long offset;
+};
+
+DECLARE_PER_CPU(struct cyc2ns_params, cyc2ns);

#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */

-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static inline unsigned long long __cycles_2_ns(struct cyc2ns_params *params,
+ unsigned long long cyc)
{
- return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+ return ((cyc * params->scale) >> CYC2NS_SCALE_FACTOR) + params->offset;
}

+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+ struct cyc2ns_params *params;
+ unsigned long flags;
+ unsigned long long ns;
+
+ params = &get_cpu_var(cyc2ns);
+
+ local_irq_save(flags);
+ ns = __cycles_2_ns(params, cyc);
+ local_irq_restore(flags);
+
+ put_cpu_var(cyc2ns);
+ return ns;
+}

#endif
Index: linux/kernel/hrtimer.c
===================================================================
--- linux.orig/kernel/hrtimer.c
+++ linux/kernel/hrtimer.c
@@ -850,6 +850,14 @@ hrtimer_start(struct hrtimer *timer, kti
#ifdef CONFIG_TIME_LOW_RES
tim = ktime_add(tim, base->resolution);
#endif
+ /*
+ * Careful here: User space might have asked for a
+ * very long sleep, so the add above might result in a
+ * negative number, which enqueues the timer in front
+ * of the queue.
+ */
+ if (tim.tv64 < 0)
+ tim.tv64 = KTIME_MAX;
}
timer->expires = tim;

Index: linux/kernel/lockdep.c
===================================================================
--- linux.orig/kernel/lockdep.c
+++ linux/kernel/lockdep.c
@@ -2654,10 +2654,15 @@ static void check_flags(unsigned long fl
if (!debug_locks)
return;

- if (irqs_disabled_flags(flags))
- DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled);
- else
- DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled);
+ if (irqs_disabled_flags(flags)) {
+ if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)) {
+ printk("possible reason: unannotated irqs-off.\n");
+ }
+ } else {
+ if (DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled)) {
+ printk("possible reason: unannotated irqs-on.\n");
+ }
+ }

/*
* We dont accurately track softirq state in e.g.
Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -573,11 +573,6 @@ static int __init printk_time_setup(char

__setup("time", printk_time_setup);

-__attribute__((weak)) unsigned long long printk_clock(void)
-{
- return sched_clock();
-}
-
/* Check if we have any console registered that can be called early in boot. */
static int have_callable_console(void)
{
@@ -628,30 +623,57 @@ asmlinkage int printk(const char *fmt, .
/* cpu currently holding logbuf_lock */
static volatile unsigned int printk_cpu = UINT_MAX;

+const char printk_recursion_bug_msg [] =
+ KERN_CRIT "BUG: recent printk recursion!\n";
+static int printk_recursion_bug;
+
asmlinkage int vprintk(const char *fmt, va_list args)
{
+ static int log_level_unknown = 1;
+ static char printk_buf[1024];
+
unsigned long flags;
- int printed_len;
+ int printed_len = 0;
+ int this_cpu;
char *p;
- static char printk_buf[1024];
- static int log_level_unknown = 1;

boot_delay_msec();

preempt_disable();
- if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
- /* If a crash is occurring during printk() on this CPU,
- * make sure we can't deadlock */
- zap_locks();
-
/* This stops the holder of console_sem just where we want him */
raw_local_irq_save(flags);
+ this_cpu = smp_processor_id();
+
+ /*
+ * Ouch, printk recursed into itself!
+ */
+ if (unlikely(printk_cpu == this_cpu)) {
+ /*
+ * If a crash is occurring during printk() on this CPU,
+ * then try to get the crash message out but make sure
+ * we can't deadlock. Otherwise just return to avoid the
+ * recursion and return - but flag the recursion so that
+ * it can be printed at the next appropriate moment:
+ */
+ if (!oops_in_progress) {
+ printk_recursion_bug = 1;
+ goto out_restore_irqs;
+ }
+ zap_locks();
+ }
+
lockdep_off();
spin_lock(&logbuf_lock);
- printk_cpu = smp_processor_id();
+ printk_cpu = this_cpu;

+ if (printk_recursion_bug) {
+ printk_recursion_bug = 0;
+ strcpy(printk_buf, printk_recursion_bug_msg);
+ printed_len = sizeof(printk_recursion_bug_msg);
+ }
/* Emit the output into the temporary buffer */
- printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+ printed_len += vscnprintf(printk_buf + printed_len,
+ sizeof(printk_buf), fmt, args);

/*
* Copy the output into log_buf. If the caller didn't provide
@@ -680,7 +702,11 @@ asmlinkage int vprintk(const char *fmt,
loglev_char = default_message_loglevel
+ '0';
}
- t = printk_clock();
+ if (panic_timeout) {
+ panic_timeout = 0;
+ printk("recurse!\n");
+ }
+ t = cpu_clock(printk_cpu);
nanosec_rem = do_div(t, 1000000000);
tlen = sprintf(tbuf,
"<%c>[%5lu.%06lu] ",
@@ -744,6 +770,7 @@ asmlinkage int vprintk(const char *fmt,
printk_cpu = UINT_MAX;
spin_unlock(&logbuf_lock);
lockdep_on();
+out_restore_irqs:
raw_local_irq_restore(flags);
}

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -488,7 +488,12 @@ unsigned long long cpu_clock(int cpu)

local_irq_save(flags);
rq = cpu_rq(cpu);
- update_rq_clock(rq);
+ /*
+ * Only call sched_clock() if the scheduler has already been
+ * initialized (some code might call cpu_clock() very early):
+ */
+ if (rq->idle)
+ update_rq_clock(rq);
now = rq->clock;
local_irq_restore(flags);

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -511,8 +511,7 @@ place_entity(struct cfs_rq *cfs_rq, stru

if (!initial) {
/* sleeps upto a single latency don't count. */
- if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se) &&
- task_of(se)->policy != SCHED_BATCH)
+ if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se))
vruntime -= sysctl_sched_latency;

/* ensure we never gain time by being placed backwards. */
Index: linux/kernel/time/clockevents.c
===================================================================
--- linux.orig/kernel/time/clockevents.c
+++ linux/kernel/time/clockevents.c
@@ -78,6 +78,11 @@ int clockevents_program_event(struct clo
unsigned long long clc;
int64_t delta;

+ if (unlikely(expires.tv64 < 0)) {
+ WARN_ON_ONCE(1);
+ return -ETIME;
+ }
+
delta = ktime_to_ns(ktime_sub(expires, now));

if (delta <= 0)

2007-12-07 12:41:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Ingo Molnar <[email protected]> wrote:

> > Stefano, could you try this ontop of a recent-ish Linus tree - does
> > this resolve all issues? (without introducing new ones ;-)
>
> updated version attached below.

third update. the cpufreq callbacks are not quite OK yet.

Ingo

Index: linux/arch/arm/kernel/time.c
===================================================================
--- linux.orig/arch/arm/kernel/time.c
+++ linux/arch/arm/kernel/time.c
@@ -79,17 +79,6 @@ static unsigned long dummy_gettimeoffset
}
#endif

-/*
- * An implementation of printk_clock() independent from
- * sched_clock(). This avoids non-bootable kernels when
- * printk_clock is enabled.
- */
-unsigned long long printk_clock(void)
-{
- return (unsigned long long)(jiffies - INITIAL_JIFFIES) *
- (1000000000 / HZ);
-}
-
static unsigned long next_rtc_update;

/*
Index: linux/arch/ia64/kernel/time.c
===================================================================
--- linux.orig/arch/ia64/kernel/time.c
+++ linux/arch/ia64/kernel/time.c
@@ -344,33 +344,6 @@ udelay (unsigned long usecs)
}
EXPORT_SYMBOL(udelay);

-static unsigned long long ia64_itc_printk_clock(void)
-{
- if (ia64_get_kr(IA64_KR_PER_CPU_DATA))
- return sched_clock();
- return 0;
-}
-
-static unsigned long long ia64_default_printk_clock(void)
-{
- return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) *
- (1000000000/HZ);
-}
-
-unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock;
-
-unsigned long long printk_clock(void)
-{
- return ia64_printk_clock();
-}
-
-void __init
-ia64_setup_printk_clock(void)
-{
- if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT))
- ia64_printk_clock = ia64_itc_printk_clock;
-}
-
/* IA64 doesn't cache the timezone */
void update_vsyscall_tz(void)
{
Index: linux/arch/x86/kernel/process_32.c
===================================================================
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
smp_mb();

local_irq_disable();
- if (!need_resched())
+ if (!need_resched()) {
+ ktime_t t0, t1;
+ u64 t0n, t1n;
+
+ t0 = ktime_get();
+ t0n = ktime_to_ns(t0);
safe_halt(); /* enables interrupts racelessly */
- else
- local_irq_enable();
+ local_irq_disable();
+ t1 = ktime_get();
+ t1n = ktime_to_ns(t1);
+ sched_clock_idle_wakeup_event(t1n - t0n);
+ }
+ local_irq_enable();
current_thread_info()->status |= TS_POLLING;
} else {
/* loop is done by the caller */
Index: linux/arch/x86/lib/delay_32.c
===================================================================
--- linux.orig/arch/x86/lib/delay_32.c
+++ linux/arch/x86/lib/delay_32.c
@@ -38,17 +38,21 @@ static void delay_loop(unsigned long loo
:"0" (loops));
}

-/* TSC based delay: */
+/* cpu_clock() [TSC] based delay: */
static void delay_tsc(unsigned long loops)
{
- unsigned long bclock, now;
+ unsigned long long start, stop, now;
+ int this_cpu;
+
+ preempt_disable();
+
+ this_cpu = smp_processor_id();
+ start = now = cpu_clock(this_cpu);
+ stop = start + loops;
+
+ while ((long long)(stop - now) > 0)
+ now = cpu_clock(this_cpu);

- preempt_disable(); /* TSC's are per-cpu */
- rdtscl(bclock);
- do {
- rep_nop();
- rdtscl(now);
- } while ((now-bclock) < loops);
preempt_enable();
}

Index: linux/arch/x86/lib/delay_64.c
===================================================================
--- linux.orig/arch/x86/lib/delay_64.c
+++ linux/arch/x86/lib/delay_64.c
@@ -26,19 +26,28 @@ int read_current_timer(unsigned long *ti
return 0;
}

-void __delay(unsigned long loops)
+/* cpu_clock() [TSC] based delay: */
+static void delay_tsc(unsigned long loops)
{
- unsigned bclock, now;
+ unsigned long long start, stop, now;
+ int this_cpu;
+
+ preempt_disable();
+
+ this_cpu = smp_processor_id();
+ start = now = cpu_clock(this_cpu);
+ stop = start + loops;
+
+ while ((long long)(stop - now) > 0)
+ now = cpu_clock(this_cpu);

- preempt_disable(); /* TSC's are pre-cpu */
- rdtscl(bclock);
- do {
- rep_nop();
- rdtscl(now);
- }
- while ((now-bclock) < loops);
preempt_enable();
}
+
+void __delay(unsigned long loops)
+{
+ delay_tsc(loops);
+}
EXPORT_SYMBOL(__delay);

inline void __const_udelay(unsigned long xloops)
Index: linux/drivers/acpi/processor_idle.c
===================================================================
--- linux.orig/drivers/acpi/processor_idle.c
+++ linux/drivers/acpi/processor_idle.c
@@ -531,6 +531,11 @@ static void acpi_processor_idle(void)

case ACPI_STATE_C3:
/*
+ * Must be done before busmaster disable as we might
+ * need to access HPET !
+ */
+ acpi_state_timer_broadcast(pr, cx, 1);
+ /*
* disable bus master
* bm_check implies we need ARB_DIS
* !bm_check implies we need cache flush
@@ -557,7 +562,6 @@ static void acpi_processor_idle(void)
/* Get start time (ticks) */
t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
/* Invoke C3 */
- acpi_state_timer_broadcast(pr, cx, 1);
/* Tell the scheduler that we are going deep-idle: */
sched_clock_idle_sleep_event();
acpi_cstate_enter(cx);
@@ -1401,9 +1405,6 @@ static int acpi_idle_enter_simple(struct
if (acpi_idle_suspend)
return(acpi_idle_enter_c1(dev, state));

- if (pr->flags.bm_check)
- acpi_idle_update_bm_rld(pr, cx);
-
local_irq_disable();
current_thread_info()->status &= ~TS_POLLING;
/*
@@ -1418,13 +1419,21 @@ static int acpi_idle_enter_simple(struct
return 0;
}

+ /*
+ * Must be done before busmaster disable as we might need to
+ * access HPET !
+ */
+ acpi_state_timer_broadcast(pr, cx, 1);
+
+ if (pr->flags.bm_check)
+ acpi_idle_update_bm_rld(pr, cx);
+
if (cx->type == ACPI_STATE_C3)
ACPI_FLUSH_CPU_CACHE();

t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
/* Tell the scheduler that we are going deep-idle: */
sched_clock_idle_sleep_event();
- acpi_state_timer_broadcast(pr, cx, 1);
acpi_idle_do_entry(cx);
t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);

Index: linux/kernel/hrtimer.c
===================================================================
--- linux.orig/kernel/hrtimer.c
+++ linux/kernel/hrtimer.c
@@ -850,6 +850,14 @@ hrtimer_start(struct hrtimer *timer, kti
#ifdef CONFIG_TIME_LOW_RES
tim = ktime_add(tim, base->resolution);
#endif
+ /*
+ * Careful here: User space might have asked for a
+ * very long sleep, so the add above might result in a
+ * negative number, which enqueues the timer in front
+ * of the queue.
+ */
+ if (tim.tv64 < 0)
+ tim.tv64 = KTIME_MAX;
}
timer->expires = tim;

Index: linux/kernel/lockdep.c
===================================================================
--- linux.orig/kernel/lockdep.c
+++ linux/kernel/lockdep.c
@@ -2654,10 +2654,15 @@ static void check_flags(unsigned long fl
if (!debug_locks)
return;

- if (irqs_disabled_flags(flags))
- DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled);
- else
- DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled);
+ if (irqs_disabled_flags(flags)) {
+ if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)) {
+ printk("possible reason: unannotated irqs-off.\n");
+ }
+ } else {
+ if (DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled)) {
+ printk("possible reason: unannotated irqs-on.\n");
+ }
+ }

/*
* We dont accurately track softirq state in e.g.
Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -573,11 +573,6 @@ static int __init printk_time_setup(char

__setup("time", printk_time_setup);

-__attribute__((weak)) unsigned long long printk_clock(void)
-{
- return sched_clock();
-}
-
/* Check if we have any console registered that can be called early in boot. */
static int have_callable_console(void)
{
@@ -628,30 +623,57 @@ asmlinkage int printk(const char *fmt, .
/* cpu currently holding logbuf_lock */
static volatile unsigned int printk_cpu = UINT_MAX;

+const char printk_recursion_bug_msg [] =
+ KERN_CRIT "BUG: recent printk recursion!\n";
+static int printk_recursion_bug;
+
asmlinkage int vprintk(const char *fmt, va_list args)
{
+ static int log_level_unknown = 1;
+ static char printk_buf[1024];
+
unsigned long flags;
- int printed_len;
+ int printed_len = 0;
+ int this_cpu;
char *p;
- static char printk_buf[1024];
- static int log_level_unknown = 1;

boot_delay_msec();

preempt_disable();
- if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
- /* If a crash is occurring during printk() on this CPU,
- * make sure we can't deadlock */
- zap_locks();
-
/* This stops the holder of console_sem just where we want him */
raw_local_irq_save(flags);
+ this_cpu = smp_processor_id();
+
+ /*
+ * Ouch, printk recursed into itself!
+ */
+ if (unlikely(printk_cpu == this_cpu)) {
+ /*
+ * If a crash is occurring during printk() on this CPU,
+ * then try to get the crash message out but make sure
+ * we can't deadlock. Otherwise just return to avoid the
+ * recursion and return - but flag the recursion so that
+ * it can be printed at the next appropriate moment:
+ */
+ if (!oops_in_progress) {
+ printk_recursion_bug = 1;
+ goto out_restore_irqs;
+ }
+ zap_locks();
+ }
+
lockdep_off();
spin_lock(&logbuf_lock);
- printk_cpu = smp_processor_id();
+ printk_cpu = this_cpu;

+ if (printk_recursion_bug) {
+ printk_recursion_bug = 0;
+ strcpy(printk_buf, printk_recursion_bug_msg);
+ printed_len = sizeof(printk_recursion_bug_msg);
+ }
/* Emit the output into the temporary buffer */
- printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+ printed_len += vscnprintf(printk_buf + printed_len,
+ sizeof(printk_buf), fmt, args);

/*
* Copy the output into log_buf. If the caller didn't provide
@@ -680,7 +702,11 @@ asmlinkage int vprintk(const char *fmt,
loglev_char = default_message_loglevel
+ '0';
}
- t = printk_clock();
+ if (panic_timeout) {
+ panic_timeout = 0;
+ printk("recurse!\n");
+ }
+ t = cpu_clock(printk_cpu);
nanosec_rem = do_div(t, 1000000000);
tlen = sprintf(tbuf,
"<%c>[%5lu.%06lu] ",
@@ -744,6 +770,7 @@ asmlinkage int vprintk(const char *fmt,
printk_cpu = UINT_MAX;
spin_unlock(&logbuf_lock);
lockdep_on();
+out_restore_irqs:
raw_local_irq_restore(flags);
}

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -488,7 +488,12 @@ unsigned long long cpu_clock(int cpu)

local_irq_save(flags);
rq = cpu_rq(cpu);
- update_rq_clock(rq);
+ /*
+ * Only call sched_clock() if the scheduler has already been
+ * initialized (some code might call cpu_clock() very early):
+ */
+ if (rq->idle)
+ update_rq_clock(rq);
now = rq->clock;
local_irq_restore(flags);

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -511,8 +511,7 @@ place_entity(struct cfs_rq *cfs_rq, stru

if (!initial) {
/* sleeps upto a single latency don't count. */
- if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se) &&
- task_of(se)->policy != SCHED_BATCH)
+ if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se))
vruntime -= sysctl_sched_latency;

/* ensure we never gain time by being placed backwards. */
Index: linux/kernel/time/clockevents.c
===================================================================
--- linux.orig/kernel/time/clockevents.c
+++ linux/kernel/time/clockevents.c
@@ -78,6 +78,11 @@ int clockevents_program_event(struct clo
unsigned long long clc;
int64_t delta;

+ if (unlikely(expires.tv64 < 0)) {
+ WARN_ON_ONCE(1);
+ return -ETIME;
+ }
+
delta = ktime_to_ns(ktime_sub(expires, now));

if (delta <= 0)

2007-12-07 13:56:29

by Ingo Molnar

[permalink] [raw]
Subject: [patch] x86: scale cyc_2_nsec according to CPU frequency


* Guillaume Chazarain <[email protected]> wrote:

> > > Hmrpf. sched_clock() is used for the time stamp of the printks. We
> > > need to find some better solution other than killing off the tsc
> > > access completely.
> >
> > Something like http://lkml.org/lkml/2007/3/16/291 that would need
> > some refresh?
>
> And here is a refreshed one just for testing with 2.6-git. The 64 bit
> part is a shamelessly untested copy/paste as I cannot test it.

Guillaume, I've updated your patch with a handful of changes - see the
result below.

Firstly, we don't need the 'offset' anymore because cpu_clock() maintains
offsets itself. This simplifies the math and speeds up the sched_clock()
common case.

Secondly, with PER_CPU variables we need to update them for all possible
CPUs - otherwise some of them might end up with a zero scaling factor,
which is not good. (Not all CPUs are cpufreq-capable.)

Thirdly, we can be a bit smarter and faster by using the fact that
local_irq_disable() is preempt-safe - so we can use per_cpu() instead of
get_cpu_var().
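
To illustrate the third point, a minimal sketch (illustration only; the
helper name is made up and this is not part of the patch below):

/*
 * With interrupts disabled we cannot be preempted or migrated, so a
 * plain per_cpu() access is safe here and avoids the extra
 * preempt_disable()/preempt_enable() pair that get_cpu_var() /
 * put_cpu_var() would add.
 */
static void update_this_cpu_scale(unsigned long cpu_khz)
{
	unsigned long flags;

	local_irq_save(flags);
	if (cpu_khz)
		per_cpu(cyc2ns, smp_processor_id()) =
			(NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) / cpu_khz;
	local_irq_restore(flags);
}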

Ingo

----------------->
Subject: x86: scale cyc_2_nsec according to CPU frequency
From: "Guillaume Chazarain" <[email protected]>

scale the sched_clock() cyc_2_nsec scaling factor according to
CPU frequency changes.

[ [email protected]: simplified it and fixed it for SMP. ]

Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/tsc_32.c | 41 +++++++++++++++++++++++++++-----
arch/x86/kernel/tsc_64.c | 59 +++++++++++++++++++++++++++++++++++++++--------
include/asm-x86/timer.h | 23 ++++++++++++++----
3 files changed, 102 insertions(+), 21 deletions(-)

Index: linux-x86.q/arch/x86/kernel/tsc_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/tsc_32.c
+++ linux-x86.q/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
#include <linux/jiffies.h>
#include <linux/init.h>
#include <linux/dmi.h>
+#include <linux/percpu.h>

#include <asm/delay.h>
#include <asm/tsc.h>
@@ -78,15 +79,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* ([email protected])
*
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
* [email protected] "math is hard, lets go shopping!"
*/
-unsigned long cyc2ns_scale __read_mostly;

-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+DEFINE_PER_CPU(unsigned long, cyc2ns);

-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
{
- cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+ unsigned long flags, prev_scale, *scale;
+ unsigned long long tsc_now, ns_now;
+
+ local_irq_save(flags);
+ scale = &per_cpu(cyc2ns, cpu);
+
+ rdtscll(tsc_now);
+ ns_now = __cycles_2_ns(tsc_now);
+
+ prev_scale = *scale;
+ if (cpu_khz)
+ *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+ printk("CPU#%d: changed cyc2ns scale from %ld to %ld\n",
+ cpu, prev_scale, *scale);
+ local_irq_restore(flags);
}

/*
@@ -239,7 +256,9 @@ time_cpufreq_notifier(struct notifier_bl
ref_freq, freq->new);
if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
tsc_khz = cpu_khz;
- set_cyc2ns_scale(cpu_khz);
+ preempt_disable();
+ set_cyc2ns_scale(cpu_khz, smp_processor_id());
+ preempt_enable();
/*
* TSC based sched_clock turns
* to junk w/ cpufreq
@@ -367,6 +386,8 @@ static inline void check_geode_tsc_relia

void __init tsc_init(void)
{
+ int cpu;
+
if (!cpu_has_tsc || tsc_disable)
goto out_no_tsc;

@@ -380,7 +401,15 @@ void __init tsc_init(void)
(unsigned long)cpu_khz / 1000,
(unsigned long)cpu_khz % 1000);

- set_cyc2ns_scale(cpu_khz);
+ /*
+ * Secondary CPUs do not run through tsc_init(), so set up
+ * all the scale factors for all CPUs, assuming the same
+ * speed as the bootup CPU. (cpufreq notifiers will fix this
+ * up if their speed diverges)
+ */
+ for_each_possible_cpu(cpu)
+ set_cyc2ns_scale(cpu_khz, cpu);
+
use_tsc_delay();

/* Check and install the TSC clocksource */
Index: linux-x86.q/arch/x86/kernel/tsc_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/tsc_64.c
+++ linux-x86.q/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@

#include <asm/hpet.h>
#include <asm/timex.h>
+#include <asm/timer.h>

static int notsc __initdata = 0;

@@ -18,16 +19,50 @@ EXPORT_SYMBOL(cpu_khz);
unsigned int tsc_khz;
EXPORT_SYMBOL(tsc_khz);

-static unsigned int cyc2ns_scale __read_mostly;
+/* Accelerators for sched_clock()
+ * convert from cycles(64bits) => nanoseconds (64bits)
+ * basic equation:
+ * ns = cycles / (freq / ns_per_sec)
+ * ns = cycles * (ns_per_sec / freq)
+ * ns = cycles * (10^9 / (cpu_khz * 10^3))
+ * ns = cycles * (10^6 / cpu_khz)
+ *
+ * Then we use scaling math (suggested by [email protected]) to get:
+ * ns = cycles * (10^6 * SC / cpu_khz) / SC
+ * ns = cycles * cyc2ns_scale / SC
+ *
+ * And since SC is a constant power of two, we can convert the div
+ * into a shift.
+ *
+ * We can use khz divisor instead of mhz to keep a better precision, since
+ * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
+ * ([email protected])
+ *
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
+ * [email protected] "math is hard, lets go shopping!"
+ */
+DEFINE_PER_CPU(unsigned long, cyc2ns);

-static inline void set_cyc2ns_scale(unsigned long khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
{
- cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
+ unsigned long flags, prev_scale, *scale;
+ unsigned long long tsc_now, ns_now;

-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
- return (cyc * cyc2ns_scale) >> NS_SCALE;
+ local_irq_save(flags);
+ scale = &per_cpu(cyc2ns, cpu);
+
+ rdtscll(tsc_now);
+ ns_now = __cycles_2_ns(tsc_now);
+
+ prev_scale = *scale;
+ if (cpu_khz)
+ *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+ printk("CPU#%d: changed cyc2ns scale from %ld to %ld\n",
+ cpu, prev_scale, *scale);
+
+ local_irq_restore(flags);
}

unsigned long long sched_clock(void)
@@ -100,7 +135,9 @@ static int time_cpufreq_notifier(struct
mark_tsc_unstable("cpufreq changes");
}

- set_cyc2ns_scale(tsc_khz_ref);
+ preempt_disable();
+ set_cyc2ns_scale(tsc_khz_ref, smp_processor_id());
+ preempt_enable();

return 0;
}
@@ -151,7 +188,7 @@ static unsigned long __init tsc_read_ref
void __init tsc_calibrate(void)
{
unsigned long flags, tsc1, tsc2, tr1, tr2, pm1, pm2, hpet1, hpet2;
- int hpet = is_hpet_enabled();
+ int hpet = is_hpet_enabled(), cpu;

local_irq_save(flags);

@@ -206,7 +243,9 @@ void __init tsc_calibrate(void)
}

tsc_khz = tsc2 / tsc1;
- set_cyc2ns_scale(tsc_khz);
+
+ for_each_possible_cpu(cpu)
+ set_cyc2ns_scale(tsc_khz, cpu);
}

/*
Index: linux-x86.q/include/asm-x86/timer.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/timer.h
+++ linux-x86.q/include/asm-x86/timer.h
@@ -2,6 +2,7 @@
#define _ASMi386_TIMER_H
#include <linux/init.h>
#include <linux/pm.h>
+#include <linux/percpu.h>

#define TICK_SIZE (tick_nsec / 1000)

@@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void);
#define calculate_cpu_khz() native_calculate_cpu_khz()
#endif

-/* Accellerators for sched_clock()
+/* Accelerators for sched_clock()
* convert from cycles(64bits) => nanoseconds (64bits)
* basic equation:
* ns = cycles / (freq / ns_per_sec)
@@ -31,20 +32,32 @@ extern int recalibrate_cpu_khz(void);
* And since SC is a constant power of two, we can convert the div
* into a shift.
*
- * We can use khz divisor instead of mhz to keep a better percision, since
+ * We can use khz divisor instead of mhz to keep a better precision, since
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* ([email protected])
*
* [email protected] "math is hard, lets go shopping!"
*/
-extern unsigned long cyc2ns_scale __read_mostly;
+
+DECLARE_PER_CPU(unsigned long, cyc2ns);

#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */

-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
{
- return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+ return cyc * per_cpu(cyc2ns, smp_processor_id()) >> CYC2NS_SCALE_FACTOR;
}

+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+ unsigned long long ns;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ ns = __cycles_2_ns(cyc);
+ local_irq_restore(flags);
+
+ return ns;
+}

#endif

2007-12-07 14:34:36

by Guillaume Chazarain

[permalink] [raw]
Subject: Re: [patch] x86: scale cyc_2_nsec according to CPU frequency

On Fri, 7 Dec 2007 14:55:25 +0100,
Ingo Molnar <[email protected]> wrote:

> Firstly, we dont need the 'offset' anymore because cpu_clock() maintains
> offsets itself.

Yes, but a lower-quality one. __update_rq_clock tries to compensate for
large clock jumps with only jiffy resolution, while my offset arranges
for a very smooth frequency transition.

I agree with keeping a single offset, but I liked the fact that with my
patch the clock had no jump at all on a frequency change.
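
To make the difference concrete, a worked example with illustrative
numbers (not taken from either patch):

/*
 * At 2000 MHz:  scale = (1000000 << 10) / 2000000 = 512, so a TSC
 * reading of 2*10^9 cycles maps to (2*10^9 * 512) >> 10 = 10^9 ns.
 * After switching to 1000 MHz the scale becomes 1024, and the very
 * same TSC reading would map to 2*10^9 ns - a full one-second jump.
 * Doing
 *	offset += ns(old scale) - ns(new scale)
 * at the moment of the switch absorbs exactly that jump, so the
 * clock changes rate but stays continuous.
 */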

> + * ns += offset to avoid sched_clock jumps with cpufreq

I guess this needs to go away if I don't make my point :-(

> + printk("CPU#%d: changed cyc2ns scale from %ld to %ld\n",
> + cpu, prev_scale, *scale);

Pointing it out just to be sure it does not end up in the final version ;-)

Thanks for cleaning up my mess ;-)

--
Guillaume

2007-12-07 14:52:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] x86: scale cyc_2_nsec according to CPU frequency


* Guillaume Chazarain <[email protected]> wrote:

> Le Fri, 7 Dec 2007 14:55:25 +0100,
> Ingo Molnar <[email protected]> wrote:
>
> > Firstly, we dont need the 'offset' anymore because cpu_clock()
> > maintains offsets itself.
>
> Yes, but a lower quality one. __update_rq_clock tries to compensate
> large jumping clocks with a jiffy resolution, while my offset arranges
> for a very smooth frequency transition.

yes, but that would be easy to fix up via calling
sched_clock_idle_wakeup_event(0) when doing a frequency transition,
without burdening the normal sched_clock() codepath with the offset. See
the attached latest version.

Ingo

--------------->
Subject: x86: scale cyc_2_nsec according to CPU frequency
From: "Guillaume Chazarain" <[email protected]>

scale the sched_clock() cyc_2_nsec scaling factor according to
CPU frequency changes.

[ [email protected]: simplified it and fixed it for SMP. ]

Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/tsc_32.c | 45 +++++++++++++++++++++++++++++++----
arch/x86/kernel/tsc_64.c | 59 +++++++++++++++++++++++++++++++++++++++--------
include/asm-x86/timer.h | 23 ++++++++++++++----
3 files changed, 106 insertions(+), 21 deletions(-)

Index: linux-x86.q/arch/x86/kernel/tsc_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/tsc_32.c
+++ linux-x86.q/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
#include <linux/jiffies.h>
#include <linux/init.h>
#include <linux/dmi.h>
+#include <linux/percpu.h>

#include <asm/delay.h>
#include <asm/tsc.h>
@@ -78,15 +79,35 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* ([email protected])
*
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
* [email protected] "math is hard, lets go shopping!"
*/
-unsigned long cyc2ns_scale __read_mostly;

-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+DEFINE_PER_CPU(unsigned long, cyc2ns);

-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
{
- cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+ unsigned long flags, prev_scale, *scale;
+ unsigned long long tsc_now, ns_now;
+
+ local_irq_save(flags);
+ sched_clock_idle_sleep_event();
+
+ scale = &per_cpu(cyc2ns, cpu);
+
+ rdtscll(tsc_now);
+ ns_now = __cycles_2_ns(tsc_now);
+
+ prev_scale = *scale;
+ if (cpu_khz)
+ *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+ /*
+ * Start smoothly with the new frequency:
+ */
+ sched_clock_idle_wakeup_event(0);
+ local_irq_restore(flags);
}

/*
@@ -239,7 +260,9 @@ time_cpufreq_notifier(struct notifier_bl
ref_freq, freq->new);
if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
tsc_khz = cpu_khz;
- set_cyc2ns_scale(cpu_khz);
+ preempt_disable();
+ set_cyc2ns_scale(cpu_khz, smp_processor_id());
+ preempt_enable();
/*
* TSC based sched_clock turns
* to junk w/ cpufreq
@@ -367,6 +390,8 @@ static inline void check_geode_tsc_relia

void __init tsc_init(void)
{
+ int cpu;
+
if (!cpu_has_tsc || tsc_disable)
goto out_no_tsc;

@@ -380,7 +405,15 @@ void __init tsc_init(void)
(unsigned long)cpu_khz / 1000,
(unsigned long)cpu_khz % 1000);

- set_cyc2ns_scale(cpu_khz);
+ /*
+ * Secondary CPUs do not run through tsc_init(), so set up
+ * all the scale factors for all CPUs, assuming the same
+ * speed as the bootup CPU. (cpufreq notifiers will fix this
+ * up if their speed diverges)
+ */
+ for_each_possible_cpu(cpu)
+ set_cyc2ns_scale(cpu_khz, cpu);
+
use_tsc_delay();

/* Check and install the TSC clocksource */
Index: linux-x86.q/arch/x86/kernel/tsc_64.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/tsc_64.c
+++ linux-x86.q/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@

#include <asm/hpet.h>
#include <asm/timex.h>
+#include <asm/timer.h>

static int notsc __initdata = 0;

@@ -18,16 +19,50 @@ EXPORT_SYMBOL(cpu_khz);
unsigned int tsc_khz;
EXPORT_SYMBOL(tsc_khz);

-static unsigned int cyc2ns_scale __read_mostly;
+/* Accelerators for sched_clock()
+ * convert from cycles(64bits) => nanoseconds (64bits)
+ * basic equation:
+ * ns = cycles / (freq / ns_per_sec)
+ * ns = cycles * (ns_per_sec / freq)
+ * ns = cycles * (10^9 / (cpu_khz * 10^3))
+ * ns = cycles * (10^6 / cpu_khz)
+ *
+ * Then we use scaling math (suggested by [email protected]) to get:
+ * ns = cycles * (10^6 * SC / cpu_khz) / SC
+ * ns = cycles * cyc2ns_scale / SC
+ *
+ * And since SC is a constant power of two, we can convert the div
+ * into a shift.
+ *
+ * We can use khz divisor instead of mhz to keep a better precision, since
+ * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
+ * ([email protected])
+ *
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
+ * [email protected] "math is hard, lets go shopping!"
+ */
+DEFINE_PER_CPU(unsigned long, cyc2ns);

-static inline void set_cyc2ns_scale(unsigned long khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
{
- cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
+ unsigned long flags, prev_scale, *scale;
+ unsigned long long tsc_now, ns_now;

-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
- return (cyc * cyc2ns_scale) >> NS_SCALE;
+ local_irq_save(flags);
+ sched_clock_idle_sleep_event();
+
+ scale = &per_cpu(cyc2ns, cpu);
+
+ rdtscll(tsc_now);
+ ns_now = __cycles_2_ns(tsc_now);
+
+ prev_scale = *scale;
+ if (cpu_khz)
+ *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+ sched_clock_idle_wakeup_event(0);
+ local_irq_restore(flags);
}

unsigned long long sched_clock(void)
@@ -100,7 +135,9 @@ static int time_cpufreq_notifier(struct
mark_tsc_unstable("cpufreq changes");
}

- set_cyc2ns_scale(tsc_khz_ref);
+ preempt_disable();
+ set_cyc2ns_scale(tsc_khz_ref, smp_processor_id());
+ preempt_enable();

return 0;
}
@@ -151,7 +188,7 @@ static unsigned long __init tsc_read_ref
void __init tsc_calibrate(void)
{
unsigned long flags, tsc1, tsc2, tr1, tr2, pm1, pm2, hpet1, hpet2;
- int hpet = is_hpet_enabled();
+ int hpet = is_hpet_enabled(), cpu;

local_irq_save(flags);

@@ -206,7 +243,9 @@ void __init tsc_calibrate(void)
}

tsc_khz = tsc2 / tsc1;
- set_cyc2ns_scale(tsc_khz);
+
+ for_each_possible_cpu(cpu)
+ set_cyc2ns_scale(tsc_khz, cpu);
}

/*
Index: linux-x86.q/include/asm-x86/timer.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/timer.h
+++ linux-x86.q/include/asm-x86/timer.h
@@ -2,6 +2,7 @@
#define _ASMi386_TIMER_H
#include <linux/init.h>
#include <linux/pm.h>
+#include <linux/percpu.h>

#define TICK_SIZE (tick_nsec / 1000)

@@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void);
#define calculate_cpu_khz() native_calculate_cpu_khz()
#endif

-/* Accellerators for sched_clock()
+/* Accelerators for sched_clock()
* convert from cycles(64bits) => nanoseconds (64bits)
* basic equation:
* ns = cycles / (freq / ns_per_sec)
@@ -31,20 +32,32 @@ extern int recalibrate_cpu_khz(void);
* And since SC is a constant power of two, we can convert the div
* into a shift.
*
- * We can use khz divisor instead of mhz to keep a better percision, since
+ * We can use khz divisor instead of mhz to keep a better precision, since
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* ([email protected])
*
* [email protected] "math is hard, lets go shopping!"
*/
-extern unsigned long cyc2ns_scale __read_mostly;
+
+DECLARE_PER_CPU(unsigned long, cyc2ns);

#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */

-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
{
- return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+ return cyc * per_cpu(cyc2ns, smp_processor_id()) >> CYC2NS_SCALE_FACTOR;
}

+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+ unsigned long long ns;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ ns = __cycles_2_ns(cyc);
+ local_irq_restore(flags);
+
+ return ns;
+}

#endif

2007-12-07 14:55:08

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Ingo Molnar <[email protected]> wrote:

> third update. the cpufreq callbacks are not quite OK yet.

fourth update - the cpufreq callbacks are back. This is a version that
is supposed to fix all known aspects of TSC and frequency-change
weirdnesses.

Ingo

Index: linux/arch/arm/kernel/time.c
===================================================================
--- linux.orig/arch/arm/kernel/time.c
+++ linux/arch/arm/kernel/time.c
@@ -79,17 +79,6 @@ static unsigned long dummy_gettimeoffset
}
#endif

-/*
- * An implementation of printk_clock() independent from
- * sched_clock(). This avoids non-bootable kernels when
- * printk_clock is enabled.
- */
-unsigned long long printk_clock(void)
-{
- return (unsigned long long)(jiffies - INITIAL_JIFFIES) *
- (1000000000 / HZ);
-}
-
static unsigned long next_rtc_update;

/*
Index: linux/arch/ia64/kernel/time.c
===================================================================
--- linux.orig/arch/ia64/kernel/time.c
+++ linux/arch/ia64/kernel/time.c
@@ -344,33 +344,6 @@ udelay (unsigned long usecs)
}
EXPORT_SYMBOL(udelay);

-static unsigned long long ia64_itc_printk_clock(void)
-{
- if (ia64_get_kr(IA64_KR_PER_CPU_DATA))
- return sched_clock();
- return 0;
-}
-
-static unsigned long long ia64_default_printk_clock(void)
-{
- return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) *
- (1000000000/HZ);
-}
-
-unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock;
-
-unsigned long long printk_clock(void)
-{
- return ia64_printk_clock();
-}
-
-void __init
-ia64_setup_printk_clock(void)
-{
- if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT))
- ia64_printk_clock = ia64_itc_printk_clock;
-}
-
/* IA64 doesn't cache the timezone */
void update_vsyscall_tz(void)
{
Index: linux/arch/x86/kernel/process_32.c
===================================================================
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -113,10 +113,19 @@ void default_idle(void)
smp_mb();

local_irq_disable();
- if (!need_resched())
+ if (!need_resched()) {
+ ktime_t t0, t1;
+ u64 t0n, t1n;
+
+ t0 = ktime_get();
+ t0n = ktime_to_ns(t0);
safe_halt(); /* enables interrupts racelessly */
- else
- local_irq_enable();
+ local_irq_disable();
+ t1 = ktime_get();
+ t1n = ktime_to_ns(t1);
+ sched_clock_idle_wakeup_event(t1n - t0n);
+ }
+ local_irq_enable();
current_thread_info()->status |= TS_POLLING;
} else {
/* loop is done by the caller */
Index: linux/arch/x86/kernel/tsc_32.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_32.c
+++ linux/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
#include <linux/jiffies.h>
#include <linux/init.h>
#include <linux/dmi.h>
+#include <linux/percpu.h>

#include <asm/delay.h>
#include <asm/tsc.h>
@@ -78,15 +79,35 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* ([email protected])
*
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
* [email protected] "math is hard, lets go shopping!"
*/
-unsigned long cyc2ns_scale __read_mostly;

-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+DEFINE_PER_CPU(unsigned long, cyc2ns);

-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
{
- cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+ unsigned long flags, prev_scale, *scale;
+ unsigned long long tsc_now, ns_now;
+
+ local_irq_save(flags);
+ sched_clock_idle_sleep_event();
+
+ scale = &per_cpu(cyc2ns, cpu);
+
+ rdtscll(tsc_now);
+ ns_now = __cycles_2_ns(tsc_now);
+
+ prev_scale = *scale;
+ if (cpu_khz)
+ *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+ /*
+ * Start smoothly with the new frequency:
+ */
+ sched_clock_idle_wakeup_event(0);
+ local_irq_restore(flags);
}

/*
@@ -239,7 +260,9 @@ time_cpufreq_notifier(struct notifier_bl
ref_freq, freq->new);
if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
tsc_khz = cpu_khz;
- set_cyc2ns_scale(cpu_khz);
+ preempt_disable();
+ set_cyc2ns_scale(cpu_khz, smp_processor_id());
+ preempt_enable();
/*
* TSC based sched_clock turns
* to junk w/ cpufreq
@@ -367,6 +390,8 @@ static inline void check_geode_tsc_relia

void __init tsc_init(void)
{
+ int cpu;
+
if (!cpu_has_tsc || tsc_disable)
goto out_no_tsc;

@@ -380,7 +405,15 @@ void __init tsc_init(void)
(unsigned long)cpu_khz / 1000,
(unsigned long)cpu_khz % 1000);

- set_cyc2ns_scale(cpu_khz);
+ /*
+ * Secondary CPUs do not run through tsc_init(), so set up
+ * all the scale factors for all CPUs, assuming the same
+ * speed as the bootup CPU. (cpufreq notifiers will fix this
+ * up if their speed diverges)
+ */
+ for_each_possible_cpu(cpu)
+ set_cyc2ns_scale(cpu_khz, cpu);
+
use_tsc_delay();

/* Check and install the TSC clocksource */
Index: linux/arch/x86/kernel/tsc_64.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_64.c
+++ linux/arch/x86/kernel/tsc_64.c
@@ -10,6 +10,7 @@

#include <asm/hpet.h>
#include <asm/timex.h>
+#include <asm/timer.h>

static int notsc __initdata = 0;

@@ -18,16 +19,50 @@ EXPORT_SYMBOL(cpu_khz);
unsigned int tsc_khz;
EXPORT_SYMBOL(tsc_khz);

-static unsigned int cyc2ns_scale __read_mostly;
+/* Accelerators for sched_clock()
+ * convert from cycles(64bits) => nanoseconds (64bits)
+ * basic equation:
+ * ns = cycles / (freq / ns_per_sec)
+ * ns = cycles * (ns_per_sec / freq)
+ * ns = cycles * (10^9 / (cpu_khz * 10^3))
+ * ns = cycles * (10^6 / cpu_khz)
+ *
+ * Then we use scaling math (suggested by [email protected]) to get:
+ * ns = cycles * (10^6 * SC / cpu_khz) / SC
+ * ns = cycles * cyc2ns_scale / SC
+ *
+ * And since SC is a constant power of two, we can convert the div
+ * into a shift.
+ *
+ * We can use khz divisor instead of mhz to keep a better precision, since
+ * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
+ * ([email protected])
+ *
+ * ns += offset to avoid sched_clock jumps with cpufreq
+ *
+ * [email protected] "math is hard, lets go shopping!"
+ */
+DEFINE_PER_CPU(unsigned long, cyc2ns);

-static inline void set_cyc2ns_scale(unsigned long khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
{
- cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz;
-}
+ unsigned long flags, prev_scale, *scale;
+ unsigned long long tsc_now, ns_now;

-static unsigned long long cycles_2_ns(unsigned long long cyc)
-{
- return (cyc * cyc2ns_scale) >> NS_SCALE;
+ local_irq_save(flags);
+ sched_clock_idle_sleep_event();
+
+ scale = &per_cpu(cyc2ns, cpu);
+
+ rdtscll(tsc_now);
+ ns_now = __cycles_2_ns(tsc_now);
+
+ prev_scale = *scale;
+ if (cpu_khz)
+ *scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+ sched_clock_idle_wakeup_event(0);
+ local_irq_restore(flags);
}

unsigned long long sched_clock(void)
@@ -100,7 +135,9 @@ static int time_cpufreq_notifier(struct
mark_tsc_unstable("cpufreq changes");
}

- set_cyc2ns_scale(tsc_khz_ref);
+ preempt_disable();
+ set_cyc2ns_scale(tsc_khz_ref, smp_processor_id());
+ preempt_enable();

return 0;
}
@@ -151,7 +188,7 @@ static unsigned long __init tsc_read_ref
void __init tsc_calibrate(void)
{
unsigned long flags, tsc1, tsc2, tr1, tr2, pm1, pm2, hpet1, hpet2;
- int hpet = is_hpet_enabled();
+ int hpet = is_hpet_enabled(), cpu;

local_irq_save(flags);

@@ -206,7 +243,9 @@ void __init tsc_calibrate(void)
}

tsc_khz = tsc2 / tsc1;
- set_cyc2ns_scale(tsc_khz);
+
+ for_each_possible_cpu(cpu)
+ set_cyc2ns_scale(tsc_khz, cpu);
}

/*
Index: linux/drivers/acpi/processor_idle.c
===================================================================
--- linux.orig/drivers/acpi/processor_idle.c
+++ linux/drivers/acpi/processor_idle.c
@@ -531,6 +531,11 @@ static void acpi_processor_idle(void)

case ACPI_STATE_C3:
/*
+ * Must be done before busmaster disable as we might
+ * need to access HPET !
+ */
+ acpi_state_timer_broadcast(pr, cx, 1);
+ /*
* disable bus master
* bm_check implies we need ARB_DIS
* !bm_check implies we need cache flush
@@ -557,7 +562,6 @@ static void acpi_processor_idle(void)
/* Get start time (ticks) */
t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
/* Invoke C3 */
- acpi_state_timer_broadcast(pr, cx, 1);
/* Tell the scheduler that we are going deep-idle: */
sched_clock_idle_sleep_event();
acpi_cstate_enter(cx);
@@ -1401,9 +1405,6 @@ static int acpi_idle_enter_simple(struct
if (acpi_idle_suspend)
return(acpi_idle_enter_c1(dev, state));

- if (pr->flags.bm_check)
- acpi_idle_update_bm_rld(pr, cx);
-
local_irq_disable();
current_thread_info()->status &= ~TS_POLLING;
/*
@@ -1418,13 +1419,21 @@ static int acpi_idle_enter_simple(struct
return 0;
}

+ /*
+ * Must be done before busmaster disable as we might need to
+ * access HPET !
+ */
+ acpi_state_timer_broadcast(pr, cx, 1);
+
+ if (pr->flags.bm_check)
+ acpi_idle_update_bm_rld(pr, cx);
+
if (cx->type == ACPI_STATE_C3)
ACPI_FLUSH_CPU_CACHE();

t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
/* Tell the scheduler that we are going deep-idle: */
sched_clock_idle_sleep_event();
- acpi_state_timer_broadcast(pr, cx, 1);
acpi_idle_do_entry(cx);
t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);

Index: linux/include/asm-x86/timer.h
===================================================================
--- linux.orig/include/asm-x86/timer.h
+++ linux/include/asm-x86/timer.h
@@ -2,6 +2,7 @@
#define _ASMi386_TIMER_H
#include <linux/init.h>
#include <linux/pm.h>
+#include <linux/percpu.h>

#define TICK_SIZE (tick_nsec / 1000)

@@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void);
#define calculate_cpu_khz() native_calculate_cpu_khz()
#endif

-/* Accellerators for sched_clock()
+/* Accelerators for sched_clock()
* convert from cycles(64bits) => nanoseconds (64bits)
* basic equation:
* ns = cycles / (freq / ns_per_sec)
@@ -31,20 +32,32 @@ extern int recalibrate_cpu_khz(void);
* And since SC is a constant power of two, we can convert the div
* into a shift.
*
- * We can use khz divisor instead of mhz to keep a better percision, since
+ * We can use khz divisor instead of mhz to keep a better precision, since
* cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
* ([email protected])
*
* [email protected] "math is hard, lets go shopping!"
*/
-extern unsigned long cyc2ns_scale __read_mostly;
+
+DECLARE_PER_CPU(unsigned long, cyc2ns);

#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */

-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
{
- return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR;
+ return cyc * per_cpu(cyc2ns, smp_processor_id()) >> CYC2NS_SCALE_FACTOR;
}

+static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+ unsigned long long ns;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ ns = __cycles_2_ns(cyc);
+ local_irq_restore(flags);
+
+ return ns;
+}

#endif
Index: linux/kernel/hrtimer.c
===================================================================
--- linux.orig/kernel/hrtimer.c
+++ linux/kernel/hrtimer.c
@@ -850,6 +850,14 @@ hrtimer_start(struct hrtimer *timer, kti
#ifdef CONFIG_TIME_LOW_RES
tim = ktime_add(tim, base->resolution);
#endif
+ /*
+ * Careful here: User space might have asked for a
+ * very long sleep, so the add above might result in a
+ * negative number, which enqueues the timer in front
+ * of the queue.
+ */
+ if (tim.tv64 < 0)
+ tim.tv64 = KTIME_MAX;
}
timer->expires = tim;

Index: linux/kernel/lockdep.c
===================================================================
--- linux.orig/kernel/lockdep.c
+++ linux/kernel/lockdep.c
@@ -2654,10 +2654,15 @@ static void check_flags(unsigned long fl
if (!debug_locks)
return;

- if (irqs_disabled_flags(flags))
- DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled);
- else
- DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled);
+ if (irqs_disabled_flags(flags)) {
+ if (DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)) {
+ printk("possible reason: unannotated irqs-off.\n");
+ }
+ } else {
+ if (DEBUG_LOCKS_WARN_ON(!current->hardirqs_enabled)) {
+ printk("possible reason: unannotated irqs-on.\n");
+ }
+ }

/*
* We dont accurately track softirq state in e.g.
Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -573,11 +573,6 @@ static int __init printk_time_setup(char

__setup("time", printk_time_setup);

-__attribute__((weak)) unsigned long long printk_clock(void)
-{
- return sched_clock();
-}
-
/* Check if we have any console registered that can be called early in boot. */
static int have_callable_console(void)
{
@@ -628,30 +623,57 @@ asmlinkage int printk(const char *fmt, .
/* cpu currently holding logbuf_lock */
static volatile unsigned int printk_cpu = UINT_MAX;

+const char printk_recursion_bug_msg [] =
+ KERN_CRIT "BUG: recent printk recursion!\n";
+static int printk_recursion_bug;
+
asmlinkage int vprintk(const char *fmt, va_list args)
{
+ static int log_level_unknown = 1;
+ static char printk_buf[1024];
+
unsigned long flags;
- int printed_len;
+ int printed_len = 0;
+ int this_cpu;
char *p;
- static char printk_buf[1024];
- static int log_level_unknown = 1;

boot_delay_msec();

preempt_disable();
- if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
- /* If a crash is occurring during printk() on this CPU,
- * make sure we can't deadlock */
- zap_locks();
-
/* This stops the holder of console_sem just where we want him */
raw_local_irq_save(flags);
+ this_cpu = smp_processor_id();
+
+ /*
+ * Ouch, printk recursed into itself!
+ */
+ if (unlikely(printk_cpu == this_cpu)) {
+ /*
+ * If a crash is occurring during printk() on this CPU,
+ * then try to get the crash message out but make sure
+ * we can't deadlock. Otherwise just return to avoid the
+ * recursion and return - but flag the recursion so that
+ * it can be printed at the next appropriate moment:
+ */
+ if (!oops_in_progress) {
+ printk_recursion_bug = 1;
+ goto out_restore_irqs;
+ }
+ zap_locks();
+ }
+
lockdep_off();
spin_lock(&logbuf_lock);
- printk_cpu = smp_processor_id();
+ printk_cpu = this_cpu;

+ if (printk_recursion_bug) {
+ printk_recursion_bug = 0;
+ strcpy(printk_buf, printk_recursion_bug_msg);
+ printed_len = sizeof(printk_recursion_bug_msg);
+ }
/* Emit the output into the temporary buffer */
- printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+ printed_len += vscnprintf(printk_buf + printed_len,
+ sizeof(printk_buf), fmt, args);

/*
* Copy the output into log_buf. If the caller didn't provide
@@ -680,7 +702,11 @@ asmlinkage int vprintk(const char *fmt,
loglev_char = default_message_loglevel
+ '0';
}
- t = printk_clock();
+ if (panic_timeout) {
+ panic_timeout = 0;
+ printk("recurse!\n");
+ }
+ t = cpu_clock(printk_cpu);
nanosec_rem = do_div(t, 1000000000);
tlen = sprintf(tbuf,
"<%c>[%5lu.%06lu] ",
@@ -744,6 +770,7 @@ asmlinkage int vprintk(const char *fmt,
printk_cpu = UINT_MAX;
spin_unlock(&logbuf_lock);
lockdep_on();
+out_restore_irqs:
raw_local_irq_restore(flags);
}

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -488,7 +488,12 @@ unsigned long long cpu_clock(int cpu)

local_irq_save(flags);
rq = cpu_rq(cpu);
- update_rq_clock(rq);
+ /*
+ * Only call sched_clock() if the scheduler has already been
+ * initialized (some code might call cpu_clock() very early):
+ */
+ if (rq->idle)
+ update_rq_clock(rq);
now = rq->clock;
local_irq_restore(flags);

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -511,8 +511,7 @@ place_entity(struct cfs_rq *cfs_rq, stru

if (!initial) {
/* sleeps upto a single latency don't count. */
- if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se) &&
- task_of(se)->policy != SCHED_BATCH)
+ if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se))
vruntime -= sysctl_sched_latency;

/* ensure we never gain time by being placed backwards. */
Index: linux/kernel/time/clockevents.c
===================================================================
--- linux.orig/kernel/time/clockevents.c
+++ linux/kernel/time/clockevents.c
@@ -78,6 +78,11 @@ int clockevents_program_event(struct clo
unsigned long long clc;
int64_t delta;

+ if (unlikely(expires.tv64 < 0)) {
+ WARN_ON_ONCE(1);
+ return -ETIME;
+ }
+
delta = ktime_to_ns(ktime_sub(expires, now));

if (delta <= 0)

2007-12-07 16:50:10

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Friday 07 December 2007 22:17, Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > > ah, printk_clock() still uses sched_clock(), not jiffies. So it's
> > > not the jiffies counter that goes back and forth, it's sched_clock()
> > > - so this is a printk timestamps anomaly, not related to jiffies. I
> > > thought we have fixed this bug in the printk code already:
> > > sched_clock() is a 'raw' interface that should not be used directly
> > > - the proper interface is cpu_clock(cpu).
> >
> > It's a single CPU box, so sched_clock() jumping would still be
> > problematic, no?
>
> sched_clock() is an internal API - the non-jumping API to be used by
> printk is cpu_clock().

You know why sched_clock jumps when the TSC frequency changes, right?

2007-12-07 16:53:43

by Guillaume Chazarain

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Fri, 7 Dec 2007 15:54:18 +0100,
Ingo Molnar <[email protected]> wrote:

> This is a version that
> is supposed to fix all known aspects of TSC and frequency-change
> weirdnesses.

Tested it with frequency changes; the clock is as smooth as I like
it :-)

The only remaining sched_clock user in need of conversion seems to be
lockdep.

Great work.

--
Guillaume

2007-12-07 17:58:40

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Guillaume Chazarain <[email protected]> wrote:

> On Fri, 7 Dec 2007 15:54:18 +0100,
> Ingo Molnar <[email protected]> wrote:
>
> > This is a version that
> > is supposed to fix all known aspects of TSC and frequency-change
> > weirdnesses.
>
> Tested it with frequency changes, the clock is as smooth as I like it
> :-)

ok, great :-)

> The only remaining sched_clock user in need of conversion seems to be
> lockdep.

yeah - for CONFIG_LOCKSTAT - but that needs to be done even more
carefully, due to rq->lock being lockdep-checked. We can perhaps try a
lock-less cpu_clock() version - other CPUs are not supposed to update
rq->clock.
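
For illustration, a minimal sketch of what such a lock-less variant could
look like, under the assumption that only the owning CPU ever writes
rq->clock. This is hypothetical and not a patch from this thread:

unsigned long long cpu_clock_lockless(int cpu)
{
	unsigned long long now;
	unsigned long flags;

	local_irq_save(flags);
	/*
	 * Skip update_rq_clock() (which would need rq->lock) and just
	 * read the value last published by 'cpu'; it may be slightly
	 * stale, but no runqueue lock is taken.
	 */
	now = cpu_rq(cpu)->clock;
	local_irq_restore(flags);

	return now;
}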

> Great work.

thanks. I do get the impression that most of this can/should wait until
2.6.25. The patches look quite dangerous.

Ingo

2007-12-08 00:51:18

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Saturday 08 December 2007 03:48, Nick Piggin wrote:
> On Friday 07 December 2007 22:17, Ingo Molnar wrote:
> > * Nick Piggin <[email protected]> wrote:
> > > > ah, printk_clock() still uses sched_clock(), not jiffies. So it's
> > > > not the jiffies counter that goes back and forth, it's sched_clock()
> > > > - so this is a printk timestamps anomaly, not related to jiffies. I
> > > > thought we have fixed this bug in the printk code already:
> > > > sched_clock() is a 'raw' interface that should not be used directly
> > > > - the proper interface is cpu_clock(cpu).
> > >
> > > It's a single CPU box, so sched_clock() jumping would still be
> > > problematic, no?
> >
> > sched_clock() is an internal API - the non-jumping API to be used by
> > printk is cpu_clock().
>
> You know why sched_clock jumps when the TSC frequency changes, right?

Ah, hmm, I don't know why I wrote that :)

I guess your patch is fairly complex but it should work if the plan
is to convert all sched_clock users to use cpu_clock, e.g. lockdep,
as well.

So it looks good to me, thanks for fixing this.

2007-12-08 00:57:44

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Saturday 08 December 2007 11:50, Nick Piggin wrote:

> I guess your patch is fairly complex but it should work

I should also add that although complex, it should have a
much smaller TSC delta window in which the wrong scaling
factor can get applied to it (I guess it is about as good
as you can possibly get). So I do like it :)

2007-12-08 08:52:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Nick Piggin <[email protected]> wrote:

> On Saturday 08 December 2007 11:50, Nick Piggin wrote:
>
> > I guess your patch is fairly complex but it should work
>
> I should also add that although complex, it should have a much smaller
> TSC delta window in which the wrong scaling factor can get applied to
> it (I guess it is about as good as you can possibly get). So I do like
> it :)

ok :-)

the scariest bit isn't even the scaling, I think - that is a fairly
straightforward and clean PER_CPU-ization of the global scaling factor,
and its hookup with cpufreq events. (and the credit for that goes to
Guillaume Chazarain.) We could even split it into two to make it even
less scary and more bisectable.
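
Schematically, the per-CPU-ization plus the cpufreq hookup amounts to
something like the sketch below. Names are illustrative rather than the
exact patch text, and the offset/resync step done via
sched_clock_idle_wakeup_event() is left out:

#define CYC2NS_SCALE_FACTOR 10			/* 2^10 fixed-point shift */

static DEFINE_PER_CPU(unsigned long, cyc2ns);	/* ns per cycle << 10 */

static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
{
	per_cpu(cyc2ns, cpu) = (1000000UL << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

static int time_cpufreq_notifier(struct notifier_block *nb,
				 unsigned long val, void *data)
{
	struct cpufreq_freqs *freq = data;

	/* rescale only the CPU whose frequency actually changed */
	if (val == CPUFREQ_POSTCHANGE)
		set_cyc2ns_scale(freq->new, freq->cpu);

	return 0;
}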

the scariest bit is the adding of cpu_clock() to kernel/printk.c so late
in the game, and the anti-recursion code I did there. Maybe, because this
only affects CONFIG_PRINTK_TIME, we could try it even for v2.6.24.

I've now completed a couple of hundred random bootups on x86 overnight,
with the full stack of these patches applied, and no problems.

Could you have a really critical look at the two patches below and give your
Reviewed-by line(s) if you agree with having them in v2.6.24? I'd feel a
lot better about this that way :-) I do have the feeling that it makes
printk printout a lot more robust in general, independently of the
cpu_clock() change - especially with more complex consoles like
netconsole or fbcon.

Ingo

-------------------->
Subject: printk: make printk more robust by not allowing recursion
From: Ingo Molnar <[email protected]>

make printk more robust by allowing recursion only if there's a crash
going on. Also add recursion detection.

I've tested it with an artificially injected printk recursion - instead
of a lockup or spontaneous reboot or other crash, the output was a well
controlled:

[ 41.057335] SysRq : <2>BUG: recent printk recursion!
[ 41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks

also do all this printk-debug logic with irqs disabled.

Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/printk.c | 48 ++++++++++++++++++++++++++++++++++++++----------
1 file changed, 38 insertions(+), 10 deletions(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -628,30 +628,57 @@ asmlinkage int printk(const char *fmt, .
/* cpu currently holding logbuf_lock */
static volatile unsigned int printk_cpu = UINT_MAX;

+const char printk_recursion_bug_msg [] =
+ KERN_CRIT "BUG: recent printk recursion!\n";
+static int printk_recursion_bug;
+
asmlinkage int vprintk(const char *fmt, va_list args)
{
+ static int log_level_unknown = 1;
+ static char printk_buf[1024];
+
unsigned long flags;
- int printed_len;
+ int printed_len = 0;
+ int this_cpu;
char *p;
- static char printk_buf[1024];
- static int log_level_unknown = 1;

boot_delay_msec();

preempt_disable();
- if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
- /* If a crash is occurring during printk() on this CPU,
- * make sure we can't deadlock */
- zap_locks();
-
/* This stops the holder of console_sem just where we want him */
raw_local_irq_save(flags);
+ this_cpu = smp_processor_id();
+
+ /*
+ * Ouch, printk recursed into itself!
+ */
+ if (unlikely(printk_cpu == this_cpu)) {
+ /*
+ * If a crash is occurring during printk() on this CPU,
+ * then try to get the crash message out but make sure
+ * we can't deadlock. Otherwise just return to avoid the
+ * recursion and return - but flag the recursion so that
+ * it can be printed at the next appropriate moment:
+ */
+ if (!oops_in_progress) {
+ printk_recursion_bug = 1;
+ goto out_restore_irqs;
+ }
+ zap_locks();
+ }
+
lockdep_off();
spin_lock(&logbuf_lock);
- printk_cpu = smp_processor_id();
+ printk_cpu = this_cpu;

+ if (printk_recursion_bug) {
+ printk_recursion_bug = 0;
+ strcpy(printk_buf, printk_recursion_bug_msg);
+ printed_len = sizeof(printk_recursion_bug_msg);
+ }
/* Emit the output into the temporary buffer */
- printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+ printed_len += vscnprintf(printk_buf + printed_len,
+ sizeof(printk_buf), fmt, args);

/*
* Copy the output into log_buf. If the caller didn't provide
@@ -744,6 +771,7 @@ asmlinkage int vprintk(const char *fmt,
printk_cpu = UINT_MAX;
spin_unlock(&logbuf_lock);
lockdep_on();
+out_restore_irqs:
raw_local_irq_restore(flags);
}

-------------------->
Subject: sched: fix CONFIG_PRINT_TIME's reliance on sched_clock()
From: Ingo Molnar <[email protected]>

Stefano Brivio reported weird printk timestamp behavior during
CPU frequency changes:

http://bugzilla.kernel.org/show_bug.cgi?id=9475

fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock()
instead.

Reported-and-bisected-by: Stefano Brivio <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/printk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -707,7 +707,7 @@ asmlinkage int vprintk(const char *fmt,
loglev_char = default_message_loglevel
+ '0';
}
- t = printk_clock();
+ t = cpu_clock(printk_cpu);
nanosec_rem = do_div(t, 1000000000);
tlen = sprintf(tbuf,
"<%c>[%5lu.%06lu] ",

2007-12-08 15:06:48

by Mark Lord

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

Ingo Molnar wrote:
>...
> thanks. I do get the impression that most of this can/should wait until
> 2.6.25. The patches look quite dangerous.
..

I confess to not really trying hard to understand everything in this thread,
but the implication seems to be that this bug might affect udelay()
and possibly jiffies?

If so, then fixing it has to be a *must* for 2.6.24, as otherwise we'll get
all sorts of once-in-a-while odd driver bugs.. like maybe these two for starters:

[Bug 9492] 2.6.24: false double-clicks from USB mouse
[Bug 9489] 20000+ wake-ups/second in 2.6.24

Neither of which happens often enough to explain or debug,
but either of which *could* be caused by some weird jiffies thing maybe.

???

2007-12-08 15:15:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Mark Lord <[email protected]> wrote:

> Ingo Molnar wrote:
>> ...
>> thanks. I do get the impression that most of this can/should wait until
>> 2.6.25. The patches look quite dangerous.
> ..
>
> I confess to not really trying hard to understand everything in this
> thread, but the implication seems to be that this bug might affect
> udelay() and possibly jiffies ?

no, it cannot affect jiffies. (jiffies was a red herring all along)

udelay() cannot be affected either - sched_clock() has no effect on
udelay(). _But_, when there are TSC problems then TSC-based udelay()
suffers too, so the phenomena may _seem_ related.
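
For readers wondering about the connection: a TSC-based udelay() is
essentially a busy loop against a cycle budget computed from the
calibrated frequency, so a frequency change underneath it stretches or
shortens the real delay. A very rough, illustrative sketch - not the
actual arch/x86 delay code:

static void tsc_delay_usecs(unsigned long usecs, unsigned long tsc_khz)
{
	unsigned long long now, end;

	rdtscll(now);
	/* cycles to wait = usecs * kHz / 1000 */
	end = now + (unsigned long long)usecs * tsc_khz / 1000;

	do {
		cpu_relax();
		rdtscll(now);
	} while (now < end);
}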

> If so, then fixing it has to be a *must* for 2.6.24, as otherwise
> we'll get all sorts of one-in-while odd driver bugs.. like maybe these
> two for starters:
>
> [Bug 9492] 2.6.24: false double-clicks from USB mouse
> [Bug 9489] 20000+ wake-ups/second in 2.6.24

IIRC these high-rate wakeups happened on 2.6.22 too.

Ingo

2007-12-08 15:29:59

by Michael Büsch

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Saturday 08 December 2007 16:13:41 Ingo Molnar wrote:
>
> * Mark Lord <[email protected]> wrote:
>
> > Ingo Molnar wrote:
> >> ...
> >> thanks. I do get the impression that most of this can/should wait until
> >> 2.6.25. The patches look quite dangerous.
> > ..
> >
> > I confess to not really trying hard to understand everything in this
> > thread, but the implication seems to be that this bug might affect
> > udelay() and possibly jiffies ?
>
> no, it cannot affect jiffies. (jiffies was a red herring all along)
>
> udelay() cannot be affected either - sched_clock() has no effect on
> udelay(). _But_, when there are TSC problems then tsc based udelay()
> suffers too so the phenomenons may _seem_ related.

What about msleep()? I suspect problems in b43 because of this issue.
msleep() returning too early. Is that possible with this bug?

--
Greetings Michael.

2007-12-08 15:34:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Michael Buesch <[email protected]> wrote:

> On Saturday 08 December 2007 16:13:41 Ingo Molnar wrote:
> >
> > * Mark Lord <[email protected]> wrote:
> >
> > > Ingo Molnar wrote:
> > >> ...
> > >> thanks. I do get the impression that most of this can/should wait until
> > >> 2.6.25. The patches look quite dangerous.
> > > ..
> > >
> > > I confess to not really trying hard to understand everything in this
> > > thread, but the implication seems to be that this bug might affect
> > > udelay() and possibly jiffies ?
> >
> > no, it cannot affect jiffies. (jiffies was a red herring all along)
> >
> > udelay() cannot be affected either - sched_clock() has no effect on
> > udelay(). _But_, when there are TSC problems then tsc based udelay()
> > suffers too so the phenomenons may _seem_ related.
>
> What about msleep()? I suspect problems in b43 because of this issue.
> msleep() returning too early. Is that possible with this bug?

I cannot see how. You can verify msleep by running something like this:

while :; do time usleep 111000; done

you should see a steady stream of:

real 0m0.113s
real 0m0.113s
real 0m0.113s

(on an idle system). If it fluctuates, with occasional longer delays,
there's some timer problem present.

Ingo

2007-12-08 15:39:16

by Michael Büsch

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Saturday 08 December 2007 16:33:27 Ingo Molnar wrote:
>
> * Michael Buesch <[email protected]> wrote:
>
> > On Saturday 08 December 2007 16:13:41 Ingo Molnar wrote:
> > >
> > > * Mark Lord <[email protected]> wrote:
> > >
> > > > Ingo Molnar wrote:
> > > >> ...
> > > >> thanks. I do get the impression that most of this can/should wait until
> > > >> 2.6.25. The patches look quite dangerous.
> > > > ..
> > > >
> > > > I confess to not really trying hard to understand everything in this
> > > > thread, but the implication seems to be that this bug might affect
> > > > udelay() and possibly jiffies ?
> > >
> > > no, it cannot affect jiffies. (jiffies was a red herring all along)
> > >
> > > udelay() cannot be affected either - sched_clock() has no effect on
> > > udelay(). _But_, when there are TSC problems then tsc based udelay()
> > > suffers too so the phenomenons may _seem_ related.
> >
> > What about msleep()? I suspect problems in b43 because of this issue.
> > msleep() returning too early. Is that possible with this bug?
>
> i cannot see how. You can verify msleep by running something like this:
>
> while :; do time usleep 111000; done
>
> you should see a steady stream of:
>
> real 0m0.113s
> real 0m0.113s
> real 0m0.113s
>
> (on an idle system). If it fluctuates, with occasional longer delays,
> there's some timer problem present.

Do the sleeping and the timing use different time references?
I mean, if they use the same reference and that reference does fluctuate,
you won't see it in the result.

But anyway, Stefano. Can you test this?

--
Greetings Michael.

2007-12-08 15:42:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Michael Buesch <[email protected]> wrote:

> > i cannot see how. You can verify msleep by running something like this:
> >
> > while :; do time usleep 111000; done
> >
> > you should see a steady stream of:
> >
> > real 0m0.113s
> > real 0m0.113s
> > real 0m0.113s
> >
> > (on an idle system). If it fluctuates, with occasional longer delays,
> > there's some timer problem present.
>
> Do the sleeping and the timing use different time references?

yes. But the really paranoid can do this from another box:

while :; do time ssh testbox usleep 111000; done

Ingo

2007-12-08 15:58:20

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [patch] x86: scale cyc_2_nsec according to CPU frequency

On Fri, 7 Dec 2007 15:52:06 +0100
Ingo Molnar <[email protected]> wrote:

>
> * Guillaume Chazarain <[email protected]> wrote:
>
> > On Fri, 7 Dec 2007 14:55:25 +0100,
> > Ingo Molnar <[email protected]> wrote:
> >
> > > Firstly, we dont need the 'offset' anymore because cpu_clock()
> > > maintains offsets itself.
> >
> > Yes, but a lower quality one. __update_rq_clock tries to compensate
> > large jumping clocks with a jiffy resolution, while my offset
> > arranges for a very smooth frequency transition.
>
> yes, but that would be easy to fix up via calling
> sched_clock_idle_wakeup_event(0) when doing a frequency transition,
> without burdening the normal sched_clock() codepath with the offset.
> See the attached latest version.


can this deal with dual/quad core where the frequency of one core
changes if the software changes the frequency of the other core?


--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2007-12-08 19:17:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] x86: scale cyc_2_nsec according to CPU frequency


* Arjan van de Ven <[email protected]> wrote:

> > > > Firstly, we dont need the 'offset' anymore because cpu_clock()
> > > > maintains offsets itself.
> > >
> > > Yes, but a lower quality one. __update_rq_clock tries to
> > > compensate large jumping clocks with a jiffy resolution, while my
> > > offset arranges for a very smooth frequency transition.
> >
> > yes, but that would be easy to fix up via calling
> > sched_clock_idle_wakeup_event(0) when doing a frequency transition,
> > without burdening the normal sched_clock() codepath with the offset.
> > See the attached latest version.
>
> can this deal with dual/quad core where the frequency of one core
> changes if the sofware changes the frequency of the other core?

doesn't the notifier still get run on the target CPU?

Ingo

2007-12-08 20:18:58

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [patch] x86: scale cyc_2_nsec according to CPU frequency

On Sat, 8 Dec 2007 20:16:29 +0100
Ingo Molnar <[email protected]> wrote:

>
> * Arjan van de Ven <[email protected]> wrote:
>
> > > > > Firstly, we dont need the 'offset' anymore because
> > > > > cpu_clock() maintains offsets itself.
> > > >
> > > > Yes, but a lower quality one. __update_rq_clock tries to
> > > > compensate large jumping clocks with a jiffy resolution, while
> > > > my offset arranges for a very smooth frequency transition.
> > >
> > > yes, but that would be easy to fix up via calling
> > > sched_clock_idle_wakeup_event(0) when doing a frequency
> > > transition, without burdening the normal sched_clock() codepath
> > > with the offset. See the attached latest version.
> >
> > can this deal with dual/quad core where the frequency of one core
> > changes if the software changes the frequency of the other core?
>
> doesn't the notifier still get run on the target CPU?
>

.... if and only if the BIOS actually gives correct information to the
OS.

In reality... that's not a very common thing in this field, sadly

--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2007-12-08 23:37:33

by Guillaume Chazarain

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Dec 8, 2007 9:52 AM, Ingo Molnar <[email protected]> wrote:

> the scariest bit isnt even the scaling i think - that is a fairly
> straightforward and clean PER_CPU-ization of the global scaling factor,
> and its hookup with cpufreq events. (and the credit for that goes to
> Guillaume Chazarain)

To be fair, the cpufreq hooks were already there; I just did a buggy percpu
conversion and added an offset that you removed ;-)

--
Guillaume

2007-12-12 04:43:23

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock

On Saturday 08 December 2007 19:52, Ingo Molnar wrote:

> the scariest bit is the adding of cpu_clock() to kernel/printk.c so late
> in the game, and the anti-recursion code i did there. Maybe because this
> only affects CONFIG_PRINTK_TIME we could try it even for v2.6.24.

Printk recursion I guess shouldn't happen, but if there is a bug
somewhere in e.g. the scheduler locking, then it may trigger, right?

Probably pretty rare case, however it would be nice to be able to
find out where the recursion comes from? Can you put an instruction
pointer in the recursion message perhaps?
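
One hypothetical way to do that, building on the printk_recursion_bug
logic quoted below: record the return address when the recursion is
detected and emit it with the deferred message. Illustrative only; this
was never part of the posted patches.

static int printk_recursion_bug;
static unsigned long printk_recursion_ip;

	/* in vprintk(), at the point where the recursion is detected: */
	if (!oops_in_progress) {
		printk_recursion_bug = 1;
		/* caller of vprintk(); a saved stacktrace would be better */
		printk_recursion_ip =
			(unsigned long)__builtin_return_address(0);
		goto out_restore_irqs;
	}

	/* ...and where the deferred message is finally emitted: */
	if (printk_recursion_bug) {
		printk_recursion_bug = 0;
		printed_len = sprintf(printk_buf,
			KERN_CRIT "BUG: recent printk recursion at %08lx!\n",
			printk_recursion_ip);
	}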


> i've now completed a couple of hundred random bootups on x86 overnight,
> with the full stack of these patches applied, and no problems.
>
> Could have a really critical look at the two patches below and give your
> Reviewed-by line(s) if you agree with having them in v2.6.24? I'd feel a
> lot better about this that way :-) I do have the feeling that it makes
> printk printout a lot more robust in general, independently of the
> cpu_clock() change - especially with more complex consoles like
> netconsole or fbcon.

Reviewed-by: Nick Piggin <[email protected]>

for both of them. However, I don't feel good about getting this into 2.6.24
just for printk timestamps :P But if you can convince others, I won't
stand in your way ;)

BTW, shouldn't we still disable the TSC for this machine for 2.6.24?
Because even if you did get these patches in, the sched_clock still
goes pretty wild if the tsc is not constant. cpu_clock can filter it
somewhat, but it would still lead to wrong values...
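
The filtering mentioned here is, roughly, the per-tick clamping applied
when the runqueue clock is updated. A simplified sketch of the idea -
the threshold details differ from the real __update_rq_clock():

static void update_rq_clock_sketch(struct rq *rq)
{
	u64 now = sched_clock();
	s64 delta = now - rq->prev_clock_raw;

	if (unlikely(delta < 0))
		delta = 0;		/* raw clock went backwards */
	else if (unlikely(delta > TICK_NSEC))
		delta = TICK_NSEC;	/* cap a forward jump at ~one tick */

	rq->clock += delta;
	rq->prev_clock_raw = now;
}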

Nick

>
> Ingo
>
> -------------------->
> Subject: printk: make printk more robust by not allowing recursion
> From: Ingo Molnar <[email protected]>
>
> make printk more robust by allowing recursion only if there's a crash
> going on. Also add recursion detection.
>
> I've tested it with an artificially injected printk recursion - instead
> of a lockup or spontaneous reboot or other crash, the output was a well
> controlled:
>
> [ 41.057335] SysRq : <2>BUG: recent printk recursion!
> [ 41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full
> kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync
> showTasks Unmount shoW-blocked-tasks
>
> also do all this printk-debug logic with irqs disabled.
>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> kernel/printk.c | 48 ++++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 38 insertions(+), 10 deletions(-)
>
> Index: linux/kernel/printk.c
> ===================================================================
> --- linux.orig/kernel/printk.c
> +++ linux/kernel/printk.c
> @@ -628,30 +628,57 @@ asmlinkage int printk(const char *fmt, .
> /* cpu currently holding logbuf_lock */
> static volatile unsigned int printk_cpu = UINT_MAX;
>
> +const char printk_recursion_bug_msg [] =
> + KERN_CRIT "BUG: recent printk recursion!\n";
> +static int printk_recursion_bug;
> +
> asmlinkage int vprintk(const char *fmt, va_list args)
> {
> + static int log_level_unknown = 1;
> + static char printk_buf[1024];
> +
> unsigned long flags;
> - int printed_len;
> + int printed_len = 0;
> + int this_cpu;
> char *p;
> - static char printk_buf[1024];
> - static int log_level_unknown = 1;
>
> boot_delay_msec();
>
> preempt_disable();
> - if (unlikely(oops_in_progress) && printk_cpu == smp_processor_id())
> - /* If a crash is occurring during printk() on this CPU,
> - * make sure we can't deadlock */
> - zap_locks();
> -
> /* This stops the holder of console_sem just where we want him */
> raw_local_irq_save(flags);
> + this_cpu = smp_processor_id();
> +
> + /*
> + * Ouch, printk recursed into itself!
> + */
> + if (unlikely(printk_cpu == this_cpu)) {
> + /*
> + * If a crash is occurring during printk() on this CPU,
> + * then try to get the crash message out but make sure
> + * we can't deadlock. Otherwise just return to avoid the
> + * recursion and return - but flag the recursion so that
> + * it can be printed at the next appropriate moment:
> + */
> + if (!oops_in_progress) {
> + printk_recursion_bug = 1;
> + goto out_restore_irqs;
> + }
> + zap_locks();
> + }
> +
> lockdep_off();
> spin_lock(&logbuf_lock);
> - printk_cpu = smp_processor_id();
> + printk_cpu = this_cpu;
>
> + if (printk_recursion_bug) {
> + printk_recursion_bug = 0;
> + strcpy(printk_buf, printk_recursion_bug_msg);
> + printed_len = sizeof(printk_recursion_bug_msg);
> + }
> /* Emit the output into the temporary buffer */
> - printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
> + printed_len += vscnprintf(printk_buf + printed_len,
> + sizeof(printk_buf), fmt, args);
>
> /*
> * Copy the output into log_buf. If the caller didn't provide
> @@ -744,6 +771,7 @@ asmlinkage int vprintk(const char *fmt,
> printk_cpu = UINT_MAX;
> spin_unlock(&logbuf_lock);
> lockdep_on();
> +out_restore_irqs:
> raw_local_irq_restore(flags);
> }
>
> -------------------->
> Subject: sched: fix CONFIG_PRINT_TIME's reliance on sched_clock()
> From: Ingo Molnar <[email protected]>
>
> Stefano Brivio reported weird printk timestamp behavior during
> CPU frequency changes:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=9475
>
> fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock()
> instead.
>
> Reported-and-bisected-by: Stefano Brivio <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> kernel/printk.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux/kernel/printk.c
> ===================================================================
> --- linux.orig/kernel/printk.c
> +++ linux/kernel/printk.c
> @@ -707,7 +707,7 @@ asmlinkage int vprintk(const char *fmt,
> loglev_char = default_message_loglevel
> + '0';
> }
> - t = printk_clock();
> + t = cpu_clock(printk_cpu);
> nanosec_rem = do_div(t, 1000000000);
> tlen = sprintf(tbuf,
> "<%c>[%5lu.%06lu] ",

2007-12-12 10:45:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] scheduler: fix x86 regression in native_sched_clock


* Nick Piggin <[email protected]> wrote:

> > the scariest bit is the adding of cpu_clock() to kernel/printk.c so
> > late in the game, and the anti-recursion code i did there. Maybe
> > because this only affects CONFIG_PRINTK_TIME we could try it even
> > for v2.6.24.
>
> Printk recursion I guess shouldn't happen, but if there is a bug
> somewhere in eg. the scheduler locking, then it may trigger, right?

or we just crash somewhere. It's all about risk management - printk is
crucial, and with more complex codepaths being touched in printk it
might make sense to just add built-in recursion protection into printk
via my patch.

> Probably pretty rare case, however it would be nice to be able to find
> out where the recursion comes from? Can you put an instruction pointer
> in the recursion message perhaps?

yeah, as I mentioned, if this were occurring in practice we can always
save the stacktrace of the incident and output that. I opted for the
simplest approach first. Thanks for your Reviewed-by, I've queued it up
for 2.6.25.

Ingo