Andrew,
the following patch series is an update in response to your review.
The following points have been addressed:
- documentation for the high res / dyntick design
- documentation for timer_stats
- removal of the patches which modify timeout behaviour (i8042, slab, timeout
granularity). Those are definitely worth investigating further, but they are
not a fundamental part of the high res / dyntick feature.
- cleanup of enum -> int abuse
- namespace cleanup
- kernel doc fixups
- improved comments all over the place
- resolution of the bugs which were pointed out
- mismerge from -mm1 to -mm2 in the clockevents-i386 patch repaired
- rcu hackery removed: This was a leftover from an attempt to force the RCU
updates to be processed on the way to idle rather than waiting for the grace
period expiry. This is interesting for further reducing the idle wakeups with
respect to power saving, but is not necessary for the basic functionality of
the dyntick patch set.
We did not address the GTOD patches, as we want to wait for John's input on
your comments. The persistent clock modifications are useful in two ways:
1. completely remove manipulation of xtime related variables from the
architecture code
2. ensure the correct resume ordering
We experienced resume problems with earlier versions of the high resolution
timer / dyntick patches and were able to identify the unordered update as the
cause. After an initial workaround similar to the current code, John
resurrected his timeofday-persistant-xxx patch set, which integrates nicely
with the already merged GTOD functionality.
The series contains two new patches:
#09: hrtimer-enum-and-namespace-cleanup.patch
(new patch, resolves review issues vs. enums and namespaces)
#13: time-and-timer-documentation.patch
(Move hrtimer.txt to a new directory and add high res / dyntick
design notes)
A broken out series and a combined patch are available at the usual place:
http://tglx.de/projects/hrtimers/2.6.18-mm2/
The following design notes are also part of patch #13
High resolution timers and dynamic ticks design notes
-----------------------------------------------------
Further information can be found in the paper of the OLS 2006 talk "hrtimers
and beyond". The paper is part of the OLS 2006 Proceedings Volume 1, which can
be found on the OLS website:
http://www.linuxsymposium.org/2006/linuxsymposium_procv1.pdf
The slides to this talk are available from:
http://tglx.de/projects/hrtimers/ols2006-hrtimers.pdf
The slides contain five figures (pages 2, 15, 18, 20, 22), which illustrate the
changes in the time(r) related Linux subsystems. Figure #1 (p. 2) shows the
design of the Linux time(r) system before hrtimers and other building blocks
got merged into mainline.
Note: the paper and the slides talk about "clock event source", while we have
switched to the name "clock event devices" in the meantime.
The design contains the following basic building blocks:
- hrtimer base infrastructure
- timeofday and clock source management
- clock event management
- high resolution timer functionality
- dynamic ticks
hrtimer base infrastructure
---------------------------
The hrtimer base infrastructure was merged into the 2.6.16 kernel. Details of
the base implementation are covered in Documentation/hrtimer/hrtimer.txt. See
also figure #2 (OLS slides p. 15).
The main differences from the timer wheel, which holds the armed timer_list
type timers, are:
- time ordered enqueueing into a rb-tree
- independent of ticks (the processing is based on nanoseconds)
timeofday and clock source management
-------------------------------------
John Stultz's Generic Time Of Day (GTOD) framework moves a large portion of
code out of the architecture-specific areas into a generic management
framework, as illustrated in figure #3 (OLS slides p. 18). The architecture
specific portion is reduced to the low level hardware details of the clock
sources, which are registered in the framework and selected on the basis of a
quality rating. The low level code provides hardware setup and readout routines and
initializes data structures, which are used by the generic time keeping code to
convert the clock ticks to nanosecond based time values. All other time keeping
related functionality is moved into the generic code. The GTOD base patch got
merged into the 2.6.18 kernel.
Further information about the Generic Time Of Day framework is available in the
OLS 2005 Proceedings Volume 1:
http://www.linuxsymposium.org/2005/linuxsymposium_procv1.pdf
The paper "We Are Not Getting Any Younger: A New Approach to Time and
Timers" was written by J. Stultz, D.V. Hart, & N. Aravamudan.
Figure #3 (OLS slides p.18) illustrates the transformation.
clock event management
----------------------
While clock sources provide read access to the monotonically increasing time
value, clock event devices are used to schedule the next event
interrupt(s). The next event is currently defined to be periodic, with its
period defined at compile time. The setup and selection of the event device
for various event driven functionalities is hardwired into the architecture
dependent code. This results in duplicated code across all architectures and
makes it extremely difficult to change the configuration of the system to use
event interrupt devices other than those already built into the
architecture. Another implication of the current design is that it is necessary
to touch all the architecture-specific implementations in order to provide new
functionality like high resolution timers or dynamic ticks.
The clock events subsystem tries to address this problem by providing a generic
solution to manage clock event devices and their usage for the various clock
event driven kernel functionalities. The goal of the clock event subsystem is
to minimize the clock event related architecture dependent code to the pure
hardware related handling and to allow easy addition and utilization of new
clock event devices. It also minimizes the duplicated code across the
architectures as it provides generic functionality down to the interrupt
service handler, which is almost inherently hardware dependent.
Clock event devices are registered either by the architecture dependent boot
code or at module insertion time. Each clock event device fills a data
structure with clock-specific property parameters and callback functions. The
clock event management decides, by using the specified property parameters, the
set of system functions a clock event device will be used to support. This
includes the distinction of per-CPU and per-system global event devices.
System-level global event devices are used for the Linux periodic tick. Per-CPU
event devices are used to provide local CPU functionality such as process
accounting, profiling, and high resolution timers.
The management layer assigns one or more of the following functions to a clock
event device:
- system global periodic tick (jiffies update)
- cpu local update_process_times
- cpu local profiling
- cpu local next event interrupt (non periodic mode)
The clock event device delegates the selection of those timer interrupt related
functions completely to the management layer. The clock management layer stores
a function pointer in the device description structure, which has to be called
from the hardware level handler. This removes a lot of duplicated code from the
architecture specific timer interrupt handlers and hands the control over the
clock event devices and the assignment of timer interrupt related functionality
to the core code.
The clock event layer API is rather small. Aside from the clock event device
registration interface it provides functions to schedule the next event
interrupt, clock event device notification service and support for suspend and
resume.
The framework adds about 700 lines of code which results in a 2KB increase of
the kernel binary size. The conversion of i386 removes about 100 lines of
code. The binary size decrease is in the range of 400 bytes. We believe that the
increase of flexibility and the avoidance of duplicated code across
architectures justifies the slight increase of the binary size.
The conversion of an architecture has no functional impact, but allows the
high resolution and dynamic tick functionalities to be utilized without any
change to the clock event device and timer interrupt code. After the
conversion, enabling high resolution timers and dynamic ticks is simply a
matter of adding the kernel/time/Kconfig file to the architecture specific
Kconfig and adding the dynamic tick specific calls to the idle routine (a
total of 3 lines added to the idle function and the Kconfig file).
Figure #4 (OLS slides p.20) illustrates the transformation.
high resolution timer functionality
-----------------------------------
During system boot it is not possible to use the high resolution timer
functionality, while making it possible would be difficult and would serve no
useful function. The initialization of the clock event device framework, the
clock source framework (GTOD) and hrtimers itself has to be done and
appropriate clock sources and clock event devices have to be registered before
the high resolution functionality can work. Up to the point where hrtimers are
initialized, the system works in the usual low resolution periodic mode. The
clock source and the clock event device layers provide notification functions
which inform hrtimers about availability of new hardware. hrtimers validates
the usability of the registered clock sources and clock event devices before
switching to high resolution mode. This also ensures that a kernel which is
configured for high resolution timers can run on a system which lacks the
necessary hardware support.
The high resolution timer code does not support SMP machines which have only
global clock event devices. The support of such hardware would involve IPI
calls when an interrupt happens. The overhead would be much larger than the
benefit. This is the reason why we currently disable high resolution and
dynamic ticks on i386 SMP systems which stop the local APIC in C3 power
state. A workaround is available as an idea, but the problem has not been
tackled yet.
The time ordered insertion of timers provides all the infrastructure to decide
whether the event device has to be reprogrammed when a timer is added. The
decision is made per timer base and synchronized across per-cpu timer bases in
a support function. The design allows the system to utilize separate per-CPU
clock event devices for the per-CPU timer bases, but currently only one
reprogrammable clock event device per CPU is utilized.
When the timer interrupt happens, the next event interrupt handler is called
from the clock event distribution code and moves expired timers from the
red-black tree to a separate double linked list and invokes the softirq
handler. An additional mode field in the hrtimer structure allows the system to
execute callback functions directly from the next event interrupt handler. This
is restricted to code which can safely be executed in the hard interrupt
context. This applies, for example, to the common case of a wakeup function as
used by nanosleep. The advantage of executing the handler in the interrupt
context is the avoidance of up to two context switches - from the interrupted
context to the softirq and to the task which is woken up by the expired
timer.
Once a system has switched to high resolution mode, the periodic tick is
switched off. This disables the per system global periodic clock event device -
e.g. the PIT on i386 SMP systems.
The periodic tick functionality is provided by a per-cpu hrtimer. The callback
function is executed in the next event interrupt context, updates jiffies and
calls update_process_times and profiling. The implementation of the hrtimer
based periodic tick is designed to be extended with dynamic tick functionality.
This allows a single clock event device to be used to schedule both high
resolution timer and periodic events (jiffies tick, profiling, process
accounting) on UP systems. This has been proven to work with the PIT on i386
and the Incrementer on PPC.
The softirq for running the hrtimer queues and executing the callbacks has been
separated from the tick bound timer softirq to allow accurate delivery of high
resolution timer signals which are used by itimer and POSIX interval
timers. The execution of this softirq can still be delayed by other softirqs,
but the overall latencies have been significantly improved by this separation.
Figure #5 (OLS slides p.22) illustrates the transformation.
dynamic ticks
-------------
Dynamic ticks are the logical consequence of the hrtimer based periodic tick
replacement (sched_tick). The functionality of the sched_tick hrtimer is
extended by three functions:
- hrtimer_stop_sched_tick
- hrtimer_restart_sched_tick
- hrtimer_update_jiffies
hrtimer_stop_sched_tick() is called when a CPU goes into idle state. The code
evaluates the next scheduled timer event (from both hrtimers and the timer
wheel) and, if the next event is further away than the next tick, it
reprograms the sched_tick to this future event, allowing longer idle sleeps
without needless interruption by the periodic tick. The function is also
called when an interrupt happens during the idle period which does not cause a
reschedule. The call is necessary as the interrupt handler might have armed a
new timer whose expiry time is before the time which was identified as the
nearest event in the previous call to hrtimer_stop_sched_tick().
hrtimer_restart_sched_tick() is called when the CPU leaves the idle state before
it calls schedule(). hrtimer_restart_sched_tick() resumes the periodic tick,
which is kept active until the next call to hrtimer_stop_sched_tick().
hrtimer_update_jiffies() is called from irq_enter() when an interrupt happens
in the idle period, to make sure that jiffies is up to date and the interrupt
handler does not have to deal with a possibly stale jiffies value.
The dynamic tick feature provides statistical values which are exported to
userspace via /proc/stat and can be made available for enhanced power
management control.
The implementation leaves room for further development like full tickless
systems, where the time slice is controlled by the scheduler, variable
frequency profiling, and a complete removal of jiffies in the future.
Aside from the current initial submission of i386 support, the patch set has
already been extended to x86_64 and ARM. Initial (work in progress) support is also
available for MIPS and PowerPC.
Thomas, Ingo
--
On Sun, 01 Oct 2006 22:59:01 -0000, Thomas Gleixner said:
> the following patch series is an update in response to your review.
This complains if you try to compile with -Werror-implicit-function-declaration
and rightly so, as we're missing a #include to define IS_ERR_VALUE().
Patch attached.
Signed-off-by: Valdis Kletnieks <[email protected]>
--- linux-2.6.18-mm2/kernel/hrtimer.c.buggy 2006-10-02 00:46:50.000000000 -0400
+++ linux-2.6.18-mm2/kernel/hrtimer.c 2006-10-02 01:02:55.000000000 -0400
@@ -43,6 +43,7 @@
#include <linux/clockchips.h>
#include <linux/profile.h>
#include <linux/seq_file.h>
+#include <linux/err.h>
#include <asm/uaccess.h>
--- linux-2.6.18-mm2/kernel/time/clockevents.c.buggy 2006-10-02 00:46:50.000000000 -0400
+++ linux-2.6.18-mm2/kernel/time/clockevents.c 2006-10-02 01:04:22.000000000 -0400
@@ -33,6 +33,7 @@
#include <linux/profile.h>
#include <linux/sysdev.h>
#include <linux/hrtimer.h>
+#include <linux/err.h>
#define MAX_CLOCK_EVENTS 4
#define GLOBAL_CLOCK_EVENT MAX_CLOCK_EVENTS
On Sun, 01 Oct 2006 22:59:01 -0000, Thomas Gleixner said:
> the following patch series is an update in response to your review.
First runtime results - no lockups or other severe badness in a half-hour or so
of running.
-mm2-hrt-dynticks5 shows severe clock drift issues if you run 'cpuspeed'.
Using speedstep-ich as a kernel built-in, and cpuspeed is invoked as:
cpuspeed -d -n -i 10 -p 10 50 -a /proc/acpi/ac_adapter/*/state
If cpuspeed drops the CPU speed from the default 1.6GHz down to 1.2GHz (the
only 2 speeds available on this core), the system clock proceeds to lose
about 15 seconds a minute. I haven't dug further into why yet. (If the system
is busy so cpuspeed keeps the processor at 1.6GHz, the clock doesn't drift
as much - so it looks like a "when speed is 1.2GHz" issue...)
I'm also seeing gkrellm reporting about 25% CPU use when "near-idle" (X is up
but not much is going on) when that's usually down around 5-6%. I need to
collect some oprofile numbers and investigate that as well.
On Mon, 2006-10-02 at 09:02 -0400, [email protected] wrote:
> On Sun, 01 Oct 2006 22:59:01 -0000, Thomas Gleixner said:
> > the following patch series is an update in response to your review.
>
> First runtime results - no lockups or other severe badness in a half-hour or so
> of running.
>
> -mm2-hrt-dynticks5 shows severe clock drift issues if you run 'cpuspeed'.
>
> Using speedstep-ich as a kernel built-in, and cpuspeed is invoked as:
>
> cpuspeed -d -n -i 10 -p 10 50 -a /proc/acpi/ac_adapter/*/state
>
> If cpuspeed drops the CPU speed from the default 1.6Ghz down to 1.2Ghz (the
> only 2 speeds available on this core), the system clock proceeds to lose
> about 15 seconds a minute. I haven't dug further into why yet. (If the system
> is busy so cpuspeed keeps the processor at 1.6Ghz, the clock doesn't drift
> as much - so it looks like a "when speed is 1.2Ghz" issue...)
Can you please send me the bootlog and further dmesg output (especially
when related to timers / cpufreq).
> I'm also seeing gkrellm reporting about 25% CPU use when "near-idle" (X is up
> but not much is going on) when that's usually down around 5-6%. I need to
> collect some oprofile numbers and investigate that as well.
I'll look into the accounting fixups again.
tglx
(Sorry for the size of the note, there's some 50K of logs attached)
On Mon, 02 Oct 2006 15:43:02 +0200, Thomas Gleixner said:
> Can you please send me the bootlog and further dmesg output (especially
> when related to timers / cpufreq).
I booted the box to single-user both times, and then started cpuspeed.
I then did a cat of /proc/interrupts, /proc/uptime, and a date command,
waited 60 seconds according to my watch, and repeated. I then dumped
the dmesg. The -dyntick kernel moved 'uptime' almost exactly 45 seconds
(almost certainly a by-product of running at 1.2Ghz rather than 1.6Ghz).
Does the dyntick code make any unwritten assumptions about a jiffy or
bogomips remaining constant?
Attached - config diff, date and /proc dumps from both -mm2 and -mm2-dyntick,
and the dmesg's from both boots.
Yell if you have any other questions/suggestions/etc..
> > I'm also seeing gkrellm reporting about 25% CPU use when "near-idle" (X is up
> > but not much is going on) when that's usually down around 5-6%. I need to
> > collect some oprofile numbers and investigate that as well.
>
> I'll look into the accounting fixups again.
I still need to get oprofile runs of this and see what's going on.
On Mon, 2006-10-02 at 14:25 -0400, [email protected] wrote:
> (Sorry for the size of the note, there's some 50K of logs attached)
>
> On Mon, 02 Oct 2006 15:43:02 +0200, Thomas Gleixner said:
> > Can you please send me the bootlog and further dmesg output (especially
> > when related to timers / cpufreq).
>
> I booted the box to single-user both times, and then started cpuspeed.
> I then did a cat of /proc/interrupts, /proc/uptime, and a date command,
> waited 60 seconds according to my watch, and repeated. I then dumped
> the dmesg. The -dyntick kernel moved 'uptime' almost exactly 45 seconds
> (almost certainly a by-product of running at 1.2Ghz rather than 1.6Ghz).
> Does the dyntick code make any unwritten assumptions about a jiffie or
> bogomips remaining constant?
>
> Attached - config diff, date and /proc dumps from both -mm2 and -mm2-dyntick,
> and the dmesg's from both boots.
>
> Yell if you have any other questions/suggestions/etc..
Hmmm. So w/ -mm2 we're seeing the TSC get detected as running too slowly
(and it's replaced w/ the ACPI PM), but for some reason that doesn't
happen w/ the dynticks patch.
Now, how is cpuspeed changing the cpufreq? Is it using the /sys
interface? I've got hooks in so when the cpufreq changes we should mark
it unstable and fall back to ACPI PM, but maybe I missed whatever hook
cpuspeed is using.
thanks
-john
On Mon, 2006-10-02 at 14:25 -0400, [email protected] wrote:
> > > I'm also seeing gkrellm reporting about 25% CPU use when "near-idle" (X is up
> > > but not much is going on) when that's usually down around 5-6%. I need to
> > > collect some oprofile numbers and investigate that as well.
> >
> > I'll look into the accounting fixups again.
> I still need to get oprofile runs of this and see what's going on.
The patch below fixes the accounting weirdness.
tglx
----------------->
Subject: dynticks core: Fix idle time accounting
From: Thomas Gleixner <[email protected]>
The extended sleeps during idle must be accounted to the idle thread.
The original accounting fixup was too naive. The time must be accounted
when the idle thread is interrupted and the jiffies update code has
forwarded jiffies. Otherwise the accounting is done on random targets.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
--
kernel/hrtimer.c | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)
Index: linux-2.6.18-mm2/kernel/hrtimer.c
===================================================================
--- linux-2.6.18-mm2.orig/kernel/hrtimer.c 2006-10-02 19:22:14.000000000 +0200
+++ linux-2.6.18-mm2/kernel/hrtimer.c 2006-10-02 19:22:14.000000000 +0200
@@ -44,6 +44,7 @@
#include <linux/profile.h>
#include <linux/seq_file.h>
#include <linux/err.h>
+#include <linux/kernel_stat.h>
#include <asm/uaccess.h>
@@ -447,10 +448,11 @@ static const ktime_t nsec_per_hz = { .tv
* want to wake up a complete idle cpu just to update jiffies, so we need
* something more intellegent than a mere "do this only on CPUx".
*/
-static void update_jiffies64(ktime_t now)
+static unsigned long update_jiffies64(ktime_t now)
{
unsigned long seq;
ktime_t delta;
+ unsigned long ticks = 0;
/* Preevaluate to avoid lock contention */
do {
@@ -459,14 +461,13 @@ static void update_jiffies64(ktime_t now
} while (read_seqretry(&xtime_lock, seq));
if (delta.tv64 < nsec_per_hz.tv64)
- return;
+ return 0;
/* Reevalute with xtime_lock held */
write_seqlock(&xtime_lock);
delta = ktime_sub(now, last_jiffies_update);
if (delta.tv64 >= nsec_per_hz.tv64) {
- unsigned long ticks = 1;
delta = ktime_sub(delta, nsec_per_hz);
last_jiffies_update = ktime_add(last_jiffies_update,
@@ -480,11 +481,13 @@ static void update_jiffies64(ktime_t now
last_jiffies_update = ktime_add_ns(last_jiffies_update,
incr * ticks);
- ticks++;
}
+ ticks++;
do_timer(ticks);
}
write_sequnlock(&xtime_lock);
+
+ return ticks;
}
#ifdef CONFIG_NO_HZ
@@ -500,7 +503,7 @@ static void update_jiffies64(ktime_t now
void hrtimer_update_jiffies(void)
{
struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
- unsigned long flags;
+ unsigned long flags, ticks;
ktime_t now;
if (!cpu_base->tick_stopped || !cpu_base->hres_active)
@@ -509,7 +512,17 @@ void hrtimer_update_jiffies(void)
now = ktime_get();
local_irq_save(flags);
- update_jiffies64(now);
+ ticks = update_jiffies64(now);
+ if (ticks) {
+ /*
+ * We stopped the tick in idle and this function got called to
+ * update jiffies. Update process times would randomly account
+ * the time we slept to whatever the context of the next sched
+ * tick is. Enforce that this is accounted to idle !
+ */
+ account_system_time(current, HARDIRQ_OFFSET,
+ jiffies_to_cputime(ticks));
+ }
local_irq_restore(flags);
}
@@ -604,15 +617,6 @@ void hrtimer_restart_sched_tick(void)
local_irq_disable();
update_jiffies64(now);
- /*
- * Update process times would randomly account the time we slept to
- * whatever the context of the next sched tick is. Enforce that this
- * is accounted to idle !
- */
- add_preempt_count(HARDIRQ_OFFSET);
- update_process_times(0);
- sub_preempt_count(HARDIRQ_OFFSET);
-
/* Account the idle time */
delta = ktime_sub(now, cpu_base->idle_entrytime);
cpu_base->idle_sleeptime = ktime_add(cpu_base->idle_sleeptime, delta);
On Mon, 02 Oct 2006 11:38:36 PDT, john stultz said:
> Hmmm. So w/ -mm2 we're seeing the TSC get detected as running too slowly
> (and its replaced w/ the ACPI PM), but for some reason that doesn't
> happen w/ the dynticks patch.
It's been switching to ACPI PM for somewhere near forever, I never bothered
to check into that because the PM timer provides a reasonably stable clock
source (it drifts at about 24 ppm and NTP is happy with it, and I haven't
gotten annoyed at the fact the PM timer is slow to read...)
I wonder if the TSC has been broken for forever on this box, and I'm just
seeing it because dynticks doesn't fall over to PM timer..
> Now, how is cpuspeed changing the cpufreq? Is it using the /sys
> interface? I've got hooks in so when the cpufreq changes we should mark
> it unstable and fall back to ACPI PM, but maybe I missed whatever hook
> cpuspeed is using.
Looking at the source, it appears to do this:
const char SYSFS_CURRENT_SPEED_FILE[] =
"/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed";
// set the current CPU speed
void set_speed(unsigned value)
{
#ifdef DEBUG
fprintf(stderr, "[cpu%u] Setting speed to: %uKHz\n", cpu, value);
#endif
write_line(CURRENT_SPEED_FILE, "%u\n", value);
// give CPU / chipset voltage time to settle down
usleep(10000);
}
On Mon, 2006-10-02 at 15:08 -0400, [email protected] wrote:
> On Mon, 02 Oct 2006 11:38:36 PDT, john stultz said:
>
> > Hmmm. So w/ -mm2 we're seeing the TSC get detected as running too slowly
> > (and its replaced w/ the ACPI PM), but for some reason that doesn't
> > happen w/ the dynticks patch.
>
> It's been switching to ACPI PM for somewhere near forever, I never bothered
> to check into that because the PM timer provides a reasonably stable clock
> source (it drifts at about 24 ppm and NTP is happy with it, and I haven't
> gotten annoyed at the fact the PM timer is slow to read...)
>
> I wonder if the TSC has been broken for forever on this box, and I'm just
> seeing it because dynticks doesn't fall over to PM timer..
This is what I suspect is the issue. Probably due to the new jiffies
accounting now being time based, and one of the TSC unstable checks (the
one you're tripping) being jiffies based. A tad bit circular :). I'm
working w/ tglx to see what we can do here.
> > Now, how is cpuspeed changing the cpufreq? Is it using the /sys
> > interface? I've got hooks in so when the cpufreq changes we should mark
> > it unstable and fall back to ACPI PM, but maybe I missed whatever hook
> > cpuspeed is using.
>
> Looking at the source, it appears to do this:
>
> const char SYSFS_CURRENT_SPEED_FILE[] =
> "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed";
>
> // set the current CPU speed
> void set_speed(unsigned value)
> {
> #ifdef DEBUG
> fprintf(stderr, "[cpu%u] Setting speed to: %uKHz\n", cpu, value);
> #endif
> write_line(CURRENT_SPEED_FILE, "%u\n", value);
> // give CPU / chipset voltage time to settle down
> usleep(10000);
> }
I'll also take a peek there and see if I cannot add an extra hook, so we
don't have to rely on the jiffies stability check.
thanks
-john
On Mon, 02 Oct 2006 20:43:26 +0200, Thomas Gleixner said:
>
> The patch below fixes the accounting weirdness.
I think it's still slightly defective, or at least suffering from a
disjoint between what is going on - the numbers reported in /proc/stats
add up to the total number of timer interrupts, but that's not necessarily
representative of what happened...
% cat /proc/stat;sleep 15;cat /proc/stat
cpu 27634 0 7762 20470 881 331 252 0
cpu0 27634 0 7762 20470 881 331 252 0
intr 812332 631476 2960 0 4 4 12667 3 14 1 1 4 142891 114 0 22193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 2187603
btime 1159817297
processes 4028
procs_running 1
procs_blocked 0
nohz total I:397276 S:379955 T:1187.393123 A:0.003125 E: 629447
cpu 27753 0 7818 20739 881 332 253 0
cpu0 27753 0 7818 20739 881 332 253 0
intr 819027 636542 2969 0 4 4 12801 3 14 1 1 4 144371 114 0 22199 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 2209881
btime 1159817297
processes 4033
procs_running 1
procs_blocked 0
nohz total I:401991 S:384494 T:1200.732924 A:0.003122 E: 634513
And the deltas between the sums for cpu0 are equal to the difference of
the first intr (where the timer is) - ticking along at about 446/sec over
that 15 second timeframe. And sure enough, the 'user' field is about 1/3 of
the total interrupts.
The breakage is that userspace tools like gkrellm and vmstat and top are quite
happy to say "oh, we averaged 446 ticks/sec over the last N seconds? That's
odd, but I can deal..." but unfortunately treat all of the ticks as the same
"width" - and the idle ones are probably twice as wide if not wider. I'm not
sure how to fix that.
(The "thought experiment" for this - imagine over a 10 second period, an idle
machine takes 100 short timeslices for a running process, and 100 very long
sleeps 10 times as long as the first 100. What should /proc/stats report at
that point?)
On Mon, 2006-10-02 at 16:17 -0400, [email protected] wrote:
> cpu 27634 0 7762 20470 881 331 252 0
> cpu0 27634 0 7762 20470 881 331 252 0
> intr 812332 631476 2960 0 4 4 12667 3 14 1 1 4 142891 114 0 22193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> ctxt 2187603
> btime 1159817297
> processes 4028
> procs_running 1
> procs_blocked 0
> nohz total I:397276 S:379955 T:1187.393123 A:0.003125 E: 629447
> cpu 27753 0 7818 20739 881 332 253 0
> cpu0 27753 0 7818 20739 881 332 253 0
> intr 819027 636542 2969 0 4 4 12801 3 14 1 1 4 144371 114 0 22199 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> ctxt 2209881
> btime 1159817297
> processes 4033
> procs_running 1
> procs_blocked 0
> nohz total I:401991 S:384494 T:1200.732924 A:0.003122 E: 634513
Strange.
/me digs deeper
tglx
On Mon, 02 Oct 2006 23:22:38 +0200, Thomas Gleixner said:
> On Mon, 2006-10-02 at 16:17 -0400, [email protected] wrote:
> > cpu 27634 0 7762 20470 881 331 252 0
> > cpu0 27634 0 7762 20470 881 331 252 0
> > intr 812332 631476 2960 0 4 4 12667 3 14 1 1 4 142891 114 0 22193 [run of zeros trimmed]
> > ctxt 2187603
> > btime 1159817297
> > processes 4028
> > procs_running 1
> > procs_blocked 0
> > nohz total I:397276 S:379955 T:1187.393123 A:0.003125 E: 629447
> > cpu 27753 0 7818 20739 881 332 253 0
> > cpu0 27753 0 7818 20739 881 332 253 0
> > intr 819027 636542 2969 0 4 4 12801 3 14 1 1 4 144371 114 0 22199 [run of zeros trimmed]
> > ctxt 2209881
> > btime 1159817297
> > processes 4033
> > procs_running 1
> > procs_blocked 0
> > nohz total I:401991 S:384494 T:1200.732924 A:0.003122 E: 634513
>
> Strange.
>
> /me digs deeper
Not really strange at all - between code inspection and checking other stuff,
I'm now convinced the *counts* of "was the previous tick user/nice/system/idle"
reported in the cpu0 lines are accurate and report the relative counts
correctly. The problem is that userspace tools are assuming that all the ticks
reported are created equal. "We had 200 ticks, total, 100 were user and 100
were idle, so we were at 50/50 user/idle" - but in reality we had 100 10-ms
user ticks and 100 100-ms idle ticks, and were only about 10% busy.....
We could "pump up" the relative counts - if one no-hz tick was 5 ms long,
increment the count by 5 rather than 1 (for an alleged 1 kHz tick).
However, when we do that, we break the property that the sum of the ticks
in the 'cpu0' line is equal to the number of timer interrupts reported in the
'intr' line.
Like I said - unclear how to fix this....
On Sun, 01 Oct 2006 23:00:45 -0000
Thomas Gleixner <[email protected]> wrote:
> We did not address the GTOD patches, as we want to wait for John's input on
> your comments.
I note that the default CONFIG_HIGH_RES_RESOLUTION is still 1000 (one
microsecond), which is far higher resolution than you actually recommend.
I did query that last time around. I'd prefer not to have to go back and
re-review it all, please...
These patches make my Vaio run really really slowly. Maybe a quarter of
the normal speed or lower. Bisection shows that the bug is introduced by
clockevents-drivers-for-i386.patch+clockevents-drivers-for-i386-fix.patch
With all patches applied, the slowdown happens with
CONFIG_HIGH_RES_TIMERS=n and also with CONFIG_HIGH_RES_TIMERS=y &&
CONFIG_NO_HZ=y. So something got collaterally damaged.
I put various helpful stuff at http://userweb.kernel.org/~akpm/x/
I uploaded all the patches I was using to
http://userweb.kernel.org/~akpm/x/patches/
It doesn't seem to be a cpufreq thing: cpuinfo_min_freq=800kHz,
cpuinfo_max_freq=2GHz and cpuinfo_cur_freq goes up to 2GHz under load.
Wall time is increasing at one second per second.
On Mon, 2006-10-02 at 21:00 -0700, Andrew Morton wrote:
> These patches make my Vaio run really really slowly. Maybe a quarter of
> the normal speed or lower. Bisection shows that the bug is introduced by
> clockevents-drivers-for-i386.patch+clockevents-drivers-for-i386-fix.patch
>
> With all patches applied, the slowdown happens with
> CONFIG_HIGH_RES_TIMERS=n and also with CONFIG_HIGH_RES_TIMERS=y &&
> CONFIG_NO_HZ=y. So something got collaterally damaged.
>
> I put various helpful stuff at http://userweb.kernel.org/~akpm/x/
> I uploaded all the patches I was using to
> http://userweb.kernel.org/~akpm/x/patches/
That's basically the same set I have here +/- the fixups
> It doesn't seem to be a cpufreq thing: cpuinfo_min_freq=800kHz,
> cpuinfo_max_freq=2GHz and cpuinfo_cur_freq goes up to 2GHz under load.
> Wall time is increasing at one second per second.
I'll retest on my Vaio.
tglx
* Andrew Morton <[email protected]> wrote:
> These patches make my Vaio run really really slowly. Maybe a quarter
> of the normal speed or lower. Bisection shows that the bug is
> introduced by
> clockevents-drivers-for-i386.patch+clockevents-drivers-for-i386-fix.patch
>
> With all patches applied, the slowdown happens with
> CONFIG_HIGH_RES_TIMERS=n and also with CONFIG_HIGH_RES_TIMERS=y &&
> CONFIG_NO_HZ=y. So something got collaterally damaged.
yeah, i suspect it works again if you disable:
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
as the slowdown has the feeling of a runaway lapic timer irq.
from code review so far we can only see an udelay(10) difference in the
initialization sequence of the PIT - we'll send a fix for that but i
dont think that's the cause of the bug.
investigating it.
Ingo
* Ingo Molnar <[email protected]> wrote:
> yeah, i suspect it works again if you disable:
>
> CONFIG_X86_UP_APIC=y
> CONFIG_X86_UP_IOAPIC=y
> CONFIG_X86_LOCAL_APIC=y
> CONFIG_X86_IO_APIC=y
>
> as the slowdown has the feeling of a runaway lapic timer irq.
>
> from code review so far we can only see an udelay(10) difference in
> the initialization sequence of the PIT - we'll send a fix for that but
> i dont think that's the cause of the bug.
the patch below fixes that particular bug. But ... the symptoms you are
describing have the feeling of being apic related.
Ingo
-------------------->
Subject: clockevents: drivers for i386, fix #2
From: Ingo Molnar <[email protected]>
add back a mistakenly removed udelay(10) to the PIT initialization
sequence.
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/i386/kernel/i8253.c | 1 +
1 file changed, 1 insertion(+)
Index: linux/arch/i386/kernel/i8253.c
===================================================================
--- linux.orig/arch/i386/kernel/i8253.c
+++ linux/arch/i386/kernel/i8253.c
@@ -45,6 +45,7 @@ static void init_pit_timer(enum clock_ev
outb_p(0x34, PIT_MODE);
udelay(10);
outb_p(LATCH & 0xff , PIT_CH0); /* LSB */
+ udelay(10);
outb(LATCH >> 8 , PIT_CH0); /* MSB */
break;
On Mon, 2006-10-02 at 17:35 -0400, [email protected] wrote:
> We could "pump up" the relative counts - if one no-hz tick was 5 ms long,
> increment the count by 5 rather than 1 (for an alleged 1 kHz tick).
> However, when we do that, we break the property that the sum of the ticks
> in the 'cpu0' line is equal to the number of timer interrupts reported in the
> 'intr' line.
I found a way to fix my thinkos. I put up a queue with all fixes to:
http://www.tglx.de/projects/hrtimers/2.6.18-mm3/patch-2.6.18-mm3-hrt-dyntick1.patches.tar.bz2
Can you please verify if it makes your problem go away ?
tglx
On Tue, 2006-10-03 at 22:02 +0200, Thomas Gleixner wrote:
> I found a way to fix my thinkos. I put up a queue with all fixes to:
>
> http://www.tglx.de/projects/hrtimers/2.6.18-mm3/patch-2.6.18-mm3-hrt-dyntick1.patches.tar.bz2
>
> Can you please verify if it makes your problem go away ?
Please use dyntick2, as #1 is missing a fix. Sorry.
tglx
On Tue, 03 Oct 2006 22:02:30 +0200, Thomas Gleixner said:
> I found a way to fix my thinkos. I put up a queue with all fixes to:
>
> http://www.tglx.de/projects/hrtimers/2.6.18-mm3/patch-2.6.18-mm3-hrt-dyntick1.patches.tar.bz2
>
> Can you please verify if it makes your problem go away ?
Was -dyntick3 by the time I got there.
The user/system/idle/wait numbers now look sane, with one caveat:
static const ktime_t nsec_per_hz = { .tv64 = NSEC_PER_SEC / HZ };
...
if (unlikely(delta.tv64 >= nsec_per_hz.tv64)) {
s64 incr = ktime_to_ns(nsec_per_hz);
ticks = ktime_divns(delta, incr);
Even though I have CONFIG_HZ=1000, this ends up generating a synthetic
count that works out to 100 per second. gkrellm and vmstat are happy with
that state of affairs, but I'm not sure why it came out to 100/sec rather
than 1000/sec.
% cat /proc/stat /proc/uptime
cpu 28224 4688 9159 168290 9143 283 256 0
cpu0 28224 4688 9159 168290 9143 283 256 0
intr 818891 627337 3466 0 4 4 6459 3 7 1 1 4 160328 115 0 21162 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 971441
btime 1159926408
processes 4544
procs_running 1
procs_blocked 0
nohz total I:367986 S:302473 T:1737.640072 A:0.005744 E: 625327
2176.02 1775.11
Also, it still disagrees with speedstep - it isn't noticing that the TSC
has gone slow and dropping back to the PM timer.
All in all, we're making progress. ;)
On Tue, 3 Oct 2006 12:35:03 +0200
Ingo Molnar <[email protected]> wrote:
>
> add back a mistakenly removed udelay(10) to the PIT initialization
> sequence.
>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> arch/i386/kernel/i8253.c | 1 +
> 1 file changed, 1 insertion(+)
>
> Index: linux/arch/i386/kernel/i8253.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/i8253.c
> +++ linux/arch/i386/kernel/i8253.c
> @@ -45,6 +45,7 @@ static void init_pit_timer(enum clock_ev
> outb_p(0x34, PIT_MODE);
> udelay(10);
> outb_p(LATCH & 0xff , PIT_CH0); /* LSB */
> + udelay(10);
> outb(LATCH >> 8 , PIT_CH0); /* MSB */
> break;
>
Doesn't help.
> * Ingo Molnar <[email protected]> wrote:
>
> > yeah, i suspect it works again if you disable:
> >
> > CONFIG_X86_UP_APIC=y
> > CONFIG_X86_UP_IOAPIC=y
> > CONFIG_X86_LOCAL_APIC=y
> > CONFIG_X86_IO_APIC=y
> >
> > as the slowdown has the feeling of a runaway lapic timer irq.
> >
Disabling IO_APIC doesn't fix the slowdown.
Disabling LOCAL_APIC does fix it.
* Andrew Morton <[email protected]> wrote:
> Disabling LOCAL_APIC does fix it.
thanks, that narrows it down quite a bit. (We've double-checked the
lapic path and it seemed all our changes were NOPs, but obviously they
aren't, and we'll check it all again.)
(if you have that kernel still booted by any chance then do you see the
'LOC' IRQ count in /proc/interrupts or any other count in /proc/stats
increasing at an alarming rate? That would narrow it down to lapic timer
misprogramming.)
Ingo
On Wed, 4 Oct 2006 08:46:20 +0200
Ingo Molnar <[email protected]> wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
> > Disabling LOCAL_APIC does fix it.
>
> thanks, that narrows it down quite a bit. (We've double-checked the
> lapic path and it seemed all our changes were NOPs, but obviously they
> aren't, and we'll check it all again.)
>
> (if you have that kernel still booted by any chance then do you see the
> 'LOC' IRQ count in /proc/interrupts or any other count in /proc/stats
> increasing at an alarming rate? That would narrow it down to lapic timer
> misprogramming.)
>
None of the interrupts are doing anything wrong. oprofile shows nothing
alarming.
Disabling cpufreq in config doesn't fix it.
Userspace can count to a billion in 3.9 seconds when this problem is
present, which is the same time as it takes on a non-slow kernel.
`sleep 5' takes 5 seconds.
Yet initscripts take a long time (especially applying the ipfilter firewall
rules for some reason), and `startx' takes a long time, etc.  This kernel
takes 112 seconds to boot to a login prompt - other kernels take 56 seconds
(interesting ratio..)
Weird.
* Andrew Morton <[email protected]> wrote:
> Yet initscripts take a long time (especially applying the ipfilter
> firewall rules for some reason), and `startx' takes a long time, etc.
> This kernel takes 112 seconds to boot to a login prompt - other
> kernels take 56 seconds (interesting ratio..)
you are still using the non-hres config, correct? (so this is still
collateral damage on vanilla kernel functionality)
Ingo
On Wed, 4 Oct 2006 09:41:42 +0200
Ingo Molnar <[email protected]> wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
> > Yet initscripts take a long time (especially applying the ipfilter
> > firewall rules for some reason), and `startx' takes a long time, etc.
> > This kernel takes 112 seconds to boot to a login prompt - other
> > kernels take 56 seconds (interesting ratio..)
>
> you are still using the non-hres config, correct? (so this is still
> collateral damage on vanilla kernel functionality)
>
yup.
* Andrew Morton <[email protected]> wrote:
> None of the interrupts are doing anything wrong. oprofile shows
> nothing alarming.
>
> Disabling cpufreq in config doesn't fix it.
>
> Userspace can count to a billion in 3.9 seconds when this problem is
> present, which is the same time as it takes on a non-slow kernel.
>
> `sleep 5' takes 5 seconds.
>
> Yet initscripts take a long time (especially applying the ipfilter
> firewall rules for some reason), and `startx' takes a long time, etc.
> This kernel takes 112 seconds to boot to a login prompt - other
> kernels take 56 seconds (interesting ratio..)
hm, do you have the NMI watchdog enabled by any chance? [in particular,
do you have nmi_watchdog=2?] Although your bootlog does not show it.
Ingo
* [email protected] <[email protected]> wrote:
> Even though I have CONFIG_HZ=1000, this ends up generating a synthetic
> count that works out to 100 per second. gkrellm and vmstat are happy
> with that state of affairs, but I'm not sure why it came out to
> 100/sec rather than 1000/sec.
that's how it worked for quite some time: all userspace APIs are
HZ-independent and depend on USER_HZ (which is 100 even if HZ is 1000).
Ingo
On Wed, 4 Oct 2006 09:55:40 +0200
Ingo Molnar <[email protected]> wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
> > None of the interrupts are doing anything wrong. oprofile shows
> > nothing alarming.
> >
> > Disabling cpufreq in config doesn't fix it.
> >
> > Userspace can count to a billion in 3.9 seconds when this problem is
> > present, which is the same time as it takes on a non-slow kernel.
> >
> > `sleep 5' takes 5 seconds.
> >
> > Yet initscripts take a long time (especially applying the ipfilter
> > firewall rules for some reason), and `startx' takes a long time, etc.
> > This kernel takes 112 seconds to boot to a login prompt - other
> > kernels take 56 seconds (interesting ratio..)
>
> hm, do you have the NMI watchdog enabled by any chance? [in particular,
> do you have nmi_watchdog=2?] Although your bootlog does not show it.
>
There's no nmi_watchdog setting in the kernel boot command line and the
NMI counter isn't incrementing.
On Wed, 04 Oct 2006 09:56:57 +0200, Ingo Molnar said:
>
> * [email protected] <[email protected]> wrote:
>
> > Even though I have CONFIG_HZ=1000, this ends up generating a synthetic
> > count that works out to 100 per second. gkrellm and vmstat are happy
> > with that state of affairs, but I'm not sure why it came out to
> > 100/sec rather than 1000/sec.
>
> that's how it worked for quite some time: all userspace APIs are
> HZ-independent and depend on USER_HZ (which is 100 even if HZ is 1000).
Nevermind - I missed where fs/proc/proc_misc.c applied jiffies_64_to_clock_t()
to the number before handing it to userspace. So the numbers *were* being
kept in terms of HZ (as my reading of the code indicated), they just didn't
manage to escape to userspace that way....
* Andrew Morton <[email protected]> wrote:
> > hm, do you have the NMI watchdog enabled by any chance? [in
> > particular, do you have nmi_watchdog=2?] Although your bootlog does
> > not show it.
> >
>
> There's no nmi_watchdog setting in the kernel boot command line and
> the NMI counter isn't incrementing.
there's one material difference we just found: in the !hres case we'll
do the timer IRQ handling mostly from the lapic vector - while in
mainline we do it from the irq0 vector. So, what does your
/proc/interrupts look like? How frequently does LOC increase, and how
frequently does IRQ 0 increase?
(meanwhile we'll fix and restore things so that it matches mainline
behavior.)
Ingo
On Wed, 2006-10-04 at 12:53 +0200, Ingo Molnar wrote:
> there's one material difference we just found: in the !hres case we'll
> do the timer IRQ handling mostly from the lapic vector - while in
> mainline we do it from the irq0 vector. So, what does your
> /proc/interrupts look like? How frequently does LOC increase, and how
> frequently does IRQ 0 increase?
>
> (meanwhile we'll fix and restore things so that it matches mainline
> behavior.)
Andrew, does the patch below fix your problem ?
You should see the same weird behaviour when you run a plain -mm3 with
CONFIG_SMP=y on that box. This moves update_process_times() to the lapic
too.
tglx
Index: linux-2.6.18-mm3/arch/i386/kernel/apic.c
===================================================================
--- linux-2.6.18-mm3.orig/arch/i386/kernel/apic.c 2006-10-04 13:02:35.000000000 +0200
+++ linux-2.6.18-mm3/arch/i386/kernel/apic.c 2006-10-04 12:59:06.000000000 +0200
@@ -84,7 +84,9 @@ static void lapic_timer_setup(enum clock
static struct clock_event_device lapic_clockevent = {
.name = "lapic",
.capabilities = CLOCK_CAP_NEXTEVT | CLOCK_CAP_PROFILE
+#ifdef CONFIG_SMP
| CLOCK_CAP_UPDATE,
+#endif
.shift = 32,
.set_mode = lapic_timer_setup,
.set_next_event = lapic_next_event,
On Wed, 04 Oct 2006 13:19:35 +0200
Thomas Gleixner <[email protected]> wrote:
> On Wed, 2006-10-04 at 12:53 +0200, Ingo Molnar wrote:
> > there's one material difference we just found: in the !hres case we'll
> > do the timer IRQ handling mostly from the lapic vector - while in
> > mainline we do it from the irq0 vector. So, what does your
> > /proc/interrupts look like? How frequently does LOC increase, and how
> > frequently does IRQ 0 increase?
sony:/home/akpm> cat /proc/interrupts ; sleep 1 ; cat /proc/interrupts
CPU0
0: 39256 IO-APIC-edge timer
1: 8 IO-APIC-edge i8042
8: 1 IO-APIC-edge rtc
9: 160 IO-APIC-fasteoi acpi
11: 3 IO-APIC-edge sonypi
12: 107 IO-APIC-edge i8042
14: 5 IO-APIC-edge libata
15: 0 IO-APIC-edge libata
16: 1 IO-APIC-fasteoi yenta, uhci_hcd:usb4
17: 246 IO-APIC-fasteoi ohci1394, eth0
18: 5759 IO-APIC-fasteoi libata
19: 3 IO-APIC-fasteoi ipw2200
20: 710 IO-APIC-fasteoi HDA Intel, uhci_hcd:usb3
21: 2 IO-APIC-fasteoi ehci_hcd:usb1
22: 0 IO-APIC-fasteoi uhci_hcd:usb2, uhci_hcd:usb5
NMI: 0
LOC: 3131
ERR: 0
MIS: 0
CPU0
0: 39519 IO-APIC-edge timer
1: 8 IO-APIC-edge i8042
8: 1 IO-APIC-edge rtc
9: 160 IO-APIC-fasteoi acpi
11: 3 IO-APIC-edge sonypi
12: 107 IO-APIC-edge i8042
14: 5 IO-APIC-edge libata
15: 0 IO-APIC-edge libata
16: 1 IO-APIC-fasteoi yenta, uhci_hcd:usb4
17: 248 IO-APIC-fasteoi ohci1394, eth0
18: 5759 IO-APIC-fasteoi libata
19: 3 IO-APIC-fasteoi ipw2200
20: 715 IO-APIC-fasteoi HDA Intel, uhci_hcd:usb3
21: 2 IO-APIC-fasteoi ehci_hcd:usb1
22: 0 IO-APIC-fasteoi uhci_hcd:usb2, uhci_hcd:usb5
NMI: 0
LOC: 3134
ERR: 0
MIS: 0
> > (meanwhile we'll fix and restore things so that it matches mainline
> > behavior.)
>
> Andrew, does the patch below fix your problem ?
>
> You should see the same weird behaviour when you run a plain -mm3 with
> CONFIG_SMP=y on that box. This moves update_process_times() to the lapic
> too.
> tglx
>
>
> Index: linux-2.6.18-mm3/arch/i386/kernel/apic.c
> ===================================================================
> --- linux-2.6.18-mm3.orig/arch/i386/kernel/apic.c 2006-10-04 13:02:35.000000000 +0200
> +++ linux-2.6.18-mm3/arch/i386/kernel/apic.c 2006-10-04 12:59:06.000000000 +0200
> @@ -84,7 +84,9 @@ static void lapic_timer_setup(enum clock
> static struct clock_event_device lapic_clockevent = {
> .name = "lapic",
> .capabilities = CLOCK_CAP_NEXTEVT | CLOCK_CAP_PROFILE
> +#ifdef CONFIG_SMP
> | CLOCK_CAP_UPDATE,
> +#endif
> .shift = 32,
> .set_mode = lapic_timer_setup,
> .set_next_event = lapic_next_event,
that (after a tweak to make it compile) fixes it. What's it do?
On Wed, 2006-10-04 at 09:02 -0700, Andrew Morton wrote:
> On Wed, 04 Oct 2006 13:19:35 +0200
> Thomas Gleixner <[email protected]> wrote:
>
> > On Wed, 2006-10-04 at 12:53 +0200, Ingo Molnar wrote:
> > > there's one material difference we just found: in the !hres case we'll
> > > do the timer IRQ handling mostly from the lapic vector - while in
> > > mainline we do it from the irq0 vector. So, what does your
> > > /proc/interrupts look like? How frequently does LOC increase, and how
> > > frequently does IRQ 0 increase?
>
> sony:/home/akpm> cat /proc/interrupts ; sleep 1 ; cat /proc/interrupts
> CPU0
> 0: 39256 IO-APIC-edge timer
> LOC: 3131
> 0: 39519 IO-APIC-edge timer
> LOC: 3134
delta IRQ == 263
delta LOC == 3
That explains the problem. The lapic frequency seems to be way off. I
have no good idea offhand how to detect such lapic brokenness.
> > static struct clock_event_device lapic_clockevent = {
> > .name = "lapic",
> > .capabilities = CLOCK_CAP_NEXTEVT | CLOCK_CAP_PROFILE
> > +#ifdef CONFIG_SMP
> > | CLOCK_CAP_UPDATE,
> > +#endif
> > .shift = 32,
> > .set_mode = lapic_timer_setup,
> > .set_next_event = lapic_next_event,
>
> that (after a tweak to make it compile) fixes it. What's it do?
It brings update_process_times() back into IRQ0. On systems with a
working lapic it would not matter. SMP moves update_process_times() to
the lapic too; that's why I asked whether an SMP=y kernel has the same
problems on this box.
tglx
* Andrew Morton <[email protected]> wrote:
> > .name = "lapic",
> > .capabilities = CLOCK_CAP_NEXTEVT | CLOCK_CAP_PROFILE
> > +#ifdef CONFIG_SMP
> > | CLOCK_CAP_UPDATE,
> > +#endif
> > .shift = 32,
> > .set_mode = lapic_timer_setup,
> > .set_next_event = lapic_next_event,
>
> that (after a tweak to make it compile) fixes it. [...]
cool!
the vanilla SMP kernel will likely show similar effects on your laptop.
We'll figure out a safe way to detect this quirk, and to work it around
or turn off the lapic timer driver in that case.
(Btw., this bug was collateral damage from cleanups. Many people are
running -rt on laptops and i think we'd have noticed.)
Ingo