2008-08-18 18:57:35

by Justin Madru

[permalink] [raw]
Subject: Backport hr-tick fix into .25/.26

Hi all,

Since the introduction of 2.6.25-rc1 kernel I've been experiencing
screen blanking during boot up.
I've reported this issue before (http://lkml.org/lkml/2008/3/12/290),
and Rafael Wysocki opened a bug report
(http://bugzilla.kernel.org/show_bug.cgi?id=10235).
Since the problem seemed to be related to the x server, I reported it to
their bug tracker (https://bugs.freedesktop.org/show_bug.cgi?id=15602).

I've tested with 2 computers, both with Intel graphic cards and various
versions of Ubuntu. Sorry, I couldn't do more testing :(
I know of one other person that ran into what seems to be the same bug.
(http://lkml.org/lkml/2008/6/10/137 |
http://bugzilla.kernel.org/show_bug.cgi?id=10892)

Now since 2.6.27-rc1 the bug has been resolved. I did a git bisect and
found both the commit that introduced the bug and also the one that
fixed it.
Greg (and stable team), I was wondering if the commit that went into .27
that fixed the bug could be backported into both the .25 and .26 stable
kernels?


This commit introduced the bug.

commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
Author: Peter Zijlstra <[email protected]>
Date: Fri Jan 25 21:08:29 2008 +0100

sched: high-res preemption tick

Use HR-timers (when available) to deliver an accurate preemption tick.

The regular scheduler tick that runs at 1/HZ can be too coarse when nice
level
are used. The fairness system will still keep the cpu utilisation 'fair' by
then delaying the task that got an excessive amount of CPU time but try to
minimize this by delivering preemption points spot-on.

The average frequency of this extra interrupt is sched_latency / nr_latency.
Which need not be higher than 1/HZ, its just that the distribution
within the
sched_latency period is important.

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

:040000 040000 ab225228500f7a19d5ad20ca12ca3fc8ff5f5ad1
f1742e1d225a72aecea9d6961ed989b5943d31d8 M arch
:040000 040000 25d85e4ef7a71b0cc76801a2526ebeb4dce180fe
ae61510186b4fad708ef0211ac169decba16d4e5 M include
:040000 040000 9247cec7dd506c648ac027c17e5a07145aa41b26
950832cc1dc4d30923f593ecec883a06b45d62e9 M kernel


And this one fixed it

commit 31656519e132f6612584815f128c83976a9aaaef
Author: Peter Zijlstra <[email protected]>
Date: Fri Jul 18 18:01:23 2008 +0200

sched, x86: clean up hrtick implementation

random uvesafb failures were reported against Gentoo:

http://bugs.gentoo.org/show_bug.cgi?id=222799

and Mihai Moldovan bisected it back to:

> 8f4d37ec073c17e2d4aa8851df5837d798606d6f is first bad commit
> commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Jan 25 21:08:29 2008 +0100
>
> sched: high-res preemption tick

Linus suspected it to be hrtick + vm86 interaction and observed:

> Btw, Peter, Ingo: I think that commit is doing bad things. They aren't
> _incorrect_ per se, but they are definitely bad.
>
> Why?
>
> Using random _TIF_WORK_MASK flags is really impolite for doing
> "scheduling" work. There's a reason that arch/x86/kernel/entry_32.S
> special-cases the _TIF_NEED_RESCHED flag: we don't want to exit out of
> vm86 mode unnecessarily.
>
> See the "work_notifysig_v86" label, and how it does that
> "save_v86_state()" thing etc etc.

Right, I never liked having to fiddle with those TIF flags. Initially I
needed it because the hrtimer base lock could not nest in the rq lock.
That however is fixed these days.

Currently the only reason left to fiddle with the TIF flags is remote
wakeups. We cannot program a remote cpu's hrtimer. I've been thinking
about using the new and improved IPI function call stuff to implement
hrtimer_start_on().

However that does require that smp_call_function_single(.wait=0) works
from interrupt context - /me looks at the latest series from Jens - Yes
that does seem to be supported, good.

Here's a stab at cleaning this stuff up ...

Mihai reported test success as well.

Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Mihai Moldovan <[email protected]>
Cc: Michal Januszewski <[email protected]>
Cc: Antonino Daplas <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

:040000 040000 5ae152350652713c58bd1700ba2c776a556b6985
40d22771987dc5814a1e18aa3cee82ae9e4faea5 M arch
:040000 040000 4dfe3c6abd244d2da57b7801e47f073899124376
3863e3311a21dc049d5ad98f45c272e4a5269a2b M include
:040000 040000 236b2824be1c7cf3c899a090498e4151543bba31
9f9779c89b781d8fa8950468deee42f419339bd7 M kernel

Justin Madru


2008-08-19 00:32:06

by Ingo Molnar

[permalink] [raw]
Subject: Re: Backport hr-tick fix into .25/.26


* Justin Madru <[email protected]> wrote:

> This commit introduced the bug.
> commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> sched: high-res preemption tick

> And this one fixed it
>
> commit 31656519e132f6612584815f128c83976a9aaaef
> sched, x86: clean up hrtick implementation

hm, the backport of 31656519 is a bit intrusive.

find below is an (untested!) version of it - i havent even build-tested
it. Does it work for you? But this is Greg's call really.

Ingo

-------------->
>From 76cc8db4610c37f0d6b73d224517ce2526b681b6 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Tue, 19 Aug 2008 02:29:00 +0200
Subject: [PATCH] sched, x86: clean up hrtick implementation

random uvesafb failures were reported against Gentoo:

http://bugs.gentoo.org/show_bug.cgi?id=222799

and Mihai Moldovan bisected it back to:

> 8f4d37ec073c17e2d4aa8851df5837d798606d6f is first bad commit
> commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Jan 25 21:08:29 2008 +0100
>
> sched: high-res preemption tick

Linus suspected it to be hrtick + vm86 interaction and observed:

> Btw, Peter, Ingo: I think that commit is doing bad things. They aren't
> _incorrect_ per se, but they are definitely bad.
>
> Why?
>
> Using random _TIF_WORK_MASK flags is really impolite for doing
> "scheduling" work. There's a reason that arch/x86/kernel/entry_32.S
> special-cases the _TIF_NEED_RESCHED flag: we don't want to exit out of
> vm86 mode unnecessarily.
>
> See the "work_notifysig_v86" label, and how it does that
> "save_v86_state()" thing etc etc.

Right, I never liked having to fiddle with those TIF flags. Initially I
needed it because the hrtimer base lock could not nest in the rq lock.
That however is fixed these days.

Currently the only reason left to fiddle with the TIF flags is remote
wakeups. We cannot program a remote cpu's hrtimer. I've been thinking
about using the new and improved IPI function call stuff to implement
hrtimer_start_on().

However that does require that smp_call_function_single(.wait=0) works
from interrupt context - /me looks at the latest series from Jens - Yes
that does seem to be supported, good.

Here's a stab at cleaning this stuff up ...

Mihai reported test success as well.

Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Mihai Moldovan <[email protected]>
Cc: Michal Januszewski <[email protected]>
Cc: Antonino Daplas <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

Conflicts:

include/asm-x86/thread_info.h
kernel/sched.c

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/signal_32.c | 3 -
arch/x86/kernel/signal_64.c | 3 -
kernel/Kconfig.hz | 2 +-
kernel/sched.c | 197 +++++++++++++------------------------------
kernel/sched_fair.c | 5 +-
5 files changed, 61 insertions(+), 149 deletions(-)

diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c
index d923736..e1fc7bd 100644
--- a/arch/x86/kernel/signal_32.c
+++ b/arch/x86/kernel/signal_32.c
@@ -667,8 +667,5 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
if (thread_info_flags & _TIF_SIGPENDING)
do_signal(regs);

- if (thread_info_flags & _TIF_HRTICK_RESCHED)
- hrtick_resched();
-
clear_thread_flag(TIF_IRET);
}
diff --git a/arch/x86/kernel/signal_64.c b/arch/x86/kernel/signal_64.c
index e53b267..88023fc 100644
--- a/arch/x86/kernel/signal_64.c
+++ b/arch/x86/kernel/signal_64.c
@@ -502,9 +502,6 @@ void do_notify_resume(struct pt_regs *regs, void *unused,
/* deal with pending signal delivery */
if (thread_info_flags & _TIF_SIGPENDING)
do_signal(regs);
-
- if (thread_info_flags & _TIF_HRTICK_RESCHED)
- hrtick_resched();
}

void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
diff --git a/kernel/Kconfig.hz b/kernel/Kconfig.hz
index 526128a..2a202a8 100644
--- a/kernel/Kconfig.hz
+++ b/kernel/Kconfig.hz
@@ -55,4 +55,4 @@ config HZ
default 1000 if HZ_1000

config SCHED_HRTICK
- def_bool HIGH_RES_TIMERS && X86
+ def_bool HIGH_RES_TIMERS
diff --git a/kernel/sched.c b/kernel/sched.c
index 4e2f603..b955d92 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -532,8 +532,10 @@ struct rq {
#endif

#ifdef CONFIG_SCHED_HRTICK
- unsigned long hrtick_flags;
- ktime_t hrtick_expire;
+#ifdef CONFIG_SMP
+ int hrtick_csd_pending;
+ struct call_single_data hrtick_csd;
+#endif
struct hrtimer hrtick_timer;
#endif

@@ -996,13 +998,6 @@ static struct rq *this_rq_lock(void)
return rq;
}

-static void __resched_task(struct task_struct *p, int tif_bit);
-
-static inline void resched_task(struct task_struct *p)
-{
- __resched_task(p, TIF_NEED_RESCHED);
-}
-
#ifdef CONFIG_SCHED_HRTICK
/*
* Use HR-timers to deliver accurate preemption points.
@@ -1014,25 +1009,6 @@ static inline void resched_task(struct task_struct *p)
* When we get rescheduled we reprogram the hrtick_timer outside of the
* rq->lock.
*/
-static inline void resched_hrt(struct task_struct *p)
-{
- __resched_task(p, TIF_HRTICK_RESCHED);
-}
-
-static inline void resched_rq(struct rq *rq)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&rq->lock, flags);
- resched_task(rq->curr);
- spin_unlock_irqrestore(&rq->lock, flags);
-}
-
-enum {
- HRTICK_SET, /* re-programm hrtick_timer */
- HRTICK_RESET, /* not a new slice */
- HRTICK_BLOCK, /* stop hrtick operations */
-};

/*
* Use hrtick when:
@@ -1043,40 +1019,11 @@ static inline int hrtick_enabled(struct rq *rq)
{
if (!sched_feat(HRTICK))
return 0;
- if (unlikely(test_bit(HRTICK_BLOCK, &rq->hrtick_flags)))
+ if (!cpu_online(cpu_of(rq)))
return 0;
return hrtimer_is_hres_active(&rq->hrtick_timer);
}

-/*
- * Called to set the hrtick timer state.
- *
- * called with rq->lock held and irqs disabled
- */
-static void hrtick_start(struct rq *rq, u64 delay, int reset)
-{
- assert_spin_locked(&rq->lock);
-
- /*
- * preempt at: now + delay
- */
- rq->hrtick_expire =
- ktime_add_ns(rq->hrtick_timer.base->get_time(), delay);
- /*
- * indicate we need to program the timer
- */
- __set_bit(HRTICK_SET, &rq->hrtick_flags);
- if (reset)
- __set_bit(HRTICK_RESET, &rq->hrtick_flags);
-
- /*
- * New slices are called from the schedule path and don't need a
- * forced reschedule.
- */
- if (reset)
- resched_hrt(rq->curr);
-}
-
static void hrtick_clear(struct rq *rq)
{
if (hrtimer_active(&rq->hrtick_timer))
@@ -1084,32 +1031,6 @@ static void hrtick_clear(struct rq *rq)
}

/*
- * Update the timer from the possible pending state.
- */
-static void hrtick_set(struct rq *rq)
-{
- ktime_t time;
- int set, reset;
- unsigned long flags;
-
- WARN_ON_ONCE(cpu_of(rq) != smp_processor_id());
-
- spin_lock_irqsave(&rq->lock, flags);
- set = __test_and_clear_bit(HRTICK_SET, &rq->hrtick_flags);
- reset = __test_and_clear_bit(HRTICK_RESET, &rq->hrtick_flags);
- time = rq->hrtick_expire;
- clear_thread_flag(TIF_HRTICK_RESCHED);
- spin_unlock_irqrestore(&rq->lock, flags);
-
- if (set) {
- hrtimer_start(&rq->hrtick_timer, time, HRTIMER_MODE_ABS);
- if (reset && !hrtimer_active(&rq->hrtick_timer))
- resched_rq(rq);
- } else
- hrtick_clear(rq);
-}
-
-/*
* High-resolution timer tick.
* Runs from hardirq context with interrupts disabled.
*/
@@ -1128,27 +1049,37 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
}

#ifdef CONFIG_SMP
-static void hotplug_hrtick_disable(int cpu)
+/*
+ * called from hardirq (IPI) context
+ */
+static void __hrtick_start(void *arg)
{
- struct rq *rq = cpu_rq(cpu);
- unsigned long flags;
-
- spin_lock_irqsave(&rq->lock, flags);
- rq->hrtick_flags = 0;
- __set_bit(HRTICK_BLOCK, &rq->hrtick_flags);
- spin_unlock_irqrestore(&rq->lock, flags);
+ struct rq *rq = arg;

- hrtick_clear(rq);
+ spin_lock(&rq->lock);
+ hrtimer_restart(&rq->hrtick_timer);
+ rq->hrtick_csd_pending = 0;
+ spin_unlock(&rq->lock);
}

-static void hotplug_hrtick_enable(int cpu)
+/*
+ * Called to set the hrtick timer state.
+ *
+ * called with rq->lock held and irqs disabled
+ */
+static void hrtick_start(struct rq *rq, u64 delay)
{
- struct rq *rq = cpu_rq(cpu);
- unsigned long flags;
+ struct hrtimer *timer = &rq->hrtick_timer;
+ ktime_t time = ktime_add_ns(timer->base->get_time(), delay);

- spin_lock_irqsave(&rq->lock, flags);
- __clear_bit(HRTICK_BLOCK, &rq->hrtick_flags);
- spin_unlock_irqrestore(&rq->lock, flags);
+ timer->expires = time;
+
+ if (rq == this_rq()) {
+ hrtimer_restart(timer);
+ } else if (!rq->hrtick_csd_pending) {
+ __smp_call_function_single(cpu_of(rq), &rq->hrtick_csd);
+ rq->hrtick_csd_pending = 1;
+ }
}

static int
@@ -1163,16 +1094,7 @@ hotplug_hrtick(struct notifier_block *nfb, unsigned long action, void *hcpu)
case CPU_DOWN_PREPARE_FROZEN:
case CPU_DEAD:
case CPU_DEAD_FROZEN:
- hotplug_hrtick_disable(cpu);
- return NOTIFY_OK;
-
- case CPU_UP_PREPARE:
- case CPU_UP_PREPARE_FROZEN:
- case CPU_DOWN_FAILED:
- case CPU_DOWN_FAILED_FROZEN:
- case CPU_ONLINE:
- case CPU_ONLINE_FROZEN:
- hotplug_hrtick_enable(cpu);
+ hrtick_clear(cpu_rq(cpu));
return NOTIFY_OK;
}

@@ -1183,46 +1105,45 @@ static void init_hrtick(void)
{
hotcpu_notifier(hotplug_hrtick, 0);
}
-#endif /* CONFIG_SMP */
+#else
+/*
+ * Called to set the hrtick timer state.
+ *
+ * called with rq->lock held and irqs disabled
+ */
+static void hrtick_start(struct rq *rq, u64 delay)
+{
+ hrtimer_start(&rq->hrtick_timer, ns_to_ktime(delay), HRTIMER_MODE_REL);
+}

-static void init_rq_hrtick(struct rq *rq)
+static void init_hrtick(void)
{
- rq->hrtick_flags = 0;
- hrtimer_init(&rq->hrtick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
- rq->hrtick_timer.function = hrtick;
- rq->hrtick_timer.cb_mode = HRTIMER_CB_IRQSAFE_NO_SOFTIRQ;
}
+#endif /* CONFIG_SMP */

-void hrtick_resched(void)
+static void init_rq_hrtick(struct rq *rq)
{
- struct rq *rq;
- unsigned long flags;
+#ifdef CONFIG_SMP
+ rq->hrtick_csd_pending = 0;

- if (!test_thread_flag(TIF_HRTICK_RESCHED))
- return;
+ rq->hrtick_csd.flags = 0;
+ rq->hrtick_csd.func = __hrtick_start;
+ rq->hrtick_csd.info = rq;
+#endif

- local_irq_save(flags);
- rq = cpu_rq(smp_processor_id());
- hrtick_set(rq);
- local_irq_restore(flags);
+ hrtimer_init(&rq->hrtick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ rq->hrtick_timer.function = hrtick;
+ rq->hrtick_timer.cb_mode = HRTIMER_CB_IRQSAFE_NO_SOFTIRQ;
}
#else
static inline void hrtick_clear(struct rq *rq)
{
}

-static inline void hrtick_set(struct rq *rq)
-{
-}
-
static inline void init_rq_hrtick(struct rq *rq)
{
}

-void hrtick_resched(void)
-{
-}
-
static inline void init_hrtick(void)
{
}
@@ -1241,16 +1162,16 @@ static inline void init_hrtick(void)
#define tsk_is_polling(t) test_tsk_thread_flag(t, TIF_POLLING_NRFLAG)
#endif

-static void __resched_task(struct task_struct *p, int tif_bit)
+static void resched_task(struct task_struct *p)
{
int cpu;

assert_spin_locked(&task_rq(p)->lock);

- if (unlikely(test_tsk_thread_flag(p, tif_bit)))
+ if (unlikely(test_tsk_thread_flag(p, TIF_NEED_RESCHED)))
return;

- set_tsk_thread_flag(p, tif_bit);
+ set_tsk_thread_flag(p, TIF_NEED_RESCHED);

cpu = task_cpu(p);
if (cpu == smp_processor_id())
@@ -1316,10 +1237,10 @@ void wake_up_idle_cpu(int cpu)
#endif

#else
-static void __resched_task(struct task_struct *p, int tif_bit)
+static void resched_task(struct task_struct *p)
{
assert_spin_locked(&task_rq(p)->lock);
- set_tsk_thread_flag(p, tif_bit);
+ set_tsk_need_resched(p);
}
#endif

@@ -4204,8 +4125,6 @@ need_resched_nonpreemptible:
} else
spin_unlock_irq(&rq->lock);

- hrtick_set(rq);
-
if (unlikely(reacquire_kernel_lock(current) < 0))
goto need_resched_nonpreemptible;

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 08ae848..e193a9d 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -808,7 +808,6 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
#ifdef CONFIG_SCHED_HRTICK
static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
{
- int requeue = rq->curr == p;
struct sched_entity *se = &p->se;
struct cfs_rq *cfs_rq = cfs_rq_of(se);

@@ -829,10 +828,10 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
* Don't schedule slices shorter than 10000ns, that just
* doesn't make sense. Rely on vruntime for fairness.
*/
- if (!requeue)
+ if (rq->curr != p)
delta = max(10000LL, delta);

- hrtick_start(rq, delta, requeue);
+ hrtick_start(rq, delta);
}
}
#else

2008-08-19 02:44:17

by Justin Madru

[permalink] [raw]
Subject: Re: Backport hr-tick fix into .25/.26

Ingo Molnar wrote:
> hm, the backport of 31656519 is a bit intrusive.
>
> find below is an (untested!) version of it - i havent even build-tested
> it. Does it work for you? But this is Greg's call really.
>

I tried to apply it onto 2.6.26.2, but got this build error:

CC arch/x86/kernel/i8253.o
CC kernel/sched.o
CC arch/x86/kernel/pci-nommu.o
kernel/sched.c:537: error: field ?hrtick_csd? has incomplete type
kernel/sched.c: In function ?hrtick_start?:
kernel/sched.c:1080: error: implicit declaration of function
?__smp_call_function_single?
CC arch/x86/kernel/tsc_32.o
make[2]: *** [kernel/sched.o] Error 1
make[1]: *** [kernel] Error 2
make[1]: *** Waiting for unfinished jobs....
CC arch/x86/kernel/io_delay.o
CC arch/x86/kernel/rtc.o
CC arch/x86/kernel/trampoline.o

Was I suppose to use a different kernel? (2.6.25[.x] or 2.6.26) I'll try
2.6.25.15

Anyways, I know it's Greg's call, and I'm just trying to inform him of a
bug that's been fixed upstream that should be backported.
If he declines, that's ok. I've opened a bug report with Ubuntu. But, it
would help if I had a patch to show.

Justin Madru

2008-08-19 06:28:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Backport hr-tick fix into .25/.26

On Tue, 2008-08-19 at 02:31 +0200, Ingo Molnar wrote:
> * Justin Madru <[email protected]> wrote:
>
> > This commit introduced the bug.
> > commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> > sched: high-res preemption tick
>
> > And this one fixed it
> >
> > commit 31656519e132f6612584815f128c83976a9aaaef
> > sched, x86: clean up hrtick implementation
>
> hm, the backport of 31656519 is a bit intrusive.
>
> find below is an (untested!) version of it - i havent even build-tested
> it. Does it work for you? But this is Greg's call really.
>

It largely depends on all the new IPI stuff that went into 27 as well,
so I'd be surprised if its easily backportable..

2008-08-19 09:27:04

by Ingo Molnar

[permalink] [raw]
Subject: Re: Backport hr-tick fix into .25/.26


* Peter Zijlstra <[email protected]> wrote:

> On Tue, 2008-08-19 at 02:31 +0200, Ingo Molnar wrote:
> > * Justin Madru <[email protected]> wrote:
> >
> > > This commit introduced the bug.
> > > commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> > > sched: high-res preemption tick
> >
> > > And this one fixed it
> > >
> > > commit 31656519e132f6612584815f128c83976a9aaaef
> > > sched, x86: clean up hrtick implementation
> >
> > hm, the backport of 31656519 is a bit intrusive.
> >
> > find below is an (untested!) version of it - i havent even build-tested
> > it. Does it work for you? But this is Greg's call really.
> >
>
> It largely depends on all the new IPI stuff that went into 27 as well,
> so I'd be surprised if its easily backportable..

ah, i see. How about a simple patch then that disables hrtick [given
that it's now fixed .27 but the fix is too intrusive to backport]? Would
that be too risky for -stable?

Ingo

2008-08-19 09:32:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Backport hr-tick fix into .25/.26

On Tue, 2008-08-19 at 11:25 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Tue, 2008-08-19 at 02:31 +0200, Ingo Molnar wrote:
> > > * Justin Madru <[email protected]> wrote:
> > >
> > > > This commit introduced the bug.
> > > > commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> > > > sched: high-res preemption tick
> > >
> > > > And this one fixed it
> > > >
> > > > commit 31656519e132f6612584815f128c83976a9aaaef
> > > > sched, x86: clean up hrtick implementation
> > >
> > > hm, the backport of 31656519 is a bit intrusive.
> > >
> > > find below is an (untested!) version of it - i havent even build-tested
> > > it. Does it work for you? But this is Greg's call really.
> > >
> >
> > It largely depends on all the new IPI stuff that went into 27 as well,
> > so I'd be surprised if its easily backportable..
>
> ah, i see. How about a simple patch then that disables hrtick [given
> that it's now fixed .27 but the fix is too intrusive to backport]? Would
> that be too risky for -stable?

That's just flipping sched_feat(HRTICK) to 0 by default.

2008-08-19 18:25:37

by Greg KH

[permalink] [raw]
Subject: Re: Backport hr-tick fix into .25/.26

On Tue, Aug 19, 2008 at 11:33:00AM +0200, Peter Zijlstra wrote:
> On Tue, 2008-08-19 at 11:25 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <[email protected]> wrote:
> >
> > > On Tue, 2008-08-19 at 02:31 +0200, Ingo Molnar wrote:
> > > > * Justin Madru <[email protected]> wrote:
> > > >
> > > > > This commit introduced the bug.
> > > > > commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> > > > > sched: high-res preemption tick
> > > >
> > > > > And this one fixed it
> > > > >
> > > > > commit 31656519e132f6612584815f128c83976a9aaaef
> > > > > sched, x86: clean up hrtick implementation
> > > >
> > > > hm, the backport of 31656519 is a bit intrusive.
> > > >
> > > > find below is an (untested!) version of it - i havent even build-tested
> > > > it. Does it work for you? But this is Greg's call really.
> > > >
> > >
> > > It largely depends on all the new IPI stuff that went into 27 as well,
> > > so I'd be surprised if its easily backportable..
> >
> > ah, i see. How about a simple patch then that disables hrtick [given
> > that it's now fixed .27 but the fix is too intrusive to backport]? Would
> > that be too risky for -stable?
>
> That's just flipping sched_feat(HRTICK) to 0 by default.

That sounds reasonable, I'll take a patch for that for -stable :)

thanks,

greg k-h

2008-08-19 19:31:01

by Justin Madru

[permalink] [raw]
Subject: Re: Backport hr-tick fix into .25/.26

Greg KH wrote:
> On Tue, Aug 19, 2008 at 11:33:00AM +0200, Peter Zijlstra wrote:
>
>> That's just flipping sched_feat(HRTICK) to 0 by default.
>>
>
> That sounds reasonable, I'll take a patch for that for -stable :)
>
> thanks,
>
> greg k-h
>
>
Thanks Greg :)

Will the patch make it into 2.6.26.3 and 2.6.25.16, or will it be in the
next point releases (2.6.26.4 and 2.6.25.17)?

Justin Madru

2008-08-19 20:09:10

by Greg KH

[permalink] [raw]
Subject: Re: Backport hr-tick fix into .25/.26

On Tue, Aug 19, 2008 at 12:30:17PM -0700, Justin Madru wrote:
> Greg KH wrote:
>> On Tue, Aug 19, 2008 at 11:33:00AM +0200, Peter Zijlstra wrote:
>>
>>> That's just flipping sched_feat(HRTICK) to 0 by default.
>>>
>>
>> That sounds reasonable, I'll take a patch for that for -stable :)
>>
>> thanks,
>>
>> greg k-h
>>
>>
> Thanks Greg :)
>
> Will the patch make it into 2.6.26.3 and 2.6.25.16, or will it be in the
> next point releases (2.6.26.4 and 2.6.25.17)?

As no one has sent me any patch to do this, it will be very hard to get
into the release that is happening tomorrow. It will have to be one
after that.

thanks,

greg k-h

2008-08-19 22:57:55

by Justin Madru

[permalink] [raw]
Subject: [PATCH] sched: disable hrtick implementation

The hrtick implementation in 2.6.25 and .26 has been known to cause boot
problems with at least Intel GMA cards. see:

https://bugs.freedesktop.org/show_bug.cgi?id=15602
http://bugzilla.kernel.org/show_bug.cgi?id=10892

A full fix to hrtick went into 2.6.27
(31656519e132f6612584815f128c83976a9aaaef),
but that fix is too intrusive to backport. Henceforth, we default to
disable hrtick.

Signed-off-by: Justin Madru <[email protected]>
Tested-by: Justin Madru <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>

--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -4,7 +4,7 @@
SCHED_FEAT(AFFINE_WAKEUPS, 1)
SCHED_FEAT(CACHE_HOT_BUDDY, 1)
SCHED_FEAT(SYNC_WAKEUPS, 1)
-SCHED_FEAT(HRTICK, 1)
+SCHED_FEAT(HRTICK, 0)
SCHED_FEAT(DOUBLE_TICK, 0)
SCHED_FEAT(NORMALIZED_SLEEPER, 1)
SCHED_FEAT(DEADLINE, 1)

2008-08-20 07:24:54

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: disable hrtick implementation

On Tue, 2008-08-19 at 15:54 -0700, Justin Madru wrote:
> The hrtick implementation in 2.6.25 and .26 has been known to cause boot
> problems with at least Intel GMA cards. see:
>
> https://bugs.freedesktop.org/show_bug.cgi?id=15602
> http://bugzilla.kernel.org/show_bug.cgi?id=10892
>
> A full fix to hrtick went into 2.6.27
> (31656519e132f6612584815f128c83976a9aaaef),
> but that fix is too intrusive to backport. Henceforth, we default to
> disable hrtick.

Thing is, I'm still not understanding how, or if, the old code is wrong.

That said, this will paper over the problem just fine, and as its old
kernels we're talking about I don't particularly care.

Just occured to me that a Kconfig change that always makes
CONFIG_SCHED_HRTICK not set will also do.

> Signed-off-by: Justin Madru <[email protected]>
> Tested-by: Justin Madru <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
>
> --- a/kernel/sched_features.h
> +++ b/kernel/sched_features.h
> @@ -4,7 +4,7 @@
> SCHED_FEAT(AFFINE_WAKEUPS, 1)
> SCHED_FEAT(CACHE_HOT_BUDDY, 1)
> SCHED_FEAT(SYNC_WAKEUPS, 1)
> -SCHED_FEAT(HRTICK, 1)
> +SCHED_FEAT(HRTICK, 0)
> SCHED_FEAT(DOUBLE_TICK, 0)
> SCHED_FEAT(NORMALIZED_SLEEPER, 1)
> SCHED_FEAT(DEADLINE, 1)
>

2008-08-20 18:43:41

by Justin Madru

[permalink] [raw]
Subject: Re: [PATCH] sched: disable hrtick implementation

Peter Zijlstra wrote:
> On Tue, 2008-08-19 at 15:54 -0700, Justin Madru wrote:
>> The hrtick implementation in 2.6.25 and .26 has been known to cause boot
>> problems with at least Intel GMA cards. see:
>>
>> https://bugs.freedesktop.org/show_bug.cgi?id=15602
>> http://bugzilla.kernel.org/show_bug.cgi?id=10892
>>
>> A full fix to hrtick went into 2.6.27
>> (31656519e132f6612584815f128c83976a9aaaef),
>> but that fix is too intrusive to backport. Henceforth, we default to
>> disable hrtick.
>
> Thing is, I'm still not understanding how, or if, the old code is wrong.
>
> That said, this will paper over the problem just fine, and as its old
> kernels we're talking about I don't particularly care.

Fair enough.

All I know is that right when I started testing .25-rc1 I kept getting a
garbled boot splash, blank screen, and even lock-ups during booting. It
seemed to be that if I disabled the boot splash and/or started x
manually the problem would rarely occur. So, maybe the first change to
hrtick just exposed a bug in the boot splash or the x server. If that's
true then, I don't get why right when .27-rc1 was out all the boot
problems were gone.

Bisecting found both commits (introduce and fix) to be about hrtick. So
maybe the commit that introduced my bug wasn't completely wrong, but at
least it was not right enough because of the second commit was trying to
fix something -- right? And the kernel still boots with hrtick disabled,
I tested that.


> Just occured to me that a Kconfig change that always makes
> CONFIG_SCHED_HRTICK not set will also do.
>> Signed-off-by: Justin Madru <[email protected]>
>> Tested-by: Justin Madru <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>>
>> --- a/kernel/sched_features.h
>> +++ b/kernel/sched_features.h
>> @@ -4,7 +4,7 @@
>> SCHED_FEAT(AFFINE_WAKEUPS, 1)
>> SCHED_FEAT(CACHE_HOT_BUDDY, 1)
>> SCHED_FEAT(SYNC_WAKEUPS, 1)
>> -SCHED_FEAT(HRTICK, 1)
>> +SCHED_FEAT(HRTICK, 0)
>> SCHED_FEAT(DOUBLE_TICK, 0)
>> SCHED_FEAT(NORMALIZED_SLEEPER, 1)
>> SCHED_FEAT(DEADLINE, 1)
>>

Wouldn't disabling CONFIG_HIGH_RES_TIMERS (in the menu) also disable
CONFIG_SCHED_HRTICK? In any case what about this patch below?


The hrtick implementation in 2.6.25 and .26 has been known to cause boot
problems with at least Intel GMA cards. see:

https://bugs.freedesktop.org/show_bug.cgi?id=15602
http://bugzilla.kernel.org/show_bug.cgi?id=10892

A full fix to hrtick went into 2.6.27
(31656519e132f6612584815f128c83976a9aaaef),
but that fix is too intrusive to backport. Henceforth, we default to
disable hrtick.

Signed-off-by: Justin Madru <[email protected]>
Tested-by: Justin Madru <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>

--- a/kernel/Kconfig.hz
+++ b/kernel/Kconfig.hz
@@ -55,4 +55,5 @@
default 1000 if HZ_1000

config SCHED_HRTICK
- def_bool HIGH_RES_TIMERS && X86
+ bool
+ default n