2013-06-25 21:05:43

by David Ahern

Subject: deadlock in scheduler enabling HRTICK feature

Peter/Ingo:

I can reliably cause a deadlock in the scheduler by enabling the HRTICK
feature. I first hit the problem with 2.6.27 but have been able to
reproduce it with newer kernels. I have not tried the top of Linus' tree, so
perhaps this has been fixed in 3.10. The exact backtrace differs by release,
but the root cause is the same: the run-queue lock is taken early in the
scheduler path and then wanted again by the wakeup triggered when the
softirq is raised.

Using Fedora 18 and the 3.9.6-200.fc18.x86_64 kernel as an example,

[root@f18 ~]# cat /sys/kernel/debug/sched_features
GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY
CACHE_HOT_BUDDY WAKEUP_PREEMPTION ARCH_POWER NO_HRTICK NO_DOUBLE_TICK
LB_BIAS OWNER_SPIN NONTASK_POWER TTWU_QUEUE NO_FORCE_SD_OVERLAP
RT_RUNTIME_SHARE NO_LB_MIN NO_NUMA NO_NUMA_FORCE

[root@f18 ~]# echo HRTICK > /sys/kernel/debug/sched_features

[root@f18 ~]# cat /sys/kernel/debug/sched_features
GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY
CACHE_HOT_BUDDY WAKEUP_PREEMPTION ARCH_POWER HRTICK NO_DOUBLE_TICK
LB_BIAS OWNER_SPIN NONTASK_POWER TTWU_QUEUE NO_FORCE_SD_OVERLAP
RT_RUNTIME_SHARE NO_LB_MIN NO_NUMA NO_NUMA_FORCE

As a workload, a simple kernel build suffices: 'make O=/tmp/kbuild -j 8'
on a 4-vcpu VM. The lockup occurs pretty quickly.

The relevant stack trace from the nmi watchdog:
...
[ 219.467698] <<EOE>> [<ffffffff81093c61>] try_to_wake_up+0x1d1/0x2d0
[ 219.467698] [<ffffffff81043daf>] ? kvm_clock_read+0x1f/0x30
[ 219.467698] [<ffffffff81093dc7>] wake_up_process+0x27/0x50
[ 219.467698] [<ffffffff81066fc9>] wakeup_softirqd+0x29/0x30
[ 219.467698] [<ffffffff81067b95>] raise_softirq_irqoff+0x25/0x30
[ 219.467698] [<ffffffff810867c5>] __hrtimer_start_range_ns+0x3a5/0x400
[ 219.467698] [<ffffffff8109a089>] ? update_curr+0x99/0x170
[ 219.467698] [<ffffffff81086854>] hrtimer_start_range_ns+0x14/0x20
[ 219.467698] [<ffffffff81090bf0>] hrtick_start+0x90/0xa0
[ 219.467698] [<ffffffff810985f8>] hrtick_start_fair+0x88/0xd0
[ 219.467698] [<ffffffff81098f33>] hrtick_update+0x73/0x80
[ 219.467698] [<ffffffff8109c876>] enqueue_task_fair+0x346/0x550
[ 219.467698] [<ffffffff81090ab6>] enqueue_task+0x66/0x80
[ 219.467698] [<ffffffff81091443>] activate_task+0x23/0x30
[ 219.467698] [<ffffffff810917ac>] ttwu_do_activate.constprop.83+0x3c/0x70
[ 219.467698] [<ffffffff81093c6c>] try_to_wake_up+0x1dc/0x2d0
[ 219.467698] [<ffffffff81198898>] ? mem_cgroup_charge_common+0xa8/0x120
[ 219.467698] [<ffffffff81093d72>] default_wake_function+0x12/0x20
[ 219.467698] [<ffffffff810833fd>] autoremove_wake_function+0x1d/0x50
[ 219.467698] [<ffffffff8108b0e5>] __wake_up_common+0x55/0x90
[ 219.467698] [<ffffffff8108e973>] __wake_up_sync_key+0x53/0x80
...


You can see the nested calls to try_to_wake_up(), each of which has called
ttwu_queue(). The trouble spot is here in ttwu_queue():
...
raw_spin_lock(&rq->lock); <---- deadlock here on the second call
ttwu_do_activate(rq, p, 0);
raw_spin_unlock(&rq->lock);
...
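
A paraphrased sketch of the two paths involved, reduced to the steps that
matter for the lock (function names follow the trace; bodies are simplified
and not verbatim kernel source):

static void ttwu_queue(struct task_struct *p, int cpu)
{
        struct rq *rq = cpu_rq(cpu);

        raw_spin_lock(&rq->lock);       /* first acquisition */
        /*
         * ttwu_do_activate() -> enqueue_task_fair() -> hrtick_update()
         *   -> hrtick_start_fair() -> hrtick_start()
         *   -> __hrtimer_start_range_ns(), still under rq->lock
         */
        ttwu_do_activate(rq, p, 0);
        raw_spin_unlock(&rq->lock);
}

/*
 * __hrtimer_start_range_ns() may raise the HRTIMER softirq while rq->lock
 * is still held:
 */
inline void raise_softirq_irqoff(unsigned int nr)
{
        __raise_softirq_irqoff(nr);
        if (!in_interrupt())
                /*
                 * wakeup_softirqd() -> wake_up_process(ksoftirqd)
                 *   -> try_to_wake_up() -> ttwu_queue()
                 *   -> raw_spin_lock(&rq->lock): second acquisition on the
                 *      same CPU; rq->lock is not recursive, so this spins
                 *      forever and the NMI watchdog fires.
                 */
                wakeup_softirqd();
}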

David


2013-06-25 21:17:19

by Peter Zijlstra

Subject: Re: deadlock in scheduler enabling HRTICK feature

On Tue, Jun 25, 2013 at 03:05:38PM -0600, David Ahern wrote:
> Peter/Ingo:
>
> I can reliably cause a deadlock in the scheduler by enabling the HRTICK
> feature. I first hit the problem with 2.6.27 but have been able to reproduce
> it with newer kernels. I have not tried top of Linus' tree, so perhaps this
> has been fixed in 3.10.

Yeah, I know.. its been on the todo list forever but its never been
important. Its disabled by default and all sched debug features are for
those brave enough to keep the pieces ;-)

2013-06-25 21:20:06

by David Ahern

Subject: Re: deadlock in scheduler enabling HRTICK feature

On 6/25/13 3:17 PM, Peter Zijlstra wrote:
> On Tue, Jun 25, 2013 at 03:05:38PM -0600, David Ahern wrote:
>> Peter/Ingo:
>>
>> I can reliably cause a deadlock in the scheduler by enabling the HRTICK
>> feature. I first hit the problem with 2.6.27 but have been able to reproduce
>> it with newer kernels. I have not tried top of Linus' tree, so perhaps this
>> has been fixed in 3.10.
>
> Yeah, I know.. its been on the todo list forever but its never been
> important. Its disabled by default and all sched debug features are for
> those brave enough to keep the pieces ;-)
>

Not a whole lot of pieces to keep.

What is the expectation of what the feature provides? There is not a whole
lot of documentation on it. I walked down this path wondering if it solved
an odd problem we are seeing with CFS in the 2.6.27 kernel.

David

2013-06-26 07:06:01

by Peter Zijlstra

Subject: Re: deadlock in scheduler enabling HRTICK feature

On Tue, Jun 25, 2013 at 03:20:00PM -0600, David Ahern wrote:
> On 6/25/13 3:17 PM, Peter Zijlstra wrote:
> >On Tue, Jun 25, 2013 at 03:05:38PM -0600, David Ahern wrote:
> >>Peter/Ingo:
> >>
> >>I can reliably cause a deadlock in the scheduler by enabling the HRTICK
> >>feature. I first hit the problem with 2.6.27 but have been able to reproduce
> >>it with newer kernels. I have not tried top of Linus' tree, so perhaps this
> >>has been fixed in 3.10.
> >
> >Yeah, I know.. its been on the todo list forever but its never been
> >important. Its disabled by default and all sched debug features are for
> >those brave enough to keep the pieces ;-)
> >
>
> Not a whole to of pieces to keep.
>
> What is the expectation that the feature provides? not a whole lot of
> documentation on it. I walked down the path wondering if it solved an odd
> problem we are seeing with the CFS in 2.6.27 kernel.

Its supposed to use hrtimers for slice expiry instead of the regular tick.

IIRC it did work at some point but bitrotted a bit since. The good news is that
the deadline scheduler wants to use it and I'll probably have to fix it up
then.

But yes, I need to write more comments in sched/features.h
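
Concretely, the fair-class hook involved looks roughly like this in kernels
of this era (paraphrased from kernel/sched/fair.c, not verbatim; details
differ by release). When a task runs with competition on its runqueue, the
remaining portion of its CFS slice is computed and a per-rq hrtimer is armed
to fire exactly when the slice ends, instead of waiting for the next
periodic tick:

/* Paraphrased sketch of hrtick_start_fair(); not verbatim. */
static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
{
        struct sched_entity *se = &p->se;
        struct cfs_rq *cfs_rq = cfs_rq_of(se);

        if (cfs_rq->nr_running > 1) {
                u64 slice = sched_slice(cfs_rq, se);    /* full slice */
                u64 ran   = se->sum_exec_runtime - se->prev_sum_exec_runtime;
                s64 delta = slice - ran;                /* time left in slice */

                if (delta < 0) {
                        /* slice already used up: ask for a resched now */
                        if (rq->curr == p)
                                resched_task(p);
                        return;
                }

                /* arm the per-rq hrtimer for the moment the slice expires */
                hrtick_start(rq, delta);
        }
}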

2013-06-26 16:46:46

by David Ahern

Subject: Re: deadlock in scheduler enabling HRTICK feature

On 6/26/13 1:05 AM, Peter Zijlstra wrote:
>> What is the expectation that the feature provides? not a whole lot of
>> documentation on it. I walked down the path wondering if it solved an odd
>> problem we are seeing with the CFS in 2.6.27 kernel.
>
> Its supposed to use hrtimers for slice expiry instead of the regular tick.

So theoretically CPU-bound tasks would get preempted sooner? That was my
guess/hope anyway.

>
> IIRC it did work at some point but bitrotted a bit since. The good news is that
> the deadline scheduler wants to use it and I'll probably have to fix it up
> then.

Hmmm... meaning I should not be expecting anything in the next couple
of months? Any gut opinions on how to approach the nesting problem - at
least a quick hack for me to see if this option has any impact on our
problem? E.g., a per-CPU variable noting we already hold the runqueue lock
so there is no need to grab it a second time.
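
A minimal sketch of the kind of hack meant here (hypothetical and never
merged; the per-CPU marker and the ttwu_queue() changes are invented purely
for illustration):

/*
 * Hypothetical quick hack, for illustration only -- not the fix that was
 * eventually merged.  Remember which rq's lock this CPU already holds so a
 * nested wakeup on the same CPU skips the second acquisition.
 */
static DEFINE_PER_CPU(struct rq *, locked_rq);          /* invented marker */

static void ttwu_queue(struct task_struct *p, int cpu)
{
        struct rq *rq = cpu_rq(cpu);

        if (__this_cpu_read(locked_rq) == rq) {
                /* nested call from the softirq wakeup: lock already held */
                ttwu_do_activate(rq, p, 0);
                return;
        }

        raw_spin_lock(&rq->lock);
        __this_cpu_write(locked_rq, rq);
        ttwu_do_activate(rq, p, 0);
        __this_cpu_write(locked_rq, NULL);
        raw_spin_unlock(&rq->lock);
}

This only papers over the same-CPU nesting; the fix Peter posts below avoids
the nested wakeup altogether.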

David

2013-06-27 10:43:17

by Peter Zijlstra

Subject: Re: deadlock in scheduler enabling HRTICK feature

On Wed, Jun 26, 2013 at 10:46:33AM -0600, David Ahern wrote:
> On 6/26/13 1:05 AM, Peter Zijlstra wrote:
> >>What is the expectation that the feature provides? not a whole lot of
> >>documentation on it. I walked down the path wondering if it solved an odd
> >>problem we are seeing with the CFS in 2.6.27 kernel.
> >
> >Its supposed to use hrtimers for slice expiry instead of the regular tick.
>
> So theoretically CPU bound tasks would get preempted sooner? That was my
> guess/hope anyways.

Doth the below worketh?

---
kernel/sched/core.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9b1f2e5..0d8eb45 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -370,13 +370,6 @@ static struct rq *this_rq_lock(void)
#ifdef CONFIG_SCHED_HRTICK
/*
* Use HR-timers to deliver accurate preemption points.
- *
- * Its all a bit involved since we cannot program an hrt while holding the
- * rq->lock. So what we do is store a state in in rq->hrtick_* and ask for a
- * reschedule event.
- *
- * When we get rescheduled we reprogram the hrtick_timer outside of the
- * rq->lock.
*/

static void hrtick_clear(struct rq *rq)
@@ -404,6 +397,15 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
}

#ifdef CONFIG_SMP
+
+static int __hrtick_restart(struct rq *rq)
+{
+ struct hrtimer *timer = &rq->hrtick_timer;
+ ktime_t time = hrtimer_get_softexpires(timer);
+
+ return __hrtimer_start_range_ns(timer, time, 0, HRTIMER_MODE_ABS_PINNED, 0);
+}
+
/*
* called from hardirq (IPI) context
*/
@@ -412,7 +414,7 @@ static void __hrtick_start(void *arg)
struct rq *rq = arg;

raw_spin_lock(&rq->lock);
- hrtimer_restart(&rq->hrtick_timer);
+ __hrtick_restart(rq);
rq->hrtick_csd_pending = 0;
raw_spin_unlock(&rq->lock);
}
@@ -430,7 +432,7 @@ void hrtick_start(struct rq *rq, u64 delay)
hrtimer_set_expires(timer, time);

if (rq == this_rq()) {
- hrtimer_restart(timer);
+ __hrtick_restart(rq);
} else if (!rq->hrtick_csd_pending) {
__smp_call_function_single(cpu_of(rq), &rq->hrtick_csd, 0);
rq->hrtick_csd_pending = 1;

2013-06-27 10:53:32

by Peter Zijlstra

Subject: Re: deadlock in scheduler enabling HRTICK feature

On Thu, Jun 27, 2013 at 12:43:09PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 26, 2013 at 10:46:33AM -0600, David Ahern wrote:
> > On 6/26/13 1:05 AM, Peter Zijlstra wrote:
> > >>What is the expectation that the feature provides? not a whole lot of
> > >>documentation on it. I walked down the path wondering if it solved an odd
> > >>problem we are seeing with the CFS in 2.6.27 kernel.
> > >
> > >Its supposed to use hrtimers for slice expiry instead of the regular tick.
> >
> > So theoretically CPU bound tasks would get preempted sooner? That was my
> > guess/hope anyways.
>
> Doth the below worketh?
>

Related to all this; the reason its not enabled by default is that mucking
about with hrtimers all the while is god awful expensive.

I've had ideas about making this a special purpose 'hard-coded' timer in the
hrtimer guts that's only ever re-programmed when the new value is sooner.

By making it a 'special' timer we can avoid the whole rb-tree song and dance;
and by taking 'spurious' short interrupts we can avoid prodding the hardware
too often.

Then again; Thomas will likely throw frozen seafood my way for even proposing
stuff like this and I'm not even sure that's going to be enough to make the
cost acceptable.
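
A minimal sketch of that idea (entirely hypothetical; no such timer exists
and every name below is invented for illustration): keep one programmed
deadline per CPU, reprogram the hardware only when the requested expiry is
earlier than what is already programmed, and accept the occasional early
'spurious' interrupt the rest of the time:

/* Hypothetical sketch only; every name here is invented for illustration. */
static DEFINE_PER_CPU(ktime_t, sched_tick_deadline);

static void sched_tick_program(ktime_t expires)
{
        ktime_t cur = __this_cpu_read(sched_tick_deadline);

        if (!ktime_to_ns(cur) || ktime_before(expires, cur)) {
                __this_cpu_write(sched_tick_deadline, expires);
                /* reprogram the clock event device for 'expires' here */
        }
        /*
         * Otherwise the hardware already fires earlier than needed; accept
         * the short 'spurious' interrupt instead of reprogramming.
         */
}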

2013-06-27 12:28:31

by Mike Galbraith

Subject: Re: deadlock in scheduler enabling HRTICK feature

On Thu, 2013-06-27 at 12:53 +0200, Peter Zijlstra wrote:
> On Thu, Jun 27, 2013 at 12:43:09PM +0200, Peter Zijlstra wrote:
> > On Wed, Jun 26, 2013 at 10:46:33AM -0600, David Ahern wrote:
> > > On 6/26/13 1:05 AM, Peter Zijlstra wrote:
> > > >>What is the expectation that the feature provides? not a whole lot of
> > > >>documentation on it. I walked down the path wondering if it solved an odd
> > > >>problem we are seeing with the CFS in 2.6.27 kernel.
> > > >
> > > >Its supposed to use hrtimers for slice expiry instead of the regular tick.
> > >
> > > So theoretically CPU bound tasks would get preempted sooner? That was my
> > > guess/hope anyways.
> >
> > Doth the below worketh?
> >
>
> Related to all this; the reason its not enabled by default is that mucking
> about with hrtimers all the while is god awful expensive.

That cost sprang to mind when you mentioned that deadline needs hrtick.

> I've had ideas about making this a special purpose 'hard-coded' timer in the
> hrtimer guts that's only ever re-programmed when the new value is sooner.
>
> By making it a 'special' timer we can avoid the whole rb-tree song and dance;
> and by taking 'spurious' short interrupts we can avoid prodding the hardware
> too often.
>
> Then again; Thomas will likely throw frozen seafood my way for even proposing
> stuff like this and I'm not even sure that's going to be enough to make the
> cost acceptable.

Could be worse, flaming marshmallows stick.

-Mike

2013-06-27 13:06:18

by Ingo Molnar

Subject: Re: deadlock in scheduler enabling HRTICK feature


* Peter Zijlstra <[email protected]> wrote:

> On Thu, Jun 27, 2013 at 12:43:09PM +0200, Peter Zijlstra wrote:
> > On Wed, Jun 26, 2013 at 10:46:33AM -0600, David Ahern wrote:
> > > On 6/26/13 1:05 AM, Peter Zijlstra wrote:
> > > >>What is the expectation that the feature provides? not a whole lot of
> > > >>documentation on it. I walked down the path wondering if it solved an odd
> > > >>problem we are seeing with the CFS in 2.6.27 kernel.
> > > >
> > > >Its supposed to use hrtimers for slice expiry instead of the regular tick.
> > >
> > > So theoretically CPU bound tasks would get preempted sooner? That was my
> > > guess/hope anyways.
> >
> > Doth the below worketh?
> >
>
> Related to all this; the reason its not enabled by default is that
> mucking about with hrtimers all the while is god awful expensive.
>
> I've had ideas about making this a special purpose 'hard-coded' timer in
> the hrtimer guts that's only ever re-programmed when the new value is
> sooner.
>
> By making it a 'special' timer we can avoid the whole rb-tree song and
> dance; and by taking 'spurious' short interrupts we can avoid prodding
> the hardware too often.

Sounds neat ...

> Then again; Thomas will likely throw frozen seafood my way for even
> proposing stuff like this and I'm not even sure that's going to be
> enough to make the cost acceptable.

(Could be worse: rotten seafood?)

Thanks,

Ingo

2013-06-27 19:18:09

by Andy Lutomirski

Subject: Re: deadlock in scheduler enabling HRTICK feature

On 06/27/2013 03:53 AM, Peter Zijlstra wrote:
> On Thu, Jun 27, 2013 at 12:43:09PM +0200, Peter Zijlstra wrote:
>> On Wed, Jun 26, 2013 at 10:46:33AM -0600, David Ahern wrote:
>>> On 6/26/13 1:05 AM, Peter Zijlstra wrote:
>>>>> What is the expectation that the feature provides? not a whole lot of
>>>>> documentation on it. I walked down the path wondering if it solved an odd
>>>>> problem we are seeing with the CFS in 2.6.27 kernel.
>>>>
>>>> Its supposed to use hrtimers for slice expiry instead of the regular tick.
>>>
>>> So theoretically CPU bound tasks would get preempted sooner? That was my
>>> guess/hope anyways.
>>
>> Doth the below worketh?
>>
>
> Related to all this; the reason its not enabled by default is that mucking
> about with hrtimers all the while is god awful expensive.
>
> I've had ideas about making this a special purpose 'hard-coded' timer in the
> hrtimer guts that's only ever re-programmed when the new value is sooner.
>
> By making it a 'special' timer we can avoid the whole rb-tree song and dance;
> and by taking 'spurious' short interrupts we can avoid prodding the hardware
> too often.

Supposedly, on really new x86 systems, the TSC deadline timer is so fast
to reprogram that this isn't worth it.

I wonder if the general timing code should have a way to indicate that a
given clockevent is (a) very fast and (b) actually locked to a clocksource,
as opposed to just having a vaguely accurately calibrated frequency.

--Andy

2013-06-27 20:38:18

by Peter Zijlstra

Subject: Re: deadlock in scheduler enabling HRTICK feature

On Thu, Jun 27, 2013 at 12:18:04PM -0700, Andy Lutomirski wrote:
> Supposedly, on really new x86 systems, the TSC deadline timer is so fast
> to reprogram that this isn't worth it.

From what I've heard it's not all that fast. I should play with it sometime; my
SNB desktop actually has this thing.

2013-06-27 22:29:02

by David Ahern

Subject: Re: deadlock in scheduler enabling HRTICK feature

On 6/27/13 4:43 AM, Peter Zijlstra wrote:
> On Wed, Jun 26, 2013 at 10:46:33AM -0600, David Ahern wrote:
>> On 6/26/13 1:05 AM, Peter Zijlstra wrote:
>>>> What is the expectation that the feature provides? not a whole lot of
>>>> documentation on it. I walked down the path wondering if it solved an odd
>>>> problem we are seeing with the CFS in 2.6.27 kernel.
>>>
>>> Its supposed to use hrtimers for slice expiry instead of the regular tick.
>>
>> So theoretically CPU bound tasks would get preempted sooner? That was my
>> guess/hope anyways.
>
> Doth the below worketh?

It doth.

Usually 'make -j 8' for a kernel build in a VM would lock it up pretty
quickly. With the patch I was able to run full builds multiple times.

As for the solution: you are avoiding the nesting by not waking up the
softirq daemon.

I'll backport to 2.6.27 and see what happens. Thanks for the patch.

David

2013-06-28 09:00:31

by Ingo Molnar

Subject: Re: deadlock in scheduler enabling HRTICK feature


* David Ahern <[email protected]> wrote:

> On 6/27/13 4:43 AM, Peter Zijlstra wrote:
> >On Wed, Jun 26, 2013 at 10:46:33AM -0600, David Ahern wrote:
> >>On 6/26/13 1:05 AM, Peter Zijlstra wrote:
> >>>>What is the expectation that the feature provides? not a whole lot of
> >>>>documentation on it. I walked down the path wondering if it solved an odd
> >>>>problem we are seeing with the CFS in 2.6.27 kernel.
> >>>
> >>>Its supposed to use hrtimers for slice expiry instead of the regular tick.
> >>
> >>So theoretically CPU bound tasks would get preempted sooner? That was my
> >>guess/hope anyways.
> >
> >Doth the below worketh?
>
> It doth.
>
> Usually make -j 8 for a kernel build in a VM would lock it up pretty
> quickly. With the patch I was able to run full builds multiple
> times.
>
> As for the solution you are avoiding the nesting by not waking up
> the softirq daemon.

I guess we could merge this fix?

Thanks,

Ingo

2013-06-28 09:10:08

by Peter Zijlstra

Subject: Re: deadlock in scheduler enabling HRTICK feature

On Thu, Jun 27, 2013 at 04:28:57PM -0600, David Ahern wrote:
> On 6/27/13 4:43 AM, Peter Zijlstra wrote:
> >On Wed, Jun 26, 2013 at 10:46:33AM -0600, David Ahern wrote:
> >>On 6/26/13 1:05 AM, Peter Zijlstra wrote:
> >>>>What is the expectation that the feature provides? not a whole lot of
> >>>>documentation on it. I walked down the path wondering if it solved an odd
> >>>>problem we are seeing with the CFS in 2.6.27 kernel.
> >>>
> >>>Its supposed to use hrtimers for slice expiry instead of the regular tick.
> >>
> >>So theoretically CPU bound tasks would get preempted sooner? That was my
> >>guess/hope anyways.
> >
> >Doth the below worketh?
>
> It doth.
>
> Usually make -j 8 for a kernel build in a VM would lock it up pretty
> quickly. With the patch I was able to run full builds multiple times.

Good!

> As for the solution you are avoiding the nesting by not waking up the
> softirq daemon.

Yah! :-) Obviously doing a wakeup while holding scheduler locks isn't going to
work out well. And the only reason we really need that pesky softirq nonsense
is when we accidentally schedule a timer that's already expired; in that case
we'll run it from sirq context.

We don't care about missing events like that; there's always the actual tick
for backup.

I suppose I'd better go write a Changelog and properly submit the patch :-)
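
The knob used for that is the final 'wakeup' argument of
__hrtimer_start_range_ns(): hrtimer_restart() ends up passing 1, while the
new __hrtick_restart() passes 0. Paraphrased from the hrtimer code of this
era (simplified, not verbatim), the branch in question looks like:

        if (leftmost && hrtimer_enqueue_reprogram(timer, new_base)) {
                /* the requested expiry is already in the past */
                if (wakeup) {
                        /*
                         * Drop the base lock and wake ksoftirqd.  This is
                         * the wakeup that recursed into try_to_wake_up()
                         * while rq->lock was still held.
                         */
                        raw_spin_unlock(&new_base->cpu_base->lock);
                        raise_softirq_irqoff(HRTIMER_SOFTIRQ);
                        local_irq_restore(flags);
                        return ret;
                }

                /* wakeup == 0: only mark the softirq pending, no wakeup */
                __raise_softirq_irqoff(HRTIMER_SOFTIRQ);
        }

With wakeup == 0 the pending bit is still set, and as noted above the
regular tick serves as backup if the event is effectively missed.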

2013-06-28 09:19:23

by Peter Zijlstra

Subject: Re: deadlock in scheduler enabling HRTICK feature

On Fri, Jun 28, 2013 at 11:00:21AM +0200, Ingo Molnar wrote:
> I guess we could merge this fix?

---
Subject: sched: Fix HRTICK

David reported that the HRTICK sched feature was borken; which was enough
motivation for me to finally fix it ;-)

We should not allow hrtimer code to do softirq wakeups while holding scheduler
locks. The hrtimer code only needs this when we accidentally try to program an
expired time. We don't much care about those anyway since we have the regular
tick to fall back to.

Reported-by: David Ahern <[email protected]>
Tested-by: David Ahern <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/sched/core.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9b1f2e5..0d8eb45 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -370,13 +370,6 @@ static struct rq *this_rq_lock(void)
#ifdef CONFIG_SCHED_HRTICK
/*
* Use HR-timers to deliver accurate preemption points.
- *
- * Its all a bit involved since we cannot program an hrt while holding the
- * rq->lock. So what we do is store a state in in rq->hrtick_* and ask for a
- * reschedule event.
- *
- * When we get rescheduled we reprogram the hrtick_timer outside of the
- * rq->lock.
*/

static void hrtick_clear(struct rq *rq)
@@ -404,6 +397,15 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
}

#ifdef CONFIG_SMP
+
+static int __hrtick_restart(struct rq *rq)
+{
+ struct hrtimer *timer = &rq->hrtick_timer;
+ ktime_t time = hrtimer_get_softexpires(timer);
+
+ return __hrtimer_start_range_ns(timer, time, 0, HRTIMER_MODE_ABS_PINNED, 0);
+}
+
/*
* called from hardirq (IPI) context
*/
@@ -412,7 +414,7 @@ static void __hrtick_start(void *arg)
struct rq *rq = arg;

raw_spin_lock(&rq->lock);
- hrtimer_restart(&rq->hrtick_timer);
+ __hrtick_restart(rq);
rq->hrtick_csd_pending = 0;
raw_spin_unlock(&rq->lock);
}
@@ -430,7 +432,7 @@ void hrtick_start(struct rq *rq, u64 delay)
hrtimer_set_expires(timer, time);

if (rq == this_rq()) {
- hrtimer_restart(timer);
+ __hrtick_restart(rq);
} else if (!rq->hrtick_csd_pending) {
__smp_call_function_single(cpu_of(rq), &rq->hrtick_csd, 0);
rq->hrtick_csd_pending = 1;

2013-06-28 17:28:59

by David Ahern

Subject: Re: deadlock in scheduler enabling HRTICK feature

On 6/28/13 3:09 AM, Peter Zijlstra wrote:

> Yah! :-) Obviously doing a wakeup while holding scheduler locks isn't going to
> work out well. And the only reason we really need that pesky softirq nonsense
> is when we accidentally schedule a timer that's already expired; in that case
> we'll run it from sirq context.
>
> We don't care about missing events like that; there's always the actual tick
> for backup.
>
> I suppose I'd better go write a Changelog and properly submit the patch :-)
>

Thanks for the quick turnaround on the patch - and the explanation.

David

Subject: [tip:sched/core] sched: Fix HRTICK

Commit-ID: 971ee28cbd1ccd87b3164facd9359a534c1d2892
Gitweb: http://git.kernel.org/tip/971ee28cbd1ccd87b3164facd9359a534c1d2892
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 28 Jun 2013 11:18:53 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 12 Jul 2013 13:52:58 +0200

sched: Fix HRTICK

David reported that the HRTICK sched feature was borken; which was enough
motivation for me to finally fix it ;-)

We should not allow hrtimer code to do softirq wakeups while holding scheduler
locks. The hrtimer code only needs this when we accidentally try to program an
expired time. We don't much care about those anyway since we have the regular
tick to fall back to.

Reported-by: David Ahern <[email protected]>
Tested-by: David Ahern <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/core.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9b1f2e5..0d8eb45 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -370,13 +370,6 @@ static struct rq *this_rq_lock(void)
#ifdef CONFIG_SCHED_HRTICK
/*
* Use HR-timers to deliver accurate preemption points.
- *
- * Its all a bit involved since we cannot program an hrt while holding the
- * rq->lock. So what we do is store a state in in rq->hrtick_* and ask for a
- * reschedule event.
- *
- * When we get rescheduled we reprogram the hrtick_timer outside of the
- * rq->lock.
*/

static void hrtick_clear(struct rq *rq)
@@ -404,6 +397,15 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
}

#ifdef CONFIG_SMP
+
+static int __hrtick_restart(struct rq *rq)
+{
+ struct hrtimer *timer = &rq->hrtick_timer;
+ ktime_t time = hrtimer_get_softexpires(timer);
+
+ return __hrtimer_start_range_ns(timer, time, 0, HRTIMER_MODE_ABS_PINNED, 0);
+}
+
/*
* called from hardirq (IPI) context
*/
@@ -412,7 +414,7 @@ static void __hrtick_start(void *arg)
struct rq *rq = arg;

raw_spin_lock(&rq->lock);
- hrtimer_restart(&rq->hrtick_timer);
+ __hrtick_restart(rq);
rq->hrtick_csd_pending = 0;
raw_spin_unlock(&rq->lock);
}
@@ -430,7 +432,7 @@ void hrtick_start(struct rq *rq, u64 delay)
hrtimer_set_expires(timer, time);

if (rq == this_rq()) {
- hrtimer_restart(timer);
+ __hrtick_restart(rq);
} else if (!rq->hrtick_csd_pending) {
__smp_call_function_single(cpu_of(rq), &rq->hrtick_csd, 0);
rq->hrtick_csd_pending = 1;