2014-04-30 08:48:16

by Ma, Xindong

Subject: [PATCH] hrtimer:do not start hrtimer on other cpu if it is the leftmost timer.

On an SMP system, if cpuX is idle and starts an hrtimer, the timer may
be queued on cpuY instead. cpuX, however, cannot reprogram the event
source on cpuY. The timer is inserted into the rb tree of cpuY, and if
it becomes the leftmost timer there and has a very short expiry,
hrtimers subsequently started on cpuY will not reprogram the event
source either. As a result, timers on cpuY expire later than expected.

When this case is detected, we should start the timer on cpuX and
program the event source properly.

Signed-off-by: Leon Ma <[email protected]>
---
kernel/hrtimer.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index d55092c..68becbc 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -234,6 +234,11 @@ again:
 			goto again;
 		}
 		timer->base = new_base;
+	} else {
+		if (cpu != this_cpu && hrtimer_check_target(timer, new_base)) {
+			cpu = this_cpu;
+			goto again;
+		}
 	}
 	return new_base;
 }
--
1.7.9.5
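
To make the failure mode described in this patch concrete, here is a minimal user-space sketch in C. It is not kernel code: `struct cpu_queue`, `enqueue_local`, `enqueue_remote`, and `fire_time` are hypothetical names invented for illustration, and the model assumes one armed expiry per CPU with times as plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one CPU's timer queue (not a kernel structure). */
struct cpu_queue {
	long long leftmost;	/* earliest expiry queued on this CPU */
	long long programmed;	/* expiry the clock event device is armed for */
};

/* Enqueue from the owning CPU: the device is rearmed only when the
 * new timer becomes the leftmost one. */
void enqueue_local(struct cpu_queue *q, long long expires)
{
	assert(q != NULL);
	if (expires < q->leftmost) {
		q->leftmost = expires;
		q->programmed = expires;
	}
}

/* Enqueue from another CPU: the queue is updated, but the remote CPU
 * cannot rearm this CPU's device, so `programmed` stays stale. */
void enqueue_remote(struct cpu_queue *q, long long expires)
{
	assert(q != NULL);
	if (expires < q->leftmost)
		q->leftmost = expires;
}

/* Simplifying assumption: a timer cannot be serviced before the
 * device fires, so it runs at the later of the two times. */
long long fire_time(const struct cpu_queue *q, long long expires)
{
	assert(q != NULL);
	return expires > q->programmed ? expires : q->programmed;
}
```

For example, with a queue armed for time 1000, `enqueue_remote(&q, 10)` leaves `q.programmed` at 1000, so the timer due at 10 is not serviced until 1000. A later `enqueue_local(&q, 50)` does not rearm the device either, because 50 is not the leftmost expiry; only a local enqueue earlier than 10 would.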


2014-04-30 10:35:37

by Viresh Kumar

Subject: Re: [PATCH] hrtimer:do not start hrtimer on other cpu if it is the leftmost timer.

Hi Leon,

On Wed, Apr 30, 2014 at 2:13 PM, Leon Ma <[email protected]> wrote:
> On an SMP system, if cpuX is idle and starts an hrtimer, the timer may
> be queued on cpuY instead. cpuX, however, cannot reprogram the event
> source on cpuY. The timer is inserted into the rb tree of cpuY, and if
> it becomes the leftmost timer there and has a very short expiry,
> hrtimers subsequently started on cpuY will not reprogram the event
> source either. As a result, timers on cpuY expire later than expected.

I am not sure why, but the explanation confused me a bit :), though the
patch looked fine. So this is what I understood, in my own words; let me
know if I am on the right track.

The current base of the timer is cpuY, and we started the timer from
cpuX. If cpuX is idle, we might select cpuY again as the base, but we
wouldn't be able to reprogram the event source, as that has to be done
by the local CPU.

And so in this case you want to select cpuX instead, right?

> When this case is detected, we should start the timer on cpuX and
> program the event source properly.
>
> Signed-off-by: Leon Ma <[email protected]>
> ---
> kernel/hrtimer.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index d55092c..68becbc 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -234,6 +234,11 @@ again:
>  			goto again;
>  		}
>  		timer->base = new_base;
> +	} else {
> +		if (cpu != this_cpu && hrtimer_check_target(timer, new_base)) {
> +			cpu = this_cpu;
> +			goto again;
> +		}
>  	}
>  	return new_base;
>  }
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

Subject: [tip:timers/urgent] hrtimer: Prevent remote enqueue of leftmost timers

Commit-ID: 012a45e3f4af68e86d85cce060c6c2fed56498b2
Gitweb: http://git.kernel.org/tip/012a45e3f4af68e86d85cce060c6c2fed56498b2
Author: Leon Ma <[email protected]>
AuthorDate: Wed, 30 Apr 2014 16:43:10 +0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 30 Apr 2014 12:34:51 +0200

hrtimer: Prevent remote enqueue of leftmost timers

If a cpu is idle and starts an hrtimer which is not pinned on that
same cpu, the nohz code might target the timer to a different cpu.

In the case that we switch the cpu base of the timer we already have a
sanity check in place, which determines whether the timer is earlier
than the current leftmost timer on the target cpu. In that case we
enqueue the timer on the current cpu because we cannot reprogram the
clock event device on the target.

If the timer's base is already the target CPU we do not have this
sanity check in place so we enqueue the timer as the leftmost timer in
the target cpu's rb tree, but we cannot reprogram the clock event
device on the target cpu. So the timer expires late and subsequently
prevents the reprogramming of the target cpu clock event device until
the previously programmed event fires or a timer with an earlier
expiry time gets enqueued on the target cpu itself.

Add the same target check as we have for the switch base case and
start the timer on the current cpu if it would become the leftmost
timer on the target.

[ tglx: Rewrote subject and changelog ]

Signed-off-by: Leon Ma <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/hrtimer.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index e3724fd..6b715c0 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -234,6 +234,11 @@ again:
 			goto again;
 		}
 		timer->base = new_base;
+	} else {
+		if (cpu != this_cpu && hrtimer_check_target(timer, new_base)) {
+			cpu = this_cpu;
+			goto again;
+		}
 	}
 	return new_base;
 }
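
The selection logic after this change can be sketched in user space roughly as follows. This is a simplified model under stated assumptions, not the kernel function: `struct cpu_base`, `struct timer`, `check_target`, and `select_base` are hypothetical stand-ins, locking and the callback-running case are omitted, and the comparison in `check_target` is assumed to be a plain "earlier than" test.

```c
#include <assert.h>

#define NR_CPUS 2

/* Hypothetical stand-in for a per-CPU hrtimer base. */
struct cpu_base {
	long long expires_next;	/* expiry currently armed on this CPU */
};

/* Hypothetical stand-in for an hrtimer. */
struct timer {
	long long expires;
	int base_cpu;		/* CPU whose queue currently owns the timer */
};

struct cpu_base bases[NR_CPUS];

/* Nonzero if the timer would expire before the event already armed
 * on `cpu`, i.e. it would become the leftmost timer there. */
int check_target(const struct timer *t, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return t->expires < bases[cpu].expires_next;
}

/* Pick the CPU to enqueue on. `target_cpu` models the nohz choice. */
int select_base(const struct timer *t, int this_cpu, int target_cpu)
{
	int cpu = target_cpu;

again:
	if (cpu != t->base_cpu) {
		/* Switch-base case: this check existed before the patch. */
		if (cpu != this_cpu && check_target(t, cpu)) {
			cpu = this_cpu;
			goto again;
		}
	} else {
		/* Same-base case: the check added by the patch. */
		if (cpu != this_cpu && check_target(t, cpu)) {
			cpu = this_cpu;
			goto again;
		}
	}
	return cpu;
}
```

In this model, a timer whose base is already CPU 1, started from CPU 0 with an expiry earlier than `bases[1].expires_next`, falls back to CPU 0. The retry terminates after one pass because `cpu` then equals `this_cpu` and neither check can trigger again.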

2014-04-30 12:29:32

by Ma, Xindong

Subject: RE: [PATCH] hrtimer:do not start hrtimer on other cpu if it is the leftmost timer.


> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Viresh Kumar
> Sent: Wednesday, April 30, 2014 6:36 PM
> To: Ma, Xindong
> Cc: Thomas Gleixner; [email protected]
> Subject: Re: [PATCH] hrtimer:do not start hrtimer on other cpu if it is the
> leftmost timer.
>
> Hi Leon,
>
> On Wed, Apr 30, 2014 at 2:13 PM, Leon Ma <[email protected]> wrote:
> > On an SMP system, if cpuX is idle and starts an hrtimer, the timer may
> > be queued on cpuY instead. cpuX, however, cannot reprogram the event
> > source on cpuY. The timer is inserted into the rb tree of cpuY, and if
> > it becomes the leftmost timer there and has a very short expiry,
> > hrtimers subsequently started on cpuY will not reprogram the event
> > source either. As a result, timers on cpuY expire later than expected.
>
> I am not sure why, but the explanation confused me a bit :), though the patch
> looked fine. So this is what I understood, in my own words; let me know if I
> am on the right track.
>
> The current base of the timer is cpuY, and we started the timer from cpuX. If
> cpuX is idle, we might select cpuY again as the base, but we wouldn't be able
> to reprogram the event source, as that has to be done by the local CPU.
>
> And so in this case you want to select cpuX instead, right?
Yes. Thanks to Thomas for refining the changelog and making it clear:
http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=012a45e3f4af68e86d85cce060c6c2fed56498b2
>
> > When this case is detected, we should start the timer on cpuX and
> > program the event source properly.
> >
> > Signed-off-by: Leon Ma <[email protected]>
> > ---
> > kernel/hrtimer.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> > index d55092c..68becbc 100644
> > --- a/kernel/hrtimer.c
> > +++ b/kernel/hrtimer.c
> > @@ -234,6 +234,11 @@ again:
> >  			goto again;
> >  		}
> >  		timer->base = new_base;
> > +	} else {
> > +		if (cpu != this_cpu && hrtimer_check_target(timer, new_base)) {
> > +			cpu = this_cpu;
> > +			goto again;
> > +		}
> >  	}
> >  	return new_base;
> >  }
> > --
> > 1.7.9.5
> >