2013-04-10 21:09:12

by Michael Bohan

[permalink] [raw]
Subject: [PATCH 0/2] hrtimer preemption bug fixes

Here are a couple of hrtimer bug fixes related to preemption.

Michael Bohan (2):
hrtimer: Consider preemption when migrating hrtimer cpu_bases
hrtimer: Prevent enqueue of hrtimer on dead CPU

kernel/hrtimer.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


2013-04-10 21:08:43

by Michael Bohan

[permalink] [raw]
Subject: [PATCH 1/2] hrtimer: Consider preemption when migrating hrtimer cpu_bases

When switching to a new cpu_base in switch_hrtimer_base(), we
briefly enable preemption by unlocking the cpu_base lock in two
places. During this interval it's possible for the running thread
to be swapped to a different CPU.

Consider the following example:

CPU #0 CPU #1
---- ----
hrtimer_start() ...
lock_hrtimer_base()
switch_hrtimer_base()
this_cpu = 0;
target_cpu_base = 0;
raw_spin_unlock(&cpu_base->lock)
<migrate to CPU 1>
... this_cpu == 0
cpu == this_cpu
timer->base = CPU #0
timer->base != LOCAL_CPU

Since the cached this_cpu is no longer accurate, we'll skip the
hrtimer_check_target() check. Once we eventually go to program
the hardware, we'll decide not to do so since it knows the real
CPU that we're running on is not the same as the chosen base. As
a consequence, we may end up missing the hrtimer's deadline.

Fix this by updating the local CPU number each time we retake a
cpu_base lock in switch_hrtimer_base().

Another possibility is to disable preemption across the whole of
switch_hrtimer_base. This looks suboptimal since preemption
would be disabled while waiting for lock(s).

Signed-off-by: Michael Bohan <[email protected]>
---
kernel/hrtimer.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index cc47812..3f0bce9 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -225,10 +225,12 @@ again:
raw_spin_unlock(&base->cpu_base->lock);
raw_spin_lock(&new_base->cpu_base->lock);

+ this_cpu = smp_processor_id();
+
if (cpu != this_cpu && hrtimer_check_target(timer, new_base)) {
- cpu = this_cpu;
raw_spin_unlock(&new_base->cpu_base->lock);
raw_spin_lock(&base->cpu_base->lock);
+ cpu = smp_processor_id();
timer->base = base;
goto again;
}
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2013-04-10 21:08:42

by Michael Bohan

[permalink] [raw]
Subject: [PATCH 2/2] hrtimer: Prevent enqueue of hrtimer on dead CPU

When switching the hrtimer cpu_base, we briefly allow for
preemption to become enabled by unlocking the cpu_base lock.
During this time, the CPU corresponding to the new cpu_base
that was selected may in fact go offline. In this scenario, the
hrtimer is enqueued to a CPU that's not online, and therefore
it never fires.

As an example, consider this example:

CPU #0 CPU #1
---- ----
... hrtimer_start()
lock_hrtimer_base()
switch_hrtimer_base()
cpu = hrtimer_get_target() -> 1
spin_unlock(&cpu_base->lock)
<migrate thread to CPU #0>
<offline>
spin_lock(&new_base->lock)
this_cpu = 0
cpu != this_cpu
enqueue_hrtimer(cpu_base #1)

To prevent this scenario, verify that the CPU corresponding to
the new cpu_base is indeed online before selecting it in
hrtimer_switch_base(). If it's not online, fallback to using the
base of the current CPU.

Signed-off-by: Michael Bohan <[email protected]>
---
kernel/hrtimer.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 3f0bce9..54c5d98 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -227,7 +227,8 @@ again:

this_cpu = smp_processor_id();

- if (cpu != this_cpu && hrtimer_check_target(timer, new_base)) {
+ if (cpu != this_cpu && (hrtimer_check_target(timer, new_base)
+ || !cpu_online(cpu))) {
raw_spin_unlock(&new_base->cpu_base->lock);
raw_spin_lock(&base->cpu_base->lock);
cpu = smp_processor_id();
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2013-04-18 09:40:37

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/2] hrtimer: Consider preemption when migrating hrtimer cpu_bases

On Wed, 10 Apr 2013, Michael Bohan wrote:

> When switching to a new cpu_base in switch_hrtimer_base(), we
> briefly enable preemption by unlocking the cpu_base lock in two
> places. During this interval it's possible for the running thread
> to be swapped to a different CPU.
>
> Consider the following example:
>
> CPU #0 CPU #1
> ---- ----
> hrtimer_start() ...
> lock_hrtimer_base()
> switch_hrtimer_base()
> this_cpu = 0;
> target_cpu_base = 0;
> raw_spin_unlock(&cpu_base->lock)
> <migrate to CPU 1>

Errm. switch_hrtimer_base() is called with interrupts disabled and
they stay disabled, so how exactly is the task going to be migrated?

Thanks,

tglx

2013-04-18 09:44:07

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 2/2] hrtimer: Prevent enqueue of hrtimer on dead CPU

On Wed, 10 Apr 2013, Michael Bohan wrote:

> When switching the hrtimer cpu_base, we briefly allow for
> preemption to become enabled by unlocking the cpu_base lock.

No, interrupts still stay disabled and there is no way to preempt
here. Can you please explain, what you are trying to do and what makes
you think there is a problem in that code?

Thanks,

tglx

2013-04-19 01:37:53

by Michael Bohan

[permalink] [raw]
Subject: Re: [PATCH 1/2] hrtimer: Consider preemption when migrating hrtimer cpu_bases

On 4/18/2013 2:40 AM, Thomas Gleixner wrote:
> On Wed, 10 Apr 2013, Michael Bohan wrote:
>
>> When switching to a new cpu_base in switch_hrtimer_base(), we
>> briefly enable preemption by unlocking the cpu_base lock in two
>> places. During this interval it's possible for the running thread
>> to be swapped to a different CPU.
>>
>> Consider the following example:
>>
>> CPU #0 CPU #1
>> ---- ----
>> hrtimer_start() ...
>> lock_hrtimer_base()
>> switch_hrtimer_base()
>> this_cpu = 0;
>> target_cpu_base = 0;
>> raw_spin_unlock(&cpu_base->lock)
>> <migrate to CPU 1>
>
> Errm. switch_hrtimer_base() is called with interrupts disabled and
> they stay disabled, so how exactly is the task going to be migrated?

My mistake - I missed that important fact. Please disregard this.

Thanks,
Mike

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation