2014-04-29 22:55:05

by stuart hayes

[permalink] [raw]
Subject: [PATCH] hrtimer: invalid timeout set after hang_detected

Make hrtimer_force_reprogram() not reprogram the clock event device if hang_detected has been set in hrtimer_interrupt().

Otherwise, if an active hrtimer is changed by calling hrtimer_start() (for example) while hang_detected is set, the clock event device can be programmed with the wrong value, which can result in the clock event device interrupt occurring much later than expected, and timer functions being run very late.

This can occur, for instance, if a CPU goes idle and calls tick_nohz_stop_sched_tick() after hang_detected is set. The function tick_nohz_stop_sched_tick() will call hrtimer_start() to reprogram the sched_timer to a longer timeout. hrtimer_start() will call __hrtimer_start_range_ns(), which first calls remove_hrtimer() to remove sched_timer, then hrtimer_enqueue_reprogram() to add it with its new timeout. The problem is that remove_hrtimer() calls __remove_hrtimer(), which calls hrtimer_force_reprogram(), and hrtimer_force_reprogram() ignores hang_detected and will reprogram the clock event device to the next soonest hrtimer expiry, which could be, say, 11 seconds away. This overwrites the value that was programmed into the clock event device when hang_detected was set (which was no more than 100ms). Then hrtimer_enqueue_reprogram() calls hrtimer_reprogram(), which observes hang_detected and does not reprogram the clock event device, so the device remains set to the val
ue of, in this example, 11 seconds, during which time no clock event device interrupts occur and no timer expiration functions are run.


Signed-off-by: Stuart Hayes <[email protected]>
---

--- linux-3.15-rc3/kernel_orig/hrtimer.c 2014-04-29 13:10:58.087832963 -0400
+++ linux-3.15-rc3/kernel/hrtimer.c 2014-04-29 15:42:49.581084736 -0400
@@ -569,6 +569,15 @@ hrtimer_force_reprogram(struct hrtimer_c

cpu_base->expires_next.tv64 = expires_next.tv64;

+ /*
+ * If a hang was detected in the last timer interrupt then we
+ * do not schedule a timer which is earlier than the expiry
+ * which we enforced in the hang detection. We want the system
+ * to make progress.
+ */
+ if (cpu_base->hang_detected)
+ return;
+
if (cpu_base->expires_next.tv64 != KTIME_MAX)
tick_program_event(cpu_base->expires_next, 1);
}


2014-04-30 10:35:39

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] hrtimer: invalid timeout set after hang_detected

Stuart,

On Tue, 29 Apr 2014, Stuart Hayes wrote:

> Make hrtimer_force_reprogram() not reprogram the clock event device if hang_detected has been set in hrtimer_interrupt().
>

Please use proper line breaks for the changelog.

> This can occur, for instance, if a CPU goes idle and calls
> tick_nohz_stop_sched_tick() after hang_detected is set. The
> function tick_nohz_stop_sched_tick() will call hrtimer_start() to
> reprogram the sched_timer to a longer timeout. hrtimer_start() will
> call __hrtimer_start_range_ns(), which first calls remove_hrtimer()
> to remove sched_timer, then hrtimer_enqueue_reprogram() to add it
> with its new timeout. The problem is that remove_hrtimer() calls
> __remove_hrtimer(), which calls hrtimer_force_reprogram(), and
> hrtimer_force_reprogram() ignores hang_detected and will reprogram
> the clock event device to the next soonest hrtimer expiry, which
> could be, say, 11 seconds away. This overwrites the value that was
> programmed into the clock event device when hang_detected was set
> (which was no more than 100ms). Then hrtimer_enqueue_reprogram()
> calls hrtimer_reprogram(), which observes hang_detected and does not
> reprogram the clock event device, so the device remains set to the
> val ue of, in this example, 11 seconds, during which time no clock
> event device interrupts occur and no timer expiration functions are
> run.

I took the liberty to rewrite the changelog. Please have a look and
judge yourself whats easier to grasp.

>
> Signed-off-by: Stuart Hayes <[email protected]>
> ---
>
> --- linux-3.15-rc3/kernel_orig/hrtimer.c 2014-04-29 13:10:58.087832963 -0400
> +++ linux-3.15-rc3/kernel/hrtimer.c 2014-04-29 15:42:49.581084736 -0400
> @@ -569,6 +569,15 @@ hrtimer_force_reprogram(struct hrtimer_c
>
> cpu_base->expires_next.tv64 = expires_next.tv64;
>
> + /*
> + * If a hang was detected in the last timer interrupt then we
> + * do not schedule a timer which is earlier than the expiry
> + * which we enforced in the hang detection. We want the system
> + * to make progress.

So you just blindly copied the comment from hrtimer_reprogram(). But
this is not only about scheduling an timer which is earlier.

Fixed it up, but please be more careful when you submit patches next
time.

Thanks,

tglx

Subject: [tip:timers/urgent] hrtimer: Prevent all reprogramming if hang detected

Commit-ID: 6c6c0d5a1c949d2e084706f9e5fb1fccc175b265
Gitweb: http://git.kernel.org/tip/6c6c0d5a1c949d2e084706f9e5fb1fccc175b265
Author: Stuart Hayes <[email protected]>
AuthorDate: Tue, 29 Apr 2014 17:55:02 -0500
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 30 Apr 2014 12:34:51 +0200

hrtimer: Prevent all reprogramming if hang detected

If the last hrtimer interrupt detected a hang it sets hang_detected=1
and programs the clock event device with a delay to let the system
make progress.

If hang_detected == 1, we prevent reprogramming of the clock event
device in hrtimer_reprogram() but not in hrtimer_force_reprogram().

This can lead to the following situation:

hrtimer_interrupt()
hang_detected = 1;
program ce device to Xms from now (hang delay)

We have two timers pending:
T1 expires 50ms from now
T2 expires 5s from now

Now T1 gets canceled, which causes hrtimer_force_reprogram() to be
invoked, which in turn programs the clock event device to T2 (5
seconds from now).

Any hrtimer_start after that will not reprogram the hardware due to
hang_detected still being set. So we effectivly block all timers until
the T2 event fires and cleans up the hang situation.

Add a check for hang_detected to hrtimer_force_reprogram() which
prevents the reprogramming of the hang delay in the hardware
timer. The subsequent hrtimer_interrupt will resolve all outstanding
issues.

[ tglx: Rewrote subject and changelog and fixed up the comment in
hrtimer_force_reprogram() ]

Signed-off-by: Stuart Hayes <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/hrtimer.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index d55092c..e3724fd 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -569,6 +569,23 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)

cpu_base->expires_next.tv64 = expires_next.tv64;

+ /*
+ * If a hang was detected in the last timer interrupt then we
+ * leave the hang delay active in the hardware. We want the
+ * system to make progress. That also prevents the following
+ * scenario:
+ * T1 expires 50ms from now
+ * T2 expires 5s from now
+ *
+ * T1 is removed, so this code is called and would reprogram
+ * the hardware to 5s from now. Any hrtimer_start after that
+ * will not reprogram the hardware due to hang_detected being
+ * set. So we'd effectivly block all timers until the T2 event
+ * fires.
+ */
+ if (cpu_base->hang_detected)
+ return;
+
if (cpu_base->expires_next.tv64 != KTIME_MAX)
tick_program_event(cpu_base->expires_next, 1);
}