LinuxLists.cc - [PATCH] hrtimer:__run_hrtimer races with enqueue

2012-10-26 02:56:32

Subject: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

From: Yanmin Zhang <[email protected]>

We hit a kernel panic at __run_hrtimer=>BUG_ON(timer->state != HRTIMER_STATE_CALLBACK).
<2>[ 10.226053, 3] kernel BUG at /home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228!
<0>[ 10.235682, 3] invalid opcode: 0000 [#1] PREEMPT SMP
<4>[ 10.240716, 3] Modules linked in: wl12xx_sdio wl12xx mac80211 cfg80211 compat btwilink rmi4(C) fmdrv_chr st_drv matrix(C)
<4>[ 10.251651, 3]
<4>[ 10.253391, 3] Pid: 68, comm: kworker/3:4 Tainted: G WC 3.0.34-140430-g2af538d #45 Intel Corporation CloverTrail/FFRD
<4>[ 10.264674, 3] EIP: 0060:[<c126c7ed>] EFLAGS: 00010002 CPU: 3
<4>[ 10.270411, 3] EIP is at __run_hrtimer+0xbd/0x240
<4>[ 10.275091, 3] EAX: 00000001 EBX: f67fb6b8 ECX: f57b4000 EDX: 00007301
<4>[ 10.281602, 3] ESI: c1d614c0 EDI: f67fb680 EBP: f57b5dd8 ESP: f57b5da8
<4>[ 10.288113, 3] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
<0>[ 10.293754, 3] Process kworker/3:4 (pid: 68, ti=f57b4000 task=f57aa730 task.ti=f57b4000)
<0>[ 10.301827, 3] Stack:
<4>[ 10.304083, 3] 00000000 c1afef40 f57b5dd8 c167a6e0 f67fb680 20b366e3 f67fb6b8 f57b5e14
<4>[ 10.312069, 3] 00000001 f67fb6b8 00000001 f67fb680 f57b5e28 c126d1e5 f57b5e08 c126f325
<4>[ 10.320055, 3] 00000000 86b9868d 00000001 86b9868d 00000001 00000003 ffffffff 7fffffff
<0>[ 10.328041, 3] Call Trace:
<4>[ 10.330742, 3] [<c167a6e0>] ? gburst_thread_stop.isra.25+0x40/0x40
<4>[ 10.336988, 3] [<c126d1e5>] hrtimer_interrupt+0xd5/0x250
<4>[ 10.342368, 3] [<c126f325>] ? sched_clock_cpu+0xe5/0x150
<4>[ 10.347753, 3] [<c1871d44>] smp_apic_timer_interrupt+0x54/0x88
<4>[ 10.353654, 3] [<c1496558>] ? trace_hardirqs_off_thunk+0xc/0x14
<4>[ 10.359643, 3] [<c186be9f>] apic_timer_interrupt+0x2f/0x34
<4>[ 10.365199, 3] [<c186e60f>] ? sub_preempt_count+0x1f/0x50
<4>[ 10.370669, 3] [<c149558a>] delay_tsc+0x3a/0xc0
<6>[ 10.371589, 0] android_work: did not send uevent (0 0 (null))
<4>[ 10.381171, 3] [<c14954e3>] __const_udelay+0x23/0x30
<4>[ 10.386207, 3] [<c16d043a>] mdfld_dsi_send_dcs+0x12a/0x5d0
<4>[ 10.391760, 3] [<c186b6c6>] ? _raw_spin_unlock_irqrestore+0x26/0x50
<4>[ 10.398101, 3] [<c16af431>] ? ospm_power_using_hw_begin+0xa1/0x350
<4>[ 10.399053, 3] [<c186a49f>] ? __mutex_lock_slowpath+0x1ff/0x2f0
<4>[ 10.399069, 3] [<c16bd59e>] mdfld_dbi_update_panel+0x21e/0x2d0
<4>[ 10.399085, 3] [<c16b1ae1>] mdfld_te_handler_work+0x71/0x80
<4>[ 10.399099, 3] [<c12642be>] process_one_work+0xfe/0x3f0
<4>[ 10.399114, 3] [<c16b1a70>] ? mdfld_async_flip_te_handler+0xf0/0xf0

Basically, __run_hrtimer has a race with enqueue_hrtimer. When __run_hrtimer calls
the timer callback fn, another thread might call enqueue_hrtimer or hrtimer_start
to requeue it, and the timer->state is equal to HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED,
which causes the BUG_ON(timer->state != HRTIMER_STATE_CALLBACK) checking fails.

The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK.

Signed-off-by: Yanmin Zhang <[email protected]>
Reviewed-by: He, Bo <[email protected]>
---
kernel/hrtimer.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6db7a5e..6280184 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1235,7 +1235,7 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t *now)
* hrtimer_start_range_ns() or in hrtimer_interrupt()
*/
if (restart != HRTIMER_NORESTART) {
- BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
+ BUG_ON(!(timer->state & HRTIMER_STATE_CALLBACK));
enqueue_hrtimer(timer, base);
}

--
1.7.6

2012-10-26 08:51:13

by Yanmin Zhang

[permalink] [raw]

Subject: Re: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

On Fri, 2012-10-26 at 10:51 +0800, he, bo wrote:
> From: Yanmin Zhang <[email protected]>
>
> We hit a kernel panic at __run_hrtimer=>BUG_ON(timer->state != HRTIMER_STATE_CALLBACK).
> <2>[ 10.226053, 3] kernel BUG at /home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228!
> <0>[ 10.235682, 3] invalid opcode: 0000 [#1] PREEMPT SMP
> <4>[ 10.240716, 3] Modules linked in: wl12xx_sdio wl12xx mac80211 cfg80211 compat btwilink rmi4(C) fmdrv_chr st_drv matrix(C)
> <4>[ 10.251651, 3]
> <4>[ 10.253391, 3] Pid: 68, comm: kworker/3:4 Tainted: G WC 3.0.34-140430-g2af538d #45 Intel Corporation CloverTrail/FFRD
> <4>[ 10.264674, 3] EIP: 0060:[<c126c7ed>] EFLAGS: 00010002 CPU: 3
> <4>[ 10.270411, 3] EIP is at __run_hrtimer+0xbd/0x240
> <4>[ 10.275091, 3] EAX: 00000001 EBX: f67fb6b8 ECX: f57b4000 EDX: 00007301
> <4>[ 10.281602, 3] ESI: c1d614c0 EDI: f67fb680 EBP: f57b5dd8 ESP: f57b5da8
> <4>[ 10.288113, 3] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> <0>[ 10.293754, 3] Process kworker/3:4 (pid: 68, ti=f57b4000 task=f57aa730 task.ti=f57b4000)
> <0>[ 10.301827, 3] Stack:
> <4>[ 10.304083, 3] 00000000 c1afef40 f57b5dd8 c167a6e0 f67fb680 20b366e3 f67fb6b8 f57b5e14
> <4>[ 10.312069, 3] 00000001 f67fb6b8 00000001 f67fb680 f57b5e28 c126d1e5 f57b5e08 c126f325
> <4>[ 10.320055, 3] 00000000 86b9868d 00000001 86b9868d 00000001 00000003 ffffffff 7fffffff
> <0>[ 10.328041, 3] Call Trace:
> <4>[ 10.330742, 3] [<c167a6e0>] ? gburst_thread_stop.isra.25+0x40/0x40
> <4>[ 10.336988, 3] [<c126d1e5>] hrtimer_interrupt+0xd5/0x250
> <4>[ 10.342368, 3] [<c126f325>] ? sched_clock_cpu+0xe5/0x150
> <4>[ 10.347753, 3] [<c1871d44>] smp_apic_timer_interrupt+0x54/0x88
> <4>[ 10.353654, 3] [<c1496558>] ? trace_hardirqs_off_thunk+0xc/0x14
> <4>[ 10.359643, 3] [<c186be9f>] apic_timer_interrupt+0x2f/0x34
> <4>[ 10.365199, 3] [<c186e60f>] ? sub_preempt_count+0x1f/0x50
> <4>[ 10.370669, 3] [<c149558a>] delay_tsc+0x3a/0xc0
> <6>[ 10.371589, 0] android_work: did not send uevent (0 0 (null))
> <4>[ 10.381171, 3] [<c14954e3>] __const_udelay+0x23/0x30
> <4>[ 10.386207, 3] [<c16d043a>] mdfld_dsi_send_dcs+0x12a/0x5d0
> <4>[ 10.391760, 3] [<c186b6c6>] ? _raw_spin_unlock_irqrestore+0x26/0x50
> <4>[ 10.398101, 3] [<c16af431>] ? ospm_power_using_hw_begin+0xa1/0x350
> <4>[ 10.399053, 3] [<c186a49f>] ? __mutex_lock_slowpath+0x1ff/0x2f0
> <4>[ 10.399069, 3] [<c16bd59e>] mdfld_dbi_update_panel+0x21e/0x2d0
> <4>[ 10.399085, 3] [<c16b1ae1>] mdfld_te_handler_work+0x71/0x80
> <4>[ 10.399099, 3] [<c12642be>] process_one_work+0xfe/0x3f0
> <4>[ 10.399114, 3] [<c16b1a70>] ? mdfld_async_flip_te_handler+0xf0/0xf0
>
> Basically, __run_hrtimer has a race with enqueue_hrtimer. When __run_hrtimer calls
> the timer callback fn, another thread might call enqueue_hrtimer or hrtimer_start
> to requeue it, and the timer->state is equal to HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED,
> which causes the BUG_ON(timer->state != HRTIMER_STATE_CALLBACK) checking fails.
>
> The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK.
The patch has an issue that enqueue_hrtimer doesn't check if the timer is queued.
I will send a new patch.

2012-10-26 09:55:30

by Thomas Gleixner

[permalink] [raw]

Subject: Re: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

On Fri, 26 Oct 2012, he, bo wrote:
> From: Yanmin Zhang <[email protected]>
>
> We hit a kernel panic at __run_hrtimer=>BUG_ON(timer->state != HRTIMER_STATE_CALLBACK).
> <2>[ 10.226053, 3] kernel BUG at /home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228!
>
> Basically, __run_hrtimer has a race with enqueue_hrtimer. When
> __run_hrtimer calls the timer callback fn, another thread might call
> enqueue_hrtimer or hrtimer_start to requeue it, and the timer->state
> is equal to HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED, which
> causes the BUG_ON(timer->state != HRTIMER_STATE_CALLBACK) checking
> fails.
>
> The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK.

This does not fix it. It makes it worse.

> Signed-off-by: Yanmin Zhang <[email protected]>
> Reviewed-by: He, Bo <[email protected]>
> ---
> kernel/hrtimer.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 6db7a5e..6280184 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -1235,7 +1235,7 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t *now)
> * hrtimer_start_range_ns() or in hrtimer_interrupt()
> */
> if (restart != HRTIMER_NORESTART) {
> - BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
> + BUG_ON(!(timer->state & HRTIMER_STATE_CALLBACK));
> enqueue_hrtimer(timer, base);
> }

What you are allowing here is enqueueing an already enqueued timer
again. I don't know why this does not explode elsewhere, but that's
probably pure luck. It's not allowed to double enqueue a timer.

So no, this is not a solution. The problem is not in the core timer
code, the problem is in the code which uses that timer.

Your code is returning HRTIMER_RESTART from the timer callback and at
the same time it starts the timer from some other context. That's what
needs to be fixed.

Thanks,

tglx

2012-10-26 11:39:01

by Zhang, Yanmin

[permalink] [raw]

Subject: RE: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

>-----Original Message-----
>From: Thomas Gleixner [mailto:[email protected]]
>Sent: Friday, October 26, 2012 5:55 PM
>To: He, Bo
>Cc: [email protected]; Peter Zijlstra; Ingo Molnar;
>[email protected]; Zhang, Yanmin
>Subject: Re: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer
>
>On Fri, 26 Oct 2012, he, bo wrote:
>> From: Yanmin Zhang <[email protected]>
>>
>> We hit a kernel panic at __run_hrtimer=>BUG_ON(timer->state !=
>HRTIMER_STATE_CALLBACK).
>> <2>[ 10.226053, 3] kernel BUG at
>/home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228
>!
>>
>> Basically, __run_hrtimer has a race with enqueue_hrtimer. When
>> __run_hrtimer calls the timer callback fn, another thread might call
>> enqueue_hrtimer or hrtimer_start to requeue it, and the timer->state
>> is equal to HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED, which
>> causes the BUG_ON(timer->state != HRTIMER_STATE_CALLBACK) checking
>> fails.
>>
>> The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK.
>
>This does not fix it. It makes it worse.
Thanks for your kind comments. We found that and sent V2 at https://lkml.org/lkml/2012/10/26/172.

>
>> Signed-off-by: Yanmin Zhang <[email protected]>
>> Reviewed-by: He, Bo <[email protected]>
>> ---
>> kernel/hrtimer.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
>> index 6db7a5e..6280184 100644
>> --- a/kernel/hrtimer.c
>> +++ b/kernel/hrtimer.c
>> @@ -1235,7 +1235,7 @@ static void __run_hrtimer(struct hrtimer *timer,
>ktime_t *now)
>> * hrtimer_start_range_ns() or in hrtimer_interrupt()
>> */
>> if (restart != HRTIMER_NORESTART) {
>> - BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
>> + BUG_ON(!(timer->state & HRTIMER_STATE_CALLBACK));
>> enqueue_hrtimer(timer, base);
>> }
>
>What you are allowing here is enqueueing an already enqueued timer
>again. I don't know why this does not explode elsewhere, but that's
>probably pure luck. It's not allowed to double enqueue a timer.
You are right.

>
>So no, this is not a solution. The problem is not in the core timer
>code, the problem is in the code which uses that timer.
>
>Your code is returning HRTIMER_RESTART from the timer callback and at
>the same time it starts the timer from some other context. That's what
>needs to be fixed.
The timer user should fix it. But could we also change hrtimer to make it more stable?
At least, instead of panic, could we print some information and go ahead to let kernel
continue?

Thanks,
Yanmin

2012-10-26 12:09:37

by Thomas Gleixner

[permalink] [raw]

Subject: RE: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

On Fri, 26 Oct 2012, Zhang, Yanmin wrote:
> >From: Thomas Gleixner [mailto:[email protected]]
> >Your code is returning HRTIMER_RESTART from the timer callback and at
> >the same time it starts the timer from some other context. That's what
> >needs to be fixed.
>
> The timer user should fix it. But could we also change hrtimer to
> make it more stable? At least, instead of panic, could we print
> some information and go ahead to let kernel continue?

That's unfortunately not possible. At this point the timer might be
already corrupted.

CPU0 CPU 1

timer expires
callback runs
hrtimer_start()
expiry value is set
hrtimer_enqueue()

hrtimer_forward()
expiry value is set

return HRTIMER_RESTART

So while we can prevent the double enqueue, we have no way to deal
with the corrupted expiry value and the inconsistent RB tree. We can
give better debugging information, but we can't pretend that
everything is nice and cool.

If we really want to do something about it which keeps the machine
alive, then we need to

1) dequeue the timer
2) run a consistency check over the rbtree
3) enqueue the timer

Not sure if that's worth the trouble.

Thanks,

tglx

2012-10-29 17:30:25

by Yanmin Zhang

[permalink] [raw]

Subject: RE: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

On Fri, 2012-10-26 at 14:09 +0200, Thomas Gleixner wrote:
> On Fri, 26 Oct 2012, Zhang, Yanmin wrote:
> > >From: Thomas Gleixner [mailto:[email protected]]
> > >Your code is returning HRTIMER_RESTART from the timer callback and at
> > >the same time it starts the timer from some other context. That's what
> > >needs to be fixed.
> >
> > The timer user should fix it. But could we also change hrtimer to
> > make it more stable? At least, instead of panic, could we print
> > some information and go ahead to let kernel continue?
>
> That's unfortunately not possible. At this point the timer might be
> already corrupted.
>
> CPU0 CPU 1
>
> timer expires
> callback runs
> hrtimer_start()
> expiry value is set
> hrtimer_enqueue()
>
> hrtimer_forward()
> expiry value is set
>
> return HRTIMER_RESTART
>
> So while we can prevent the double enqueue, we have no way to deal
> with the corrupted expiry value and the inconsistent RB tree. We can
> give better debugging information, but we can't pretend that
> everything is nice and cool.
>
> If we really want to do something about it which keeps the machine
> alive, then we need to
>
> 1) dequeue the timer
> 2) run a consistency check over the rbtree
> 3) enqueue the timer
>
> Not sure if that's worth the trouble.
I strongly agree with you. Such checking might cause more trouble than
what we could benefit from it.

Could we add more dumping, especially about the timer address and callback
function? Next time, developers could find the bad timer quickly.