2013-03-05 07:31:48

by Lei Wen

Subject: workqueue panic in 3.4 kernel

Hi Tejun,

We hit a panic related to the workqueue code on a 3.4.5 Linux kernel.

The panic log is as follows:
[153587.035369] Unable to handle kernel NULL pointer dereference at virtual address 00000004
[153587.043731] pgd = e1e74000
[153587.046691] [00000004] *pgd=00000000
[153587.050567] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[153587.056152] Modules linked in: hwmap(O) cidatattydev(O) gs_diag(O) diag(O) gs_modem(O) ccinetdev(O) cci_datastub(O) citty(O) msocketk(O) smsmdtv seh(O) cploaddev(O) blcr(O) blcr_imports(O) geu(O) galcore(O)
[153587.076416] CPU: 0 Tainted: G O (3.4.5+ #1)
[153587.082092] PC is at delayed_work_timer_fn+0x1c/0x28
[153587.087249] LR is at delayed_work_timer_fn+0x18/0x28
[153587.092468] pc : [<c014c7bc>] lr : [<c014c7b8>] psr: 20000113
[153587.092468] sp : e1e3bf00 ip : 00000001 fp : 0000000a
[153587.104400] r10: 00000001 r9 : 578914dc r8 : c014c7a0
[153587.109832] r7 : 00000101 r6 : bf03d554 r5 : 00000000 r4 : bf03d544
[153587.116638] r3 : 00000101 r2 : bf03d544 r1 : c1a0b27c r0 : 00000000
[153587.123352] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[153587.130737] Control: 10c53c7d Table: 21e7404a DAC: 00000015
[153587.611328] [<c014c7bc>] (delayed_work_timer_fn+0x1c/0x28) from [<c014185c>] (run_timer_softirq+0x260/0x384)
[153587.621368] [<c014185c>] (run_timer_softirq+0x260/0x384) from [<c013abfc>] (__do_softirq+0x11c/0x244)
[153587.630828] [<c013abfc>] (__do_softirq+0x11c/0x244) from [<c013b144>] (irq_exit+0x44/0x98)
[153587.639373] [<c013b144>] (irq_exit+0x44/0x98) from [<c0113ca0>] (handle_IRQ+0x7c/0xb8)
[153587.647583] [<c0113ca0>] (handle_IRQ+0x7c/0xb8) from [<c01084ac>] (gic_handle_irq+0x34/0x58)
[153587.656188] [<c01084ac>] (gic_handle_irq+0x34/0x58) from [<c0112b3c>] (__irq_usr+0x3c/0x60)

Checking memory, we found that work->data becomes 0x300 when
delayed_work_timer_fn tries to call get_work_cwq.
cwq is thus NULL before __queue_work is called, so it is no surprise
that the kernel panics when it accesses cwq->wq.
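
For context, the 3.4-era timer callback looks roughly like this
(paraphrased from the sources; a sketch, not necessarily our exact tree):

/* kernel/workqueue.c (3.4-era, paraphrased) */
static void delayed_work_timer_fn(unsigned long __data)
{
        struct delayed_work *dwork = (struct delayed_work *)__data;
        struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);

        /*
         * get_work_cwq() returns NULL when WORK_STRUCT_CWQ is clear in
         * work->data; 0x300 has that bit clear, so the cwq->wq load
         * below faults a few bytes above NULL on our 32-bit build,
         * consistent with the oops address 00000004.
         */
        __queue_work(smp_processor_id(), cwq->wq, &dwork->work);
}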

To fix it, we tried backporting the patches below:
commit 60c057bca22285efefbba033624763a778f243bf
Author: Lai Jiangshan <[email protected]>
Date: Wed Feb 6 18:04:53 2013 -0800

workqueue: add delayed_work->wq to simplify reentrancy handling

commit 1265057fa02c7bed3b6d9ddc8a2048065a370364
Author: Tejun Heo <[email protected]>
Date: Wed Aug 8 09:38:42 2012 -0700

workqueue: fix CPU binding of flush_delayed_work[_sync]()

We also added the change below to make sure __cancel_work_timer cannot
slip in between run_timer_softirq and delayed_work_timer_fn.
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bf4888c..0e9f77c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2627,7 +2627,7 @@ static bool __cancel_work_timer(struct work_struct *work,
                 ret = (timer && likely(del_timer(timer)));
                 if (!ret)
                         ret = try_to_grab_pending(work);
-                wait_on_work(work);
+                flush_work(work);
         } while (unlikely(ret < 0));
 
         clear_work_data(work);

Do you think this fix is sufficient? And is calling flush_work directly
in __cancel_work_timer acceptable as the fix?

Thanks,
Lei


2013-03-05 16:32:35

by Tejun Heo

Subject: Re: workqueue panic in 3.4 kernel

Hello,

On Tue, Mar 05, 2013 at 03:31:45PM +0800, Lei Wen wrote:
> Checking memory, we found that work->data becomes 0x300 when
> delayed_work_timer_fn tries to call get_work_cwq.

Why would that become 0x300? Who's writing to that memory? Nobody
should be.

> cwq is thus NULL before __queue_work is called, so it is no surprise
> that the kernel panics when it accesses cwq->wq.
>
> To fix it, we tried backporting the patches below:
> commit 60c057bca22285efefbba033624763a778f243bf
> Author: Lai Jiangshan <[email protected]>
> Date: Wed Feb 6 18:04:53 2013 -0800
>
> workqueue: add delayed_work->wq to simplify reentrancy handling
>
> commit 1265057fa02c7bed3b6d9ddc8a2048065a370364
> Author: Tejun Heo <[email protected]>
> Date: Wed Aug 8 09:38:42 2012 -0700
>
> workqueue: fix CPU binding of flush_delayed_work[_sync]()

Neither should affect the problem you described above. It *could*
make the problem go away just because it would stop using work->data to
record the cwq if the corruption was contained to that field, but that
isn't a proper fix and the underlying problem could easily cause other
issues.

> We also added the change below to make sure __cancel_work_timer cannot
> slip in between run_timer_softirq and delayed_work_timer_fn.
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index bf4888c..0e9f77c 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2627,7 +2627,7 @@ static bool __cancel_work_timer(struct work_struct *work,
>                  ret = (timer && likely(del_timer(timer)));
>                  if (!ret)
>                          ret = try_to_grab_pending(work);
> -                wait_on_work(work);
> +                flush_work(work);
>          } while (unlikely(ret < 0));
>  
>          clear_work_data(work);
>
> Do you think this fix is sufficient? And is calling flush_work directly
> in __cancel_work_timer acceptable as the fix?

Maybe I'm missing something but it looks like the root cause hasn't
been diagnosed and you're piling up band-aids in workqueue code. You
might get away with it but could also be making the problem just more
obscure and difficult to debug (reproducible bugs are happy bugs).

I'd suggest finding out who owns the delayed_work item and examining
why the delayed_work is getting corrupted in the first place.

Thanks.

--
tejun

2013-03-06 14:39:20

by Lei Wen

Subject: Re: workqueue panic in 3.4 kernel

Hi Tejun

On Wed, Mar 6, 2013 at 12:32 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Tue, Mar 05, 2013 at 03:31:45PM +0800, Lei Wen wrote:
>> Checking memory, we found that work->data becomes 0x300 when
>> delayed_work_timer_fn tries to call get_work_cwq.
>
> Why would that become 0x300? Who's writing to that memory? Nobody
> should be.

We found a race condition, shown below:
CPU0                                          CPU1
timer interrupt happen
__run_timers
__run_timers::spin_lock_irq(&base->lock)
__run_timers::spin_unlock_irq(&base->lock)
                                              __cancel_work_timer
                                              __cancel_work_timer::del_timer
                                              __cancel_work_timer::wait_on_work
                                              __cancel_work_timer::clear_work_data
__run_timers::call_timer_fn(timer, fn, data);
delayed_work_timer_fn::get_work_cwq
__run_timers::spin_lock_irq(&base->lock)

It is possible for __cancel_work_timer to run on cpu1 __BEFORE__ cpu0
is ready to run the timer callback, which is delayed_work_timer_fn in
our case.

__cancel_work_timer does call wait_on_work to wait for an already
queued work item to complete, but here the work item has not been
queued yet because its timer callback has not executed. So wait_on_work
returns immediately, and clear_work_data clears work->data. Thus when
delayed_work_timer_fn is finally called, it sees work->data as 0x300.
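
For reference, the relevant 3.4-era helpers look roughly like this
(paraphrased from the sources; the decoding of 0x300 is our inference):

/* kernel/workqueue.c (3.4-era, paraphrased) */
static void clear_work_data(struct work_struct *work)
{
        /* reset work->data to WORK_STRUCT_NO_CPU, dropping all flags */
        set_work_data(work, WORK_STRUCT_NO_CPU, 0);
}

static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
{
        unsigned long data = atomic_long_read(&work->data);

        if (data & WORK_STRUCT_CWQ)
                return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
        else
                return NULL;    /* 0x300 has WORK_STRUCT_CWQ clear */
}

On our config, WORK_STRUCT_NO_CPU appears to encode to exactly 0x300:
with NR_CPUS == 2, WORK_CPU_NONE == 3, and shifting left by the eight
flag bits gives 3 << 8 == 0x300, which would explain the value we see.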

Do you think this sequence is possible?

>
>> cwq is thus NULL before __queue_work is called, so it is no surprise
>> that the kernel panics when it accesses cwq->wq.
>>
>> To fix it, we tried backporting the patches below:
>> commit 60c057bca22285efefbba033624763a778f243bf
>> Author: Lai Jiangshan <[email protected]>
>> Date: Wed Feb 6 18:04:53 2013 -0800
>>
>> workqueue: add delayed_work->wq to simplify reentrancy handling
>>
>> commit 1265057fa02c7bed3b6d9ddc8a2048065a370364
>> Author: Tejun Heo <[email protected]>
>> Date: Wed Aug 8 09:38:42 2012 -0700
>>
>> workqueue: fix CPU binding of flush_delayed_work[_sync]()
>
> Neither should affect the problem you described above. It *could*
> make the problem go away just because it would stop using work->data to
> record the cwq if the corruption was contained to that field, but that
> isn't a proper fix and the underlying problem could easily cause other
> issues.
>
>> We also added the change below to make sure __cancel_work_timer cannot
>> slip in between run_timer_softirq and delayed_work_timer_fn.
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index bf4888c..0e9f77c 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -2627,7 +2627,7 @@ static bool __cancel_work_timer(struct work_struct *work,
>>                  ret = (timer && likely(del_timer(timer)));
>>                  if (!ret)
>>                          ret = try_to_grab_pending(work);
>> -                wait_on_work(work);
>> +                flush_work(work);
>>          } while (unlikely(ret < 0));
>>  
>>          clear_work_data(work);
>>
>> Do you think this fix is sufficient? And is calling flush_work directly
>> in __cancel_work_timer acceptable as the fix?
>
> Maybe I'm missing something but it looks like the root cause hasn't
> been diagnosed and you're piling up band-aids in workqueue code. You
> might get away with it but could also be making the problem just more
> obscure and difficult to debug (reproducible bugs are happy bugs).
>
> I'd suggest finding out who owns the delayed_work item and examining
> why the delayed_work is getting corrupted in the first place.
>
> Thanks.
>
> --
> tejun

Thanks,
Lei

2013-03-06 19:14:14

by Tejun Heo

Subject: Re: workqueue panic in 3.4 kernel

Hello, Lei.

On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
> We found a race condition, shown below:
> CPU0                                          CPU1
> timer interrupt happen
> __run_timers
> __run_timers::spin_lock_irq(&base->lock)
> __run_timers::spin_unlock_irq(&base->lock)
>                                               __cancel_work_timer
>                                               __cancel_work_timer::del_timer
>                                               __cancel_work_timer::wait_on_work
>                                               __cancel_work_timer::clear_work_data
> __run_timers::call_timer_fn(timer, fn, data);
> delayed_work_timer_fn::get_work_cwq
> __run_timers::spin_lock_irq(&base->lock)
>
> It is possible for __cancel_work_timer to run on cpu1 __BEFORE__ cpu0
> is ready to run the timer callback, which is delayed_work_timer_fn in
> our case.

If del_timer() happens after the timer starts running, del_timer()
would return 0 and try_to_grab_pending() will be called, which will
return >=0 iff it successfully steals the PENDING bit (ie. it's the
sole owner of the work item). If del_timer() happens before the timer
starts running, the timer function would never run.

clear_work_data() happens iff the work item is confirmed to be idle.
At this point, I'm pretty skeptical this is a bug in workqueue itself
and strongly suggest looking at the crashing workqueue user.

Thanks.

--
tejun

2013-03-07 01:15:04

by Lei Wen

Subject: Re: workqueue panic in 3.4 kernel

Hi Tejun,

On Thu, Mar 7, 2013 at 3:14 AM, Tejun Heo <[email protected]> wrote:
> Hello, Lei.
>
> On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
>> We found a race condition, shown below:
>> CPU0                                          CPU1
>> timer interrupt happen
>> __run_timers
>> __run_timers::spin_lock_irq(&base->lock)
>> __run_timers::spin_unlock_irq(&base->lock)
>>                                               __cancel_work_timer
>>                                               __cancel_work_timer::del_timer
>>                                               __cancel_work_timer::wait_on_work
>>                                               __cancel_work_timer::clear_work_data
>> __run_timers::call_timer_fn(timer, fn, data);
>> delayed_work_timer_fn::get_work_cwq
>> __run_timers::spin_lock_irq(&base->lock)
>>
>> It is possible for __cancel_work_timer to run on cpu1 __BEFORE__ cpu0
>> is ready to run the timer callback, which is delayed_work_timer_fn in
>> our case.
>
> If del_timer() happens after the timer starts running, del_timer()
> would return 0 and try_to_grab_pending() will be called, which will
> return >=0 iff it successfully steals the PENDING bit (ie. it's the
> sole owner of the work item). If del_timer() happens before the timer
> starts running, the timer function would never run.

If del_timer() happens before __run_timers() is called, but after the
timer irq has already fired, would it return 1 because the timer has
not yet been detached in __run_timers()?
If that is possible, then we would skip try_to_grab_pending(), and
clear_work_data() would clear work->data this way.
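
Looking at del_timer() (3.4-era, trimmed sketch; timer stats and
next-timer bookkeeping elided):

/* kernel/timer.c (3.4-era, trimmed sketch) */
int del_timer(struct timer_list *timer)
{
        struct tvec_base *base;
        unsigned long flags;
        int ret = 0;

        if (timer_pending(timer)) {
                base = lock_timer_base(timer, &flags);
                if (timer_pending(timer)) {  /* re-checked under base->lock */
                        detach_timer(timer, 1);
                        ret = 1;
                }
                spin_unlock_irqrestore(&base->lock, flags);
        }
        return ret;
}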

>
> clear_work_data() happens iff the work item is confirmed to be idle.
> At this point, I'm pretty skeptical this is a bug in workqueue itself
> and strongly suggest looking at the crashing workqueue user.

Also, I am not very familiar with the workqueue mechanism; how many
places in the kernel clear work->data besides clear_work_data()?

From the memory dump, I cannot find any hint of the work structure being
destroyed. So the only possibility, it seems to me, is that work->data
was set by someone on purpose.

crash> struct delayed_work 0xbf03d544 -x
struct delayed_work {
  work = {
    data = {
      counter = 0x300
    },
    entry = {
      next = 0xbf03d548,
      prev = 0xbf03d548
    },
    func = 0xbf014b00
  },
  timer = {
    entry = {
      next = 0x0,
      prev = 0x200200
    },
    expires = 0x12b638b,
    base = 0xc0844e01,
    function = 0xc014c7a0 <delayed_work_timer_fn>,
    data = 0xbf03d544,
    slack = 0xffffffff,
    start_pid = 0xffffffff,
    start_site = 0x0,
    start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  }
}

Thanks,
Lei

2013-03-07 15:22:44

by Lei Wen

Subject: Re: workqueue panic in 3.4 kernel

Tejun,

On Thu, Mar 7, 2013 at 9:15 AM, Lei Wen <[email protected]> wrote:
> Hi Tejun,
>
> On Thu, Mar 7, 2013 at 3:14 AM, Tejun Heo <[email protected]> wrote:
>> Hello, Lei.
>>
>> On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
>>> We found a race condition, shown below:
>>> CPU0                                          CPU1
>>> timer interrupt happen
>>> __run_timers
>>> __run_timers::spin_lock_irq(&base->lock)
>>> __run_timers::spin_unlock_irq(&base->lock)
>>>                                               __cancel_work_timer
>>>                                               __cancel_work_timer::del_timer
>>>                                               __cancel_work_timer::wait_on_work
>>>                                               __cancel_work_timer::clear_work_data
>>> __run_timers::call_timer_fn(timer, fn, data);
>>> delayed_work_timer_fn::get_work_cwq
>>> __run_timers::spin_lock_irq(&base->lock)
>>>
>>> It is possible for __cancel_work_timer to run on cpu1 __BEFORE__ cpu0
>>> is ready to run the timer callback, which is delayed_work_timer_fn in
>>> our case.
>>
>> If del_timer() happens after the timer starts running, del_timer()
>> would return 0 and try_to_grab_pending() will be called, which will
>> return >=0 iff it successfully steals the PENDING bit (ie. it's the
>> sole owner of the work item). If del_timer() happens before the timer
>> starts running, the timer function would never run.
>
> If del_timer() happens before __run_timers() is called, but after the
> timer irq has already fired, would it return 1 because the timer has
> not yet been detached in __run_timers()?
> If that is possible, then we would skip try_to_grab_pending(), and
> clear_work_data() would clear work->data this way.
>
>>
>> clear_work_data() happens iff the work item is confirmed to be idle.
>> At this point, I'm pretty skeptical this is a bug in workqueue itself
>> and strongly suggest looking at the crashing workqueue user.
>
> Also, I am not very familiar with the workqueue mechanism; how many
> places in the kernel clear work->data besides clear_work_data()?
>
> From the memory dump, I cannot find any hint of the work structure being
> destroyed. So the only possibility, it seems to me, is that work->data
> was set by someone on purpose.
>
> crash> struct delayed_work 0xbf03d544 -x
> struct delayed_work {
>   work = {
>     data = {
>       counter = 0x300
>     },
>     entry = {
>       next = 0xbf03d548,
>       prev = 0xbf03d548
>     },
>     func = 0xbf014b00
>   },
>   timer = {
>     entry = {
>       next = 0x0,
>       prev = 0x200200
>     },
>     expires = 0x12b638b,
>     base = 0xc0844e01,
>     function = 0xc014c7a0 <delayed_work_timer_fn>,
>     data = 0xbf03d544,
>     slack = 0xffffffff,
>     start_pid = 0xffffffff,
>     start_site = 0x0,
>     start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
>   }
> }

I captured a trace log, which shows my previous suspicion is correct:
__cancel_work_timer is called before delayed_work_timer_fn, so
work->data is cleared.

And when __cancel_work_timer is called, the timer is still pending,
so del_timer returns 1 and try_to_grab_pending is never called.

But it is very strange that __run_timers still gets the same timer.
Its callback, delayed_work_timer_fn, is then called, which causes
the issue.

The detach_timer inside __cancel_work_timer's del_timer should already
have removed the timer from all lists, am I right?
Could the timer_list somehow be queued twice, e.g. on two cpus?
If not, how could this happen?
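
For reference, this is roughly the __run_timers() path I am looking at
(3.4-era kernel/timer.c, heavily trimmed):

/* kernel/timer.c (3.4-era, heavily trimmed sketch) */
static inline void __run_timers(struct tvec_base *base)
{
        spin_lock_irq(&base->lock);
        while (time_after_eq(jiffies, base->timer_jiffies)) {
                struct list_head work_list;
                /* ... cascading elided; expired timers land on work_list ... */
                while (!list_empty(&work_list)) {
                        struct timer_list *timer;
                        void (*fn)(unsigned long);
                        unsigned long data;

                        timer = list_first_entry(&work_list,
                                                 struct timer_list, entry);
                        fn = timer->function;
                        data = timer->data;

                        base->running_timer = timer;
                        detach_timer(timer, 1); /* unlinked under base->lock */

                        spin_unlock_irq(&base->lock);
                        call_timer_fn(timer, fn, data); /* runs unlocked */
                        spin_lock_irq(&base->lock);
                }
        }
        base->running_timer = NULL;
        spin_unlock_irq(&base->lock);
}

Since the timer is detached under base->lock before call_timer_fn(), and
del_timer() takes the same lock and only detaches a still-pending timer,
I do not see how both paths could treat the timer as pending at once.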

Thanks,
Lei

2013-03-07 15:49:16

by Tejun Heo

Subject: Re: workqueue panic in 3.4 kernel

(cc'ing Thomas, hi!)

Hello,

Lei is seeing a problem where a delayed_work item gets corrupted (its
work->data gets cleared while still queued on the timer). He thinks
what's going on is that del_timer() is returning 1 but the timer
function still gets executed.

On Thu, Mar 07, 2013 at 11:22:40PM +0800, Lei Wen wrote:
> >> If del_timer() happens after the timer starts running, del_timer()
> >> would return 0 and try_to_grab_pending() will be called, which will
> >> return >=0 iff it successfully steals the PENDING bit (ie. it's the
> >> sole owner of the work item). If del_timer() happens before the timer
> >> starts running, the timer function would never run.
> >
> > If del_timer() happens before __run_timers() is called, but after the
> > timer irq has already fired, would it return 1 because the timer has
> > not yet been detached in __run_timers()?
> > If that is possible, then we would skip try_to_grab_pending(), and
> > clear_work_data() would clear work->data this way.
> >
> >>
> >> clear_work_data() happens iff the work item is confirmed to be idle.
> >> At this point, I'm pretty skeptical this is a bug in workqueue itself
> >> and strongly suggest looking at the crashing workqueue user.
> >
> > Also, I am not very familiar with the workqueue mechanism; how many
> > places in the kernel clear work->data besides clear_work_data()?

Work item initialization and clear_work_data() are the only places and
from the looks of it you definitely seem to be hitting
clear_work_data().

> > From the memory dump, I cannot find any hint of the work structure being
> > destroyed. So the only possibility, it seems to me, is that work->data
> > was set by someone on purpose.
> >
> > crash> struct delayed_work 0xbf03d544 -x
> > struct delayed_work {
> >   work = {
> >     data = {
> >       counter = 0x300
> >     },
> >     entry = {
> >       next = 0xbf03d548,
> >       prev = 0xbf03d548
> >     },
> >     func = 0xbf014b00
> >   },
> >   timer = {
> >     entry = {
> >       next = 0x0,
> >       prev = 0x200200
> >     },
> >     expires = 0x12b638b,
> >     base = 0xc0844e01,
> >     function = 0xc014c7a0 <delayed_work_timer_fn>,
> >     data = 0xbf03d544,
> >     slack = 0xffffffff,
> >     start_pid = 0xffffffff,
> >     start_site = 0x0,
> >     start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> >   }
> > }
>
> I captured a trace log, which shows my previous suspicion is correct:
> __cancel_work_timer is called before delayed_work_timer_fn, so
> work->data is cleared.
>
> And when __cancel_work_timer is called, the timer is still pending,
> so del_timer returns 1 and try_to_grab_pending is never called.
>
> But it is very strange that __run_timers still gets the same timer.
> Its callback, delayed_work_timer_fn, is then called, which causes
> the issue.
>
> The detach_timer inside __cancel_work_timer's del_timer should already
> have removed the timer from all lists, am I right?

Yes.

> Could the timer_list somehow be queued twice, e.g. on two cpus?
> If not, how could this happen?

I can't see how something like that would happen and still find it
quite unlikely this would be a generic problem in either timer or
workqueue given how widely those are used and your case is the only
similar case that came up till now (and 3.4 is a long time ago).
Thomas, any ideas?

Thanks.

--
tejun

2013-03-07 16:07:34

by Thomas Gleixner

Subject: Re: workqueue panic in 3.4 kernel

On Thu, 7 Mar 2013, Tejun Heo wrote:
> I can't see how something like that would happen and still find it
> quite unlikely this would be a generic problem in either timer or
> workqueue given how widely those are used and your case is the only
> similar case that came up till now (and 3.4 is a long time ago).
> Thomas, any ideas?

debugobjects might give a hint.

Thanks,

tglx

2013-03-12 05:12:23

by Tejun Heo

Subject: Re: workqueue panic in 3.4 kernel

Hello,

On Tue, Mar 12, 2013 at 01:08:15PM +0800, Lei Wen wrote:
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 8afab27..425d5a2 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -189,12 +189,16 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; }
>   * NOTE! No point in using "atomic_long_set()": using a direct
>   * assignment of the work data initializer allows the compiler
>   * to generate better code.
> + *
> + * We assume a work item must not be re-initialized while it still
> + * has the PENDING bit set; if it does, report a BUG.
>   */
>  #ifdef CONFIG_LOCKDEP
>  #define __INIT_WORK(_work, _func, _onstack)                     \
>          do {                                                    \
>                  static struct lock_class_key __key;             \
>                                                                  \
> +                BUG_ON(work_pending(_work));                    \

You're initializing a random piece of memory which may contain any
garbage, and triggering a BUG if some bit happens to be set in it. No,
you can't do that. debugobj is the right tool for debugging object
lifetime issues and is already supported.
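
For reference, INIT_WORK() already hooks into debugobjects, roughly
like this (paraphrased from 3.4-era sources):

/* kernel/workqueue.c (3.4-era, paraphrased) */
#ifdef CONFIG_DEBUG_OBJECTS_WORK
/* reached from INIT_WORK()/INIT_WORK_ONSTACK() via __INIT_WORK() */
void __init_work(struct work_struct *work, int onstack)
{
        if (onstack)
                debug_object_init_on_stack(work, &work_debug_descr);
        else
                debug_object_init(work, &work_debug_descr);
}
#endif

debug_object_init() warns (and runs the descriptor's fixup callbacks)
when an object is re-initialized while still tracked as active, which
is exactly the double-init case in question.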

Thanks.

--
tejun

2013-03-12 05:18:05

by Lei Wen

Subject: Re: workqueue panic in 3.4 kernel

Tejun,

On Tue, Mar 12, 2013 at 1:12 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Tue, Mar 12, 2013 at 01:08:15PM +0800, Lei Wen wrote:
>> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
>> index 8afab27..425d5a2 100644
>> --- a/include/linux/workqueue.h
>> +++ b/include/linux/workqueue.h
>> @@ -189,12 +189,16 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; }
>>   * NOTE! No point in using "atomic_long_set()": using a direct
>>   * assignment of the work data initializer allows the compiler
>>   * to generate better code.
>> + *
>> + * We assume a work item must not be re-initialized while it still
>> + * has the PENDING bit set; if it does, report a BUG.
>>   */
>>  #ifdef CONFIG_LOCKDEP
>>  #define __INIT_WORK(_work, _func, _onstack)                     \
>>          do {                                                    \
>>                  static struct lock_class_key __key;             \
>>                                                                  \
>> +                BUG_ON(work_pending(_work));                    \
>
> You're initializing a random piece of memory which may contain any
> garbage, and triggering a BUG if some bit happens to be set in it. No,
> you can't do that. debugobj is the right tool for debugging object
> lifetime issues and is already supported.

debugobj is not helping with this issue; I have enabled both
CONFIG_DEBUG_OBJECTS_WORK and CONFIG_DEBUG_OBJECTS, but they didn't
report any issue at all.

And I am not initializing random memory; the original issue is that
init is called multiple times on one structure whose memory is already
allocated. __INIT_WORK shouldn't be called on random memory, right?

All this patch does is add a check for that.

Thanks,
Lei

2013-03-12 05:24:33

by Tejun Heo

Subject: Re: workqueue panic in 3.4 kernel

On Tue, Mar 12, 2013 at 01:18:01PM +0800, Lei Wen wrote:
> > You're initializing a random piece of memory which may contain any
> > garbage, and triggering a BUG if some bit happens to be set in it. No,
> > you can't do that. debugobj is the right tool for debugging object
> > lifetime issues and is already supported.
>
> debugobj is not helping with this issue; I have enabled both
> CONFIG_DEBUG_OBJECTS_WORK and CONFIG_DEBUG_OBJECTS, but they didn't
> report any issue at all.

It should. No idea why it didn't. Would be interesting to find out
why.

> And I am not initializing random memory; the original issue is that
> init is called multiple times on one structure whose memory is already
> allocated. __INIT_WORK shouldn't be called on random memory, right?

Memory areas aren't always zero on allocation.

--
tejun

2013-03-12 05:34:59

by Lei Wen

Subject: Re: workqueue panic in 3.4 kernel

On Tue, Mar 12, 2013 at 1:24 PM, Tejun Heo <[email protected]> wrote:
> On Tue, Mar 12, 2013 at 01:18:01PM +0800, Lei Wen wrote:
>> > You're initializing a random piece of memory which may contain any
>> > garbage, and triggering a BUG if some bit happens to be set in it. No,
>> > you can't do that. debugobj is the right tool for debugging object
>> > lifetime issues and is already supported.
>>
>> debugobj is not helping with this issue; I have enabled both
>> CONFIG_DEBUG_OBJECTS_WORK and CONFIG_DEBUG_OBJECTS, but they didn't
>> report any issue at all.
>
> It should. No idea why it didn't. Would be interesting to find out
> why.

No idea either...

>
>> And I am not initializing random memory; the original issue is that
>> init is called multiple times on one structure whose memory is already
>> allocated. __INIT_WORK shouldn't be called on random memory, right?
>
> Memory areas aren't always zero on allocation.


Shouldn't the work structure be allocated with kzalloc?

Thanks,
Lei

2013-03-12 05:40:35

by Tejun Heo

Subject: Re: workqueue panic in 3.4 kernel

On Tue, Mar 12, 2013 at 01:34:56PM +0800, Lei Wen wrote:
> > Memory areas aren't always zero on allocation.
>
> Shouldn't the work structure be allocated with kzalloc?

It's not required to. work_struct can also be on the stack. It's
"init" after all. Also, if you require clearing the memory before
initing, you would just be shifting the problem from INIT to memory
clearing.
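
For example, an on-stack user would look like this (a minimal sketch;
the _ONSTACK variants are what let debugobjects track such items):

#include <linux/workqueue.h>

static void example_fn(struct work_struct *work)
{
        /* ... */
}

static void example_on_stack(void)
{
        struct work_struct w;

        INIT_WORK_ONSTACK(&w, example_fn);
        schedule_work(&w);
        flush_work(&w);                 /* must finish before w goes away */
        destroy_work_on_stack(&w);      /* tell debugobjects we're done */
}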

--
tejun

2013-03-12 06:01:19

by Lei Wen

Subject: Re: workqueue panic in 3.4 kernel

On Tue, Mar 12, 2013 at 1:40 PM, Tejun Heo <[email protected]> wrote:
> On Tue, Mar 12, 2013 at 01:34:56PM +0800, Lei Wen wrote:
>> > Memory areas aren't always zero on allocation.
>>
>> Shouldn't the work structure be allocated with kzalloc?
>
> It's not required to. work_struct can also be on the stack. It's
> "init" after all. Also, if you require clearing the memory before
> initing, you would just be shifting the problem from INIT to memory
> clearing.
>

I see...
How about only checking work structures that are not on the stack?
Current on-stack usage is rare, so that should be easier to check.

Thanks,
Lei

2013-03-12 06:13:08

by Tejun Heo

Subject: Re: workqueue panic in 3.4 kernel

On Tue, Mar 12, 2013 at 02:01:16PM +0800, Lei Wen wrote:
> I see...
> How about only checking work structures that are not on the stack?
> Current on-stack usage is rare, so that should be easier to check.

No, kzalloc is not required. The memory area can come from any source.
If you're interested in improving the debug situation, please stop
trying to do something impossible (you can't track the lifetime of a
random memory area without extra bookkeeping) and try to find out why
debugobj didn't detect it (did you even enable it? either via
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT or the "debug_objects" boot param).

--
tejun

2013-03-12 06:41:13

by Lei Wen

Subject: Re: workqueue panic in 3.4 kernel

On Tue, Mar 12, 2013 at 2:13 PM, Tejun Heo <[email protected]> wrote:
> On Tue, Mar 12, 2013 at 02:01:16PM +0800, Lei Wen wrote:
>> I see...
>> How about only checking work structures that are not on the stack?
>> Current on-stack usage is rare, so that should be easier to check.
>
> No, kzalloc is not required. The memory area can come from any source.
> If you're interested in improving the debug situation, please stop
> trying to do something impossible (you can't track the lifetime of a
> random memory area without extra bookkeeping) and try to find out why
> debugobj didn't detect it (did you even enable it? either via
> CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT or the "debug_objects" boot param).
>


I see...
Thanks for the detailed explanation! :)

Thanks,
Lei