Hello, Arun.
On Mon, Sep 29, 2014 at 09:40:50PM +0530, Arun KS wrote:
...
> The value of data is 0xffffffe0, which is the value left by
> INIT_WORK() or WORK_DATA_INIT().
> This can happen if a driver calls INIT_WORK again on the same
> work_struct after queuing it.
>
> The work_struct details above show that the work was queued from
> kernel/async.c. async_schedule() dynamically allocates the
> work_struct and queues it to system_unbound_wq, so there is no
> possibility of calling INIT_WORK on the same work again.
>
> Inspecting the ramdump for the async_entry structure in kernel/async.c:
>
> crash> struct async_entry ed7cf140
> struct async_entry {
>   domain_list = {
>     next = 0xed7cf140,
>     prev = 0xed7cf140
>   },
>   global_list = {
>     next = 0xed7cf148,
>     prev = 0xed7cf148
>   },
>   work = {
>     data = {
>       counter = 0xffffffe0
>     },
>     entry = {
>       next = 0xed7cf154,
>       prev = 0xed7cf154
>     },
>     func = 0xc0140ac4 <async_run_entry_fn>
>   },
>   cookie = 0x263e5,
>   func = 0xc074dda0 <dapm_post_sequence_async>,
>   data = 0xed48432c,
>   domain = 0xe5457dec
> }
>
> func points to dapm_post_sequence_async, and domain_list and
> global_list are both empty, which shows that the work has
> finished executing and nothing is pending in async.
>
> But how come this work_struct was still linked in the workqueue
> data structures? Is there any corner case in the workqueue code that
> can miss unlinking the work_struct from the pool_workqueue after
> executing it?
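The 0xffffffe0 value in the dump can be reproduced by hand. The sketch below redoes the arithmetic in user space, assuming a 32-bit kernel of that era built without CONFIG_DEBUG_OBJECTS_WORK (4 flag bits below the colour shift, 1 off-queue flag bit). The constant names are shortened stand-ins rather than the kernel's own identifiers, and the bit layout is an assumption that differs on other kernel versions and configurations.

```c
#include <stdint.h>

/*
 * Assumed bit layout of work_struct.data for an off-queue work item
 * on a 32-bit build without CONFIG_DEBUG_OBJECTS_WORK.
 * Names are illustrative stand-ins, not the kernel's identifiers.
 */
enum {
    BITS_PER_LONG_32 = 32,
    COLOR_SHIFT      = 4,   /* low flag bits: pending, delayed, pwq, linked */
    OFFQ_FLAG_BITS   = 1,   /* one off-queue flag (canceling) */
    OFFQ_POOL_SHIFT  = COLOR_SHIFT + OFFQ_FLAG_BITS,
    OFFQ_LEFT        = BITS_PER_LONG_32 - OFFQ_POOL_SHIFT,
    OFFQ_POOL_BITS   = OFFQ_LEFT <= 31 ? OFFQ_LEFT : 31
};

/* "No pool" marker: all pool-ID bits set, all flag bits clear. */
static uint32_t work_struct_no_pool(void)
{
    uint32_t pool_none = ((uint32_t)1 << OFFQ_POOL_BITS) - 1;

    return pool_none << OFFQ_POOL_SHIFT;
}
```

Under these assumptions work_struct_no_pool() evaluates to 0xffffffe0: 27 pool-ID bits set above 5 cleared flag bits, which matches a freshly initialised work item with no pending bit.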
I sure hope not. How reproducible is the issue? Can you try w/
CONFIG_DEBUG_OBJECTS_WORK enabled?
Thanks.
--
tejun
Hello Tejun,
On Mon, Oct 6, 2014 at 9:02 PM, Tejun Heo wrote:
>
> I sure hope not. How reproducible is the issue? Can you try w/
> CONFIG_DEBUG_OBJECTS_WORK enabled?
Thanks for replying.
That was a problem with one of our drivers. It was freeing the
memory (struct work_struct) without flushing the workqueue.
We caught the faulty driver by adding a BUG_ON() in INIT_WORK and looking
at the func pointer in the work_struct (which points to the
faulty driver's work function).
1) The faulty driver calls queue_work() on system_unbound_wq.
2) It frees the work_struct memory while the work is still queued in
the workqueue.
3) Another driver requests memory from SLAB, gets the same memory, and
calls INIT_WORK on it.
4) The worker tries to execute the work queued by the faulty driver,
resulting in a crash.
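
The four steps can be modelled in user space. The following is a toy sketch, not kernel code: every toy_* name is hypothetical, the "slab" is a single static slot so that the reallocation deterministically returns the same memory, and the pending flag is simplified to bit 0. In a real driver the fix would be to call cancel_work_sync() or flush_work() before kfree().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PENDING   1UL          /* simplified "queued" flag */
#define TOY_DATA_INIT 0xffffffe0UL /* value seen in the dump after init */

struct toy_work {
    unsigned long data;
    struct toy_work *next;             /* queue link */
    void (*func)(struct toy_work *);
};

/* One-slot "slab": free followed by alloc hands back the same memory. */
static struct toy_work slot;
static bool slot_in_use;
static struct toy_work *queue_head;

static struct toy_work *toy_alloc(void)
{
    if (slot_in_use)
        return NULL;
    slot_in_use = true;
    return &slot;
}

static void toy_free(struct toy_work *w) { (void)w; slot_in_use = false; }

static void toy_init_work(struct toy_work *w, void (*fn)(struct toy_work *))
{
    w->data = TOY_DATA_INIT;   /* wipes the pending flag */
    w->next = NULL;            /* wipes the queue link */
    w->func = fn;
}

static void toy_queue_work(struct toy_work *w)
{
    w->data |= TOY_PENDING;
    w->next = queue_head;
    queue_head = w;
}

/* Unlink a queued item; stands in for cancelling before freeing. */
static void toy_cancel_work(struct toy_work *w)
{
    struct toy_work **pp;

    for (pp = &queue_head; *pp; pp = &(*pp)->next) {
        if (*pp == w) {
            *pp = w->next;
            w->next = NULL;
            w->data &= ~TOY_PENDING;
            return;
        }
    }
}

/* Returns false if the dequeued item no longer looks like queued work. */
static bool toy_process_one(void)
{
    struct toy_work *w = queue_head;

    if (!w)
        return true;           /* nothing queued, nothing wrong */
    queue_head = w->next;
    if (!(w->data & TOY_PENDING))
        return false;          /* the kernel would crash around here */
    w->data &= ~TOY_PENDING;
    w->func(w);
    return true;
}

static void driver_a_fn(struct toy_work *w) { (void)w; }
static void driver_b_fn(struct toy_work *w) { (void)w; }

/* Runs steps 1-4; returns true if the worker saw a consistent queue. */
static bool simulate(bool cancel_before_free)
{
    struct toy_work *a, *b;

    queue_head = NULL;
    slot_in_use = false;

    a = toy_alloc();
    toy_init_work(a, driver_a_fn);
    toy_queue_work(a);             /* 1) faulty driver queues work */
    if (cancel_before_free)
        toy_cancel_work(a);        /* what the driver should have done */
    toy_free(a);                   /* 2) memory freed */
    b = toy_alloc();               /* 3) same memory handed out again */
    assert(b == a);
    toy_init_work(b, driver_b_fn);
    return toy_process_one();      /* 4) worker picks up the stale entry */
}
```

In this model simulate(false) returns false, because the worker dequeues an item whose data word has been reset to 0xffffffe0, while simulate(true) returns true, because the queue is already empty when the memory is reused.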
Thanks,
Arun
Hello, Arun.
On Wed, Oct 08, 2014 at 05:30:20PM +0530, Arun KS wrote:
> > I sure hope not. How reproducible is the issue? Can you try w/
> > CONFIG_DEBUG_OBJECTS_WORK enabled?
>
> Thanks for replying.
> That was a problem with one of our drivers. It was freeing the
> memory (struct work_struct) without flushing the workqueue.
> We caught the faulty driver by adding a BUG_ON() in INIT_WORK and looking
> at the func pointer in the work_struct (which points to the
> faulty driver's work function).
>
> 1) The faulty driver calls queue_work() on system_unbound_wq.
> 2) It frees the work_struct memory while the work is still queued in
> the workqueue.
> 3) Another driver requests memory from SLAB, gets the same memory, and
> calls INIT_WORK on it.
> 4) The worker tries to execute the work queued by the faulty driver,
> resulting in a crash.
Ah, good to hear. I think bugs like the above should be detectable
with CONFIG_DEBUG_OBJECTS_WORK, so if you see something similar next
time, please try it out.
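
As a rough sketch, a config fragment along these lines should enable the relevant checks. CONFIG_DEBUG_OBJECTS_WORK depends on CONFIG_DEBUG_OBJECTS; adding CONFIG_DEBUG_OBJECTS_FREE is an assumption here, on the basis that it is the option that checks memory being freed for still-active objects, which is what step 2 of the sequence above does.

```
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_WORK=y
```

Note that enabling CONFIG_DEBUG_OBJECTS_WORK changes the flag layout of work_struct.data, so raw data values in a dump will differ from those of a non-debug build.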
Thanks.
--
tejun