2015-02-02 03:38:31

by Ethan Zhao

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject


On 2015/2/2 11:24, Viresh Kumar wrote:
> On 2 February 2015 at 08:50, ethan zhao <[email protected]> wrote:
>> This seems couldn't prevent all the 'bad thing' from happening, E.G.
>>
>>
>> Thread A: Workqueue: kacpi_notify
>>
>> acpi_processor_notify()
>> acpi_processor_ppc_has_changed()
>> cpufreq_update_policy()
>> cpufreq_cpu_get()
> We take cpufreq_driver_lock() here, and so this will
> block thread B.
No, there is no cpufreq_driver_lock acquired between

cpufreq_cpu_get() and cpufreq_cpu_put()


>> beginning the deference of policy Thread B:
>> ... ... __cpufreq_remove_dev_finish()
>> cpufreq_policy_free(policy);
>>
>>
>> Perhaps move policy->rwsem out side the policy structure is a way to avoid
>> it completely.
>> and you could stopping the PPC thread stepping forward as my patch as
>> temporary workaround.
> I couldn't understand your problem completely. Apart from giving a detailed
> look of what's going on both threads, always specify where the BUG actually
> is..
The problem is you are using a rwsem inside policy structure to protect its
assessment, that is bad design.

Thanks,
Ethan


2015-02-02 03:43:10

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

On 2 February 2015 at 09:08, ethan zhao <[email protected]> wrote:
>> We take cpufreq_driver_lock() here, and so this will
>> block thread B.
>
> No, there is no cpufreq_driver_lock acquired between
>
> cpufreq_cpu_get() and cpufreq_cpu_put()

I am not saying that the lock is taken between them. But within
cpufreq_cpu_get() to make sure policy doesn't get freed while we
are doing kobject_get().

>>> beginning the deference of policy Thread B:
>>> ... ... __cpufreq_remove_dev_finish()
>>> cpufreq_policy_free(policy);
>>>
>>>
>>> Perhaps move policy->rwsem out side the policy structure is a way to
>>> avoid
>>> it completely.
>>> and you could stopping the PPC thread stepping forward as my patch as
>>> temporary workaround.
>>
>> I couldn't understand your problem completely. Apart from giving a
>> detailed
>> look of what's going on both threads, always specify where the BUG
>> actually
>> is..
>
> The problem is you are using a rwsem inside policy structure to protect its
> assessment, that is bad design.

What is the current bug you are facing right now, I haven't understood it well.
Also a lock within the structure isn't new. Its all over the kernel. I
don't understand
why you say its a bad design.

--
viresh

2015-02-02 03:56:58

by Ethan Zhao

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject


On 2015/2/2 11:43, Viresh Kumar wrote:
> On 2 February 2015 at 09:08, ethan zhao <[email protected]> wrote:
>>> We take cpufreq_driver_lock() here, and so this will
>>> block thread B.
>> No, there is no cpufreq_driver_lock acquired between
>>
>> cpufreq_cpu_get() and cpufreq_cpu_put()
> I am not saying that the lock is taken between them. But within
> cpufreq_cpu_get() to make sure policy doesn't get freed while we
> are doing kobject_get().
How to prevent the policy to be freed between

cpufreq_cpu_get() and cpufreq_cpu_put() ?

>>>> beginning the deference of policy Thread B:
>>>> ... ... __cpufreq_remove_dev_finish()
>>>> cpufreq_policy_free(policy);
>>>>
>>>>
>>>> Perhaps move policy->rwsem out side the policy structure is a way to
>>>> avoid
>>>> it completely.
>>>> and you could stopping the PPC thread stepping forward as my patch as
>>>> temporary workaround.
>>> I couldn't understand your problem completely. Apart from giving a
>>> detailed
>>> look of what's going on both threads, always specify where the BUG
>>> actually
>>> is..
>> The problem is you are using a rwsem inside policy structure to protect its
>> assessment, that is bad design.
> What is the current bug you are facing right now, I haven't understood it well.
> Also a lock within the structure isn't new. Its all over the kernel. I
> don't understand
> why you say its a bad design.
You are maxing up the water with sand ?

Thanks,
Ethan

>
> --
> viresh

2015-02-02 03:59:58

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

On 2 February 2015 at 09:26, ethan zhao <[email protected]> wrote:
> How to prevent the policy to be freed between
>
> cpufreq_cpu_get() and cpufreq_cpu_put() ?

kobject_get() increases the reference count of a policy and the policy
will only be freed when this is zero. And it will only be zero once all
cpufreq_cpu_get() are matched with corresponding cpufreq_cpu_put().

> You are maxing up the water with sand ?

:)

2015-02-02 04:06:22

by Ethan Zhao

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject



On 2015/2/2 11:59, Viresh Kumar wrote:
> On 2 February 2015 at 09:26, ethan zhao <[email protected]> wrote:
>> How to prevent the policy to be freed between
>>
>> cpufreq_cpu_get() and cpufreq_cpu_put() ?
> kobject_get() increases the reference count of a policy and the policy
> will only be freed when this is zero. And it will only be zero once all
> cpufreq_cpu_get() are matched with corresponding cpufreq_cpu_put().
Is that an idea it supposed to be or fact ?

if (!cpufreq_suspended)
cpufreq_policy_free(policy);

static void cpufreq_policy_free(struct cpufreq_policy *policy)
{
free_cpumask_var(policy->related_cpus);
free_cpumask_var(policy->cpus);
kfree(policy);
}

It seems you just think about it ideally in mind.

Thanks,
Ethan
>
>> You are maxing up the water with sand ?
> :)

2015-02-02 04:09:45

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

On 2 February 2015 at 09:36, ethan zhao <[email protected]> wrote:
> Is that an idea it supposed to be or fact ?
>
> if (!cpufreq_suspended)
> cpufreq_policy_free(policy);
>
> static void cpufreq_policy_free(struct cpufreq_policy *policy)
> {
> free_cpumask_var(policy->related_cpus);
> free_cpumask_var(policy->cpus);
> kfree(policy);
> }
>
> It seems you just think about it ideally in mind.

if (!cpufreq_suspended)
cpufreq_policy_put_kobj(policy);

static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
{
...

kobject_put(kobj);

/*
* We need to make sure that the underlying kobj is
* actually not referenced anymore by anybody before we
* proceed with unloading.
*/
pr_debug("waiting for dropping of refcount\n");
wait_for_completion(cmp);
}

2015-02-02 04:16:58

by Ethan Zhao

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject


On 2015/2/2 12:09, Viresh Kumar wrote:
> On 2 February 2015 at 09:36, ethan zhao <[email protected]> wrote:
>> Is that an idea it supposed to be or fact ?
>>
>> if (!cpufreq_suspended)
>> cpufreq_policy_free(policy);
>>
>> static void cpufreq_policy_free(struct cpufreq_policy *policy)
>> {
>> free_cpumask_var(policy->related_cpus);
>> free_cpumask_var(policy->cpus);
>> kfree(policy);
>> }
>>
>> It seems
>> you just think about it ideally in mind.
We am talking about the policy allocation and de-allocation. right ?
I showed you the cpufreq_policy_free(policy) doesn't check kobject
refcount as above.

Hmmm, you are still sleeping in the kobject, wake up and don't mix
water anymore.

Thanks,
Ethan


> if (!cpufreq_suspended)
> cpufreq_policy_put_kobj(policy);
>
> static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
> {
> ...
>
> kobject_put(kobj);
>
> /*
> * We need to make sure that the underlying kobj is
> * actually not referenced anymore by anybody before we
> * proceed with unloading.
> */
> pr_debug("waiting for dropping of refcount\n");
> wait_for_completion(cmp);
> }

2015-02-02 04:26:52

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

On 2 February 2015 at 09:46, ethan zhao <[email protected]> wrote:
> We am talking about the policy allocation and de-allocation. right ?
> I showed you the cpufreq_policy_free(policy) doesn't check kobject
> refcount as above.
>
> Hmmm, you are still sleeping in the kobject, wake up and don't mix
> water anymore.

It would be nice if we give each other the respect we deserve, And talk
about concrete points here.

>> if (!cpufreq_suspended)
>> cpufreq_policy_put_kobj(policy);
>>
>> static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
>> {
>> ...
>>
>> kobject_put(kobj);
>>
>> /*
>> * We need to make sure that the underlying kobj is
>> * actually not referenced anymore by anybody before we
>> * proceed with unloading.
>> */
>> pr_debug("waiting for dropping of refcount\n");
>> wait_for_completion(cmp);
>> }

I gave you exactly what you wanted to go through, but it seems you
haven't tried enough.

Before freeing policy with cpufreq_policy_free(), we call
cpufreq_policy_put_kobj(). Now what does this function do? It waits
for the completion to fire (wait_for_completion()). This completion
will only fire when refcount of a kobject becomes zero.

Initially when we create the kobject, it is initialized to one. And the
last kobject_put() you see above in cpufreq_policy_put_kobj()
makes it zero. All other cpufreq_cpu_get() and put() should happen
in pairs, otherwise this refcount will never be zero again.

As soon as the refcount becomes zero, we are sure no one else is
using the policy structure anymore. And so we free it with
cpufreq_policy_free().

That routines doesn't have any tricks and simply frees the policy.
Because, before calling cpufreq_policy_put_kobj(), we have set
the per-cpu variable to NULL, nobody else will get the policy
structure by calling cpufreq_cpu_get(). And that's what my patch
tried to solve.

Let me know if I wasn't explanatory enough..

--
viresh

2015-02-02 04:46:15

by Ethan Zhao

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject


On 2015/2/2 12:26, Viresh Kumar wrote:
> On 2 February 2015 at 09:46, ethan zhao <[email protected]> wrote:
>> We am talking about the policy allocation and de-allocation. right ?
>> I showed you the cpufreq_policy_free(policy) doesn't check kobject
>> refcount as above.
>>
>> Hmmm, you are still sleeping in the kobject, wake up and don't mix
>> water anymore.
> It would be nice if we give each other the respect we deserve, And talk
> about concrete points here.
Welcome back to the right way.
>>> if (!cpufreq_suspended)
>>> cpufreq_policy_put_kobj(policy);
>>>
>>> static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
>>> {
>>> ...
>>>
>>> kobject_put(kobj);
>>>
>>> /*
>>> * We need to make sure that the underlying kobj is
>>> * actually not referenced anymore by anybody before we
>>> * proceed with unloading.
>>> */
>>> pr_debug("waiting for dropping of refcount\n");
>>> wait_for_completion(cmp);
>>> }
> I gave you exactly what you wanted to go through, but it seems you
> haven't tried enough.
>
> Before freeing policy with cpufreq_policy_free(), we call
> cpufreq_policy_put_kobj(). Now what does this function do? It waits
> for the completion to fire (wait_for_completion()). This completion
> will only fire when refcount of a kobject becomes zero.
>
> Initially when we create the kobject, it is initialized to one. And the
> last kobject_put() you see above in cpufreq_policy_put_kobj()
> makes it zero. All other cpufreq_cpu_get() and put() should happen
> in pairs, otherwise this refcount will never be zero again.
>
> As soon as the refcount becomes zero, we are sure no one else is
> using the policy structure anymore. And so we free it with
> cpufreq_policy_free().
But there is no checking against refcount in or before

cpufreq_policy_free(), that is one issue I mentioned.

>
> That routines doesn't have any tricks and simply frees the policy.
> Because, before calling cpufreq_policy_put_kobj(), we have set
> the per-cpu variable to NULL, nobody else will get the policy
It is possible cpufreq_cpu_get() within the PPC thread was called just
before __cpufreq_remove_dev_finish() is to be called in another thread,
so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
the actions between cpufreq_cpu_get and cpufreq_cpu_put().

And then the freeing happens in __cpufreq_remove_dev_finish().
> structure by calling cpufreq_cpu_get(). And that's what my patch
> tried to solve.
>
> Let me know if I wasn't explanatory enough..

Ethan
>
> --
> viresh

2015-02-02 04:54:12

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

On 2 February 2015 at 10:15, ethan zhao <[email protected]> wrote:
> On 2015/2/2 12:26, Viresh Kumar wrote:

> But there is no checking against refcount in or before
>
> cpufreq_policy_free(), that is one issue I mentioned.

As I said earlier, the completion will only fire once the refcount
is zero. And so there is no need of any check here.

>> That routines doesn't have any tricks and simply frees the policy.
>> Because, before calling cpufreq_policy_put_kobj(), we have set
>> the per-cpu variable to NULL, nobody else will get the policy
>
> It is possible cpufreq_cpu_get() within the PPC thread was called just
> before __cpufreq_remove_dev_finish() is to be called in another thread,
> so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
> the actions between cpufreq_cpu_get and cpufreq_cpu_put().
>
> And then the freeing happens in __cpufreq_remove_dev_finish().

It will.. You aren't looking closely enough. If cpufreq_cpu_get() is called just
before remove-dev, then cpufreq_cpu_get() will take:

read_lock_irqsave(&cpufreq_driver_lock, flags);

And it will do:

read_unlock_irqrestore(&cpufreq_driver_lock, flags);

only after increasing the refcount with kobject_get().

While on the other side __cpufreq_remove_dev_finish() will do this:

write_lock_irqsave(&cpufreq_driver_lock, flags);
policy = per_cpu(cpufreq_cpu_data, cpu);
per_cpu(cpufreq_cpu_data, cpu) = NULL;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);

So, it will wait for the read_lock in cpufreq_cpu_get() to finish before
setting per-cpu variable to NULL. And so, after kobject_put() in
cpufreq_policy_put_kobj(), we will wait for the completion to fire
and that will only happen once a corresponding cpufreq_cpu_put()
is issued.

2015-02-02 15:05:07

by Ethan Zhao

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject


On 2015/2/2 12:54, Viresh Kumar wrote:
> On 2 February 2015 at 10:15, ethan zhao <[email protected]> wrote:
>> On 2015/2/2 12:26, Viresh Kumar wrote:
>> But there is no checking against refcount in or before
>>
>> cpufreq_policy_free(), that is one issue I mentioned.
> As I said earlier, the completion will only fire once the refcount
> is zero. And so there is no need of any check here.
>
>>> That routines doesn't have any tricks and simply frees the policy.
>>> Because, before calling cpufreq_policy_put_kobj(), we have set
>>> the per-cpu variable to NULL, nobody else will get the policy
>> It is possible cpufreq_cpu_get() within the PPC thread was called just
>> before __cpufreq_remove_dev_finish() is to be called in another thread,
>> so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
>> the actions between cpufreq_cpu_get and cpufreq_cpu_put().
>>
>> And then the freeing happens in __cpufreq_remove_dev_finish().
> It will.. You aren't looking closely enough. If cpufreq_cpu_get() is called just
> before remove-dev, then cpufreq_cpu_get() will take:
>
> read_lock_irqsave(&cpufreq_driver_lock, flags);
>
> And it will do:
>
> read_unlock_irqrestore(&cpufreq_driver_lock, flags);
>
> only after increasing the refcount with kobject_get().
>
> While on the other side __cpufreq_remove_dev_finish() will do this:
>
> write_lock_irqsave(&cpufreq_driver_lock, flags);
> policy = per_cpu(cpufreq_cpu_data, cpu);
> per_cpu(cpufreq_cpu_data, cpu) = NULL;
> write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>
> So, it will wait for the read_lock in cpufreq_cpu_get() to finish before
> setting per-cpu variable to NULL. And so, after kobject_put() in
> cpufreq_policy_put_kobj(), we will wait for the completion to fire
Closely enough this time, understood, thanks for your explanation.


Ethan
> and that will only happen once a corresponding cpufreq_cpu_put()
> is issued.

2015-02-25 03:24:26

by ethan zhao

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

Viresh,
With this patch applied, still got the following warning and panic,
seems it needs more care.

54.474618] ------------[ cut here ]------------
[ 54.545816] WARNING: CPU: 0 PID: 213 at include/linux/kref.h:47
kobject_get+0x41/0x50()
[ 54.642595] Modules linked in: i2c_i801(+) mfd_core shpchp(+)
acpi_cpufreq(+) edac_core ioatdma(+) xfs libcrc32c ast syscopyarea ixgbe
sysfillrect sysimgblt sr_mod sd_mod drm_kms_helper igb mdio cdrom e1000e ahci
dca ttm libahci uas drm i2c_algo_bit ptp megaraid_sas libata usb_storage
i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 55.007264] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted
3.18.5
[ 55.099970] Hardware name: Oracle Corporation SUN FIRE X4170 M2 SERVER
/ASSY,MOTHERBOARD,X4170, BIOS 08120104 05/08/2012
[ 55.239736] Workqueue: kacpi_notify acpi_os_execute_deferred
[ 55.308598] 0000000000000000 00000000bd730b61 ffff88046742baf8
ffffffff816b7edb
[ 55.398305] 0000000000000000 0000000000000000 ffff88046742bb38
ffffffff81078ae1
[ 55.488040] ffff88046742bbd8 ffff8806706b3000 0000000000000292
0000000000000000
[ 55.577776] Call Trace:
[ 55.608228] [<ffffffff816b7edb>] dump_stack+0x46/0x58
[ 55.670895] [<ffffffff81078ae1>] warn_slowpath_common+0x81/0xa0
[ 55.743952] [<ffffffff81078bfa>] warn_slowpath_null+0x1a/0x20
[ 55.814929] [<ffffffff8130d0d1>] kobject_get+0x41/0x50
[ 55.878654] [<ffffffff8153e955>] cpufreq_cpu_get+0x75/0xc0
[ 55.946528] [<ffffffff8153f37e>] cpufreq_update_policy+0x2e/0x1f0
[ 56.021682] [<ffffffff810bf9d2>] ? up+0x32/0x50
[ 56.078126] [<ffffffff813ab975>] ? acpi_ns_get_node+0xcb/0xf2
[ 56.148974] [<ffffffff813abdc9>] ? acpi_evaluate_object+0x22c/0x252
[ 56.226066] [<ffffffff813ac3c2>] ? acpi_get_handle+0x95/0xc0
[ 56.295871] [<ffffffff8138a7fb>] ? acpi_has_method+0x25/0x40
[ 56.365661] [<ffffffff813bbcd4>] acpi_processor_ppc_has_changed+0x77/0x82
[ 56.448956] [<ffffffff8108f726>] ? move_linked_works+0x66/0x90
[ 56.520842] [<ffffffff813b87b9>] acpi_processor_notify+0x58/0xe7
[ 56.594807] [<ffffffff8139dfd8>] acpi_ev_notify_dispatch+0x44/0x5c
[ 56.670859] [<ffffffff81388fe3>] acpi_os_execute_deferred+0x15/0x22
[ 56.747936] [<ffffffff8109268e>] process_one_work+0x14e/0x3f0
[ 56.818766] [<ffffffff81092d9b>] worker_thread+0x11b/0x4d0
[ 56.886486] [<ffffffff81092c80>] ? rescuer_thread+0x350/0x350
[ 56.957316] [<ffffffff810984f1>] kthread+0xe1/0x100
[ 57.017742] [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
[ 57.096903] [<ffffffff816bfe7c>] ret_from_fork+0x7c/0xb0
[ 57.162534] [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
[ 57.241680] ---[ end trace dce06bb76f547de5 ]---


Any idea ?

Thanks,
Ethan

On Mon, Feb 2, 2015 at 11:04 PM, ethan zhao <[email protected]> wrote:
>
> On 2015/2/2 12:54, Viresh Kumar wrote:
>>
>> On 2 February 2015 at 10:15, ethan zhao <[email protected]> wrote:
>>>
>>> On 2015/2/2 12:26, Viresh Kumar wrote:
>>> But there is no checking against refcount in or before
>>>
>>> cpufreq_policy_free(), that is one issue I mentioned.
>>
>> As I said earlier, the completion will only fire once the refcount
>> is zero. And so there is no need of any check here.
>>
>>>> That routines doesn't have any tricks and simply frees the policy.
>>>> Because, before calling cpufreq_policy_put_kobj(), we have set
>>>> the per-cpu variable to NULL, nobody else will get the policy
>>>
>>> It is possible cpufreq_cpu_get() within the PPC thread was called just
>>> before __cpufreq_remove_dev_finish() is to be called in another thread,
>>> so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
>>> the actions between cpufreq_cpu_get and cpufreq_cpu_put().
>>>
>>> And then the freeing happens in __cpufreq_remove_dev_finish().
>>
>> It will.. You aren't looking closely enough. If cpufreq_cpu_get() is
>> called just
>> before remove-dev, then cpufreq_cpu_get() will take:
>>
>> read_lock_irqsave(&cpufreq_driver_lock, flags);
>>
>> And it will do:
>>
>> read_unlock_irqrestore(&cpufreq_driver_lock, flags);
>>
>> only after increasing the refcount with kobject_get().
>>
>> While on the other side __cpufreq_remove_dev_finish() will do this:
>>
>> write_lock_irqsave(&cpufreq_driver_lock, flags);
>> policy = per_cpu(cpufreq_cpu_data, cpu);
>> per_cpu(cpufreq_cpu_data, cpu) = NULL;
>> write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>>
>> So, it will wait for the read_lock in cpufreq_cpu_get() to finish before
>> setting per-cpu variable to NULL. And so, after kobject_put() in
>> cpufreq_policy_put_kobj(), we will wait for the completion to fire
>
> Closely enough this time, understood, thanks for your explanation.
>
>
> Ethan
>
>> and that will only happen once a corresponding cpufreq_cpu_put()
>> is issued.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2015-02-25 04:35:16

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

On Wednesday 25 February 2015 08:54 AM, Ethan Zhao wrote:
> Viresh,
> With this patch applied, still got the following warning and panic,
> seems it needs more care.
>
> 54.474618] ------------[ cut here ]------------
> [ 54.545816] WARNING: CPU: 0 PID: 213 at include/linux/kref.h:47
> kobject_get+0x41/0x50()
> [ 54.642595] Modules linked in: i2c_i801(+) mfd_core shpchp(+)
> acpi_cpufreq(+) edac_core ioatdma(+) xfs libcrc32c ast syscopyarea ixgbe
> sysfillrect sysimgblt sr_mod sd_mod drm_kms_helper igb mdio cdrom e1000e ahci
> dca ttm libahci uas drm i2c_algo_bit ptp megaraid_sas libata usb_storage
> i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
> [ 55.007264] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted
> 3.18.5
> [ 55.099970] Hardware name: Oracle Corporation SUN FIRE X4170 M2 SERVER
> /ASSY,MOTHERBOARD,X4170, BIOS 08120104 05/08/2012
> [ 55.239736] Workqueue: kacpi_notify acpi_os_execute_deferred
> [ 55.308598] 0000000000000000 00000000bd730b61 ffff88046742baf8
> ffffffff816b7edb
> [ 55.398305] 0000000000000000 0000000000000000 ffff88046742bb38
> ffffffff81078ae1
> [ 55.488040] ffff88046742bbd8 ffff8806706b3000 0000000000000292
> 0000000000000000
> [ 55.577776] Call Trace:
> [ 55.608228] [<ffffffff816b7edb>] dump_stack+0x46/0x58
> [ 55.670895] [<ffffffff81078ae1>] warn_slowpath_common+0x81/0xa0
> [ 55.743952] [<ffffffff81078bfa>] warn_slowpath_null+0x1a/0x20
> [ 55.814929] [<ffffffff8130d0d1>] kobject_get+0x41/0x50
> [ 55.878654] [<ffffffff8153e955>] cpufreq_cpu_get+0x75/0xc0
> [ 55.946528] [<ffffffff8153f37e>] cpufreq_update_policy+0x2e/0x1f0
> [ 56.021682] [<ffffffff810bf9d2>] ? up+0x32/0x50
> [ 56.078126] [<ffffffff813ab975>] ? acpi_ns_get_node+0xcb/0xf2
> [ 56.148974] [<ffffffff813abdc9>] ? acpi_evaluate_object+0x22c/0x252
> [ 56.226066] [<ffffffff813ac3c2>] ? acpi_get_handle+0x95/0xc0
> [ 56.295871] [<ffffffff8138a7fb>] ? acpi_has_method+0x25/0x40
> [ 56.365661] [<ffffffff813bbcd4>] acpi_processor_ppc_has_changed+0x77/0x82
> [ 56.448956] [<ffffffff8108f726>] ? move_linked_works+0x66/0x90
> [ 56.520842] [<ffffffff813b87b9>] acpi_processor_notify+0x58/0xe7
> [ 56.594807] [<ffffffff8139dfd8>] acpi_ev_notify_dispatch+0x44/0x5c
> [ 56.670859] [<ffffffff81388fe3>] acpi_os_execute_deferred+0x15/0x22
> [ 56.747936] [<ffffffff8109268e>] process_one_work+0x14e/0x3f0
> [ 56.818766] [<ffffffff81092d9b>] worker_thread+0x11b/0x4d0
> [ 56.886486] [<ffffffff81092c80>] ? rescuer_thread+0x350/0x350
> [ 56.957316] [<ffffffff810984f1>] kthread+0xe1/0x100
> [ 57.017742] [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
> [ 57.096903] [<ffffffff816bfe7c>] ret_from_fork+0x7c/0xb0
> [ 57.162534] [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
> [ 57.241680] ---[ end trace dce06bb76f547de5 ]---
>
>
> Any idea ?

No. Santosh reported this to me few days back, I asked him to perform some
testing but don't know what happened after that..

Can you give me full kernel logs along with the crash after this patch.
You will be required to do some testing this time as I don't have any clue
about the problem..


diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index b4375021238f..230a59d2e0d7 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -214,8 +214,10 @@ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
if (cpufreq_driver) {
/* get the CPU */
policy = per_cpu(cpufreq_cpu_data, cpu);
- if (policy)
+ if (policy) {
kobject_get(&policy->kobj);
+ pr_info("%s: %d", __func__, atomic_read(&policy->kobj.kref.refcount));
+ }
}

read_unlock_irqrestore(&cpufreq_driver_lock, flags);
@@ -233,6 +235,7 @@ void cpufreq_cpu_put(struct cpufreq_policy *policy)
return;

kobject_put(&policy->kobj);
+ pr_info("%s: %d", __func__, atomic_read(&policy->kobj.kref.refcount));
up_read(&cpufreq_rwsem);
}
EXPORT_SYMBOL_GPL(cpufreq_cpu_put);

2015-02-25 05:47:45

by ethan zhao

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

Viresh,

Will do that when I get the test box.

Thanks,
Ethan

On Wed, Feb 25, 2015 at 12:35 PM, viresh kumar <[email protected]> wrote:
> On Wednesday 25 February 2015 08:54 AM, Ethan Zhao wrote:
>> Viresh,
>> With this patch applied, still got the following warning and panic,
>> seems it needs more care.
>>
>> 54.474618] ------------[ cut here ]------------
>> [ 54.545816] WARNING: CPU: 0 PID: 213 at include/linux/kref.h:47
>> kobject_get+0x41/0x50()
>> [ 54.642595] Modules linked in: i2c_i801(+) mfd_core shpchp(+)
>> acpi_cpufreq(+) edac_core ioatdma(+) xfs libcrc32c ast syscopyarea ixgbe
>> sysfillrect sysimgblt sr_mod sd_mod drm_kms_helper igb mdio cdrom e1000e ahci
>> dca ttm libahci uas drm i2c_algo_bit ptp megaraid_sas libata usb_storage
>> i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
>> [ 55.007264] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted
>> 3.18.5
>> [ 55.099970] Hardware name: Oracle Corporation SUN FIRE X4170 M2 SERVER
>> /ASSY,MOTHERBOARD,X4170, BIOS 08120104 05/08/2012
>> [ 55.239736] Workqueue: kacpi_notify acpi_os_execute_deferred
>> [ 55.308598] 0000000000000000 00000000bd730b61 ffff88046742baf8
>> ffffffff816b7edb
>> [ 55.398305] 0000000000000000 0000000000000000 ffff88046742bb38
>> ffffffff81078ae1
>> [ 55.488040] ffff88046742bbd8 ffff8806706b3000 0000000000000292
>> 0000000000000000
>> [ 55.577776] Call Trace:
>> [ 55.608228] [<ffffffff816b7edb>] dump_stack+0x46/0x58
>> [ 55.670895] [<ffffffff81078ae1>] warn_slowpath_common+0x81/0xa0
>> [ 55.743952] [<ffffffff81078bfa>] warn_slowpath_null+0x1a/0x20
>> [ 55.814929] [<ffffffff8130d0d1>] kobject_get+0x41/0x50
>> [ 55.878654] [<ffffffff8153e955>] cpufreq_cpu_get+0x75/0xc0
>> [ 55.946528] [<ffffffff8153f37e>] cpufreq_update_policy+0x2e/0x1f0
>> [ 56.021682] [<ffffffff810bf9d2>] ? up+0x32/0x50
>> [ 56.078126] [<ffffffff813ab975>] ? acpi_ns_get_node+0xcb/0xf2
>> [ 56.148974] [<ffffffff813abdc9>] ? acpi_evaluate_object+0x22c/0x252
>> [ 56.226066] [<ffffffff813ac3c2>] ? acpi_get_handle+0x95/0xc0
>> [ 56.295871] [<ffffffff8138a7fb>] ? acpi_has_method+0x25/0x40
>> [ 56.365661] [<ffffffff813bbcd4>] acpi_processor_ppc_has_changed+0x77/0x82
>> [ 56.448956] [<ffffffff8108f726>] ? move_linked_works+0x66/0x90
>> [ 56.520842] [<ffffffff813b87b9>] acpi_processor_notify+0x58/0xe7
>> [ 56.594807] [<ffffffff8139dfd8>] acpi_ev_notify_dispatch+0x44/0x5c
>> [ 56.670859] [<ffffffff81388fe3>] acpi_os_execute_deferred+0x15/0x22
>> [ 56.747936] [<ffffffff8109268e>] process_one_work+0x14e/0x3f0
>> [ 56.818766] [<ffffffff81092d9b>] worker_thread+0x11b/0x4d0
>> [ 56.886486] [<ffffffff81092c80>] ? rescuer_thread+0x350/0x350
>> [ 56.957316] [<ffffffff810984f1>] kthread+0xe1/0x100
>> [ 57.017742] [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
>> [ 57.096903] [<ffffffff816bfe7c>] ret_from_fork+0x7c/0xb0
>> [ 57.162534] [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
>> [ 57.241680] ---[ end trace dce06bb76f547de5 ]---
>>
>>
>> Any idea ?
>
> No. Santosh reported this to me few days back, I asked him to perform some
> testing but don't know what happened after that..
>
> Can you give me full kernel logs along with the crash after this patch.
> You will be required to do some testing this time as I don't have any clue
> about the problem..
>
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index b4375021238f..230a59d2e0d7 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -214,8 +214,10 @@ struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
> if (cpufreq_driver) {
> /* get the CPU */
> policy = per_cpu(cpufreq_cpu_data, cpu);
> - if (policy)
> + if (policy) {
> kobject_get(&policy->kobj);
> + pr_info("%s: %d", __func__, atomic_read(&policy->kobj.kref.refcount));
> + }
> }
>
> read_unlock_irqrestore(&cpufreq_driver_lock, flags);
> @@ -233,6 +235,7 @@ void cpufreq_cpu_put(struct cpufreq_policy *policy)
> return;
>
> kobject_put(&policy->kobj);
> + pr_info("%s: %d", __func__, atomic_read(&policy->kobj.kref.refcount));
> up_read(&cpufreq_rwsem);
> }
> EXPORT_SYMBOL_GPL(cpufreq_cpu_put);
>

2015-02-25 16:32:18

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [PATCH Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

On 2/24/2015 9:47 PM, Ethan Zhao wrote:
> Viresh,
>
> Will do that when I get the test box.
>
Thanks Ethan.

>
> On Wed, Feb 25, 2015 at 12:35 PM, viresh kumar <[email protected]> wrote:
>> On Wednesday 25 February 2015 08:54 AM, Ethan Zhao wrote:
>>> Viresh,
>>> With this patch applied, still got the following warning and panic,
>>> seems it needs more care.
>>>
>>> 54.474618] ------------[ cut here ]------------
>>> [ 54.545816] WARNING: CPU: 0 PID: 213 at include/linux/kref.h:47
>>> kobject_get+0x41/0x50()
>>> [ 54.642595] Modules linked in: i2c_i801(+) mfd_core shpchp(+)
>>> acpi_cpufreq(+) edac_core ioatdma(+) xfs libcrc32c ast syscopyarea ixgbe
>>> sysfillrect sysimgblt sr_mod sd_mod drm_kms_helper igb mdio cdrom e1000e ahci
>>> dca ttm libahci uas drm i2c_algo_bit ptp megaraid_sas libata usb_storage
>>> i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
>>> [ 55.007264] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted
>>> 3.18.5
>>> [ 55.099970] Hardware name: Oracle Corporation SUN FIRE X4170 M2 SERVER
>>> /ASSY,MOTHERBOARD,X4170, BIOS 08120104 05/08/2012
>>> [ 55.239736] Workqueue: kacpi_notify acpi_os_execute_deferred
>>> [ 55.308598] 0000000000000000 00000000bd730b61 ffff88046742baf8
>>> ffffffff816b7edb
>>> [ 55.398305] 0000000000000000 0000000000000000 ffff88046742bb38
>>> ffffffff81078ae1
>>> [ 55.488040] ffff88046742bbd8 ffff8806706b3000 0000000000000292
>>> 0000000000000000
>>> [ 55.577776] Call Trace:
>>> [ 55.608228] [<ffffffff816b7edb>] dump_stack+0x46/0x58
>>> [ 55.670895] [<ffffffff81078ae1>] warn_slowpath_common+0x81/0xa0
>>> [ 55.743952] [<ffffffff81078bfa>] warn_slowpath_null+0x1a/0x20
>>> [ 55.814929] [<ffffffff8130d0d1>] kobject_get+0x41/0x50
>>> [ 55.878654] [<ffffffff8153e955>] cpufreq_cpu_get+0x75/0xc0
>>> [ 55.946528] [<ffffffff8153f37e>] cpufreq_update_policy+0x2e/0x1f0
>>> [ 56.021682] [<ffffffff810bf9d2>] ? up+0x32/0x50
>>> [ 56.078126] [<ffffffff813ab975>] ? acpi_ns_get_node+0xcb/0xf2
>>> [ 56.148974] [<ffffffff813abdc9>] ? acpi_evaluate_object+0x22c/0x252
>>> [ 56.226066] [<ffffffff813ac3c2>] ? acpi_get_handle+0x95/0xc0
>>> [ 56.295871] [<ffffffff8138a7fb>] ? acpi_has_method+0x25/0x40
>>> [ 56.365661] [<ffffffff813bbcd4>] acpi_processor_ppc_has_changed+0x77/0x82
>>> [ 56.448956] [<ffffffff8108f726>] ? move_linked_works+0x66/0x90
>>> [ 56.520842] [<ffffffff813b87b9>] acpi_processor_notify+0x58/0xe7
>>> [ 56.594807] [<ffffffff8139dfd8>] acpi_ev_notify_dispatch+0x44/0x5c
>>> [ 56.670859] [<ffffffff81388fe3>] acpi_os_execute_deferred+0x15/0x22
>>> [ 56.747936] [<ffffffff8109268e>] process_one_work+0x14e/0x3f0
>>> [ 56.818766] [<ffffffff81092d9b>] worker_thread+0x11b/0x4d0
>>> [ 56.886486] [<ffffffff81092c80>] ? rescuer_thread+0x350/0x350
>>> [ 56.957316] [<ffffffff810984f1>] kthread+0xe1/0x100
>>> [ 57.017742] [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
>>> [ 57.096903] [<ffffffff816bfe7c>] ret_from_fork+0x7c/0xb0
>>> [ 57.162534] [<ffffffff81098410>] ? kthread_create_on_node+0x1b0/0x1b0
>>> [ 57.241680] ---[ end trace dce06bb76f547de5 ]---
>>>
>>>
>>> Any idea ?
>>
>> No. Santosh reported this to me few days back, I asked him to perform some
>> testing but don't know what happened after that..
>>
I didn't get time to re-look into it so far.

Regards,
Santosh