2020-07-16 05:12:02

by Saravana Kannan

Subject: [PATCH v1] driver core: Fix scheduling while atomic warnings during device link deletion

Marek and Guenter reported that commit 287905e68dd2 ("driver core:
Expose device link details in sysfs") caused sleeping/scheduling while
atomic warnings.

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 12, name: kworker/0:1
2 locks held by kworker/0:1/12:
#0: ee8074a8 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x174/0x7dc
#1: ee921f20 ((work_completion)(&sdp->work)){+.+.}-{0:0}, at: process_one_work+0x174/0x7dc
Preemption disabled at:
[<c01b10f0>] srcu_invoke_callbacks+0xc0/0x154
----- 8< ----- SNIP
[<c064590c>] (device_del) from [<c0645c9c>] (device_unregister+0x24/0x64)
[<c0645c9c>] (device_unregister) from [<c01b10fc>] (srcu_invoke_callbacks+0xcc/0x154)
[<c01b10fc>] (srcu_invoke_callbacks) from [<c01493c4>] (process_one_work+0x234/0x7dc)
[<c01493c4>] (process_one_work) from [<c01499b0>] (worker_thread+0x44/0x51c)
[<c01499b0>] (worker_thread) from [<c0150bf4>] (kthread+0x158/0x1a0)
[<c0150bf4>] (kthread) from [<c0100114>] (ret_from_fork+0x14/0x20)
Exception stack(0xee921fb0 to 0xee921ff8)

This was caused by the device link device being released in the context
of srcu_invoke_callbacks(). There is no need to wait till the RCU
callback to release the device link device. So release the device
earlier and revert the RCU callback code to what it was before
commit 287905e68dd2 ("driver core: Expose device link details in sysfs").

Fixes: 287905e68dd2 ("driver core: Expose device link details in sysfs")
Reported-by: Marek Szyprowski <[email protected]>
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Saravana Kannan <[email protected]>
---
Marek and Guenter,

I haven't had a chance to test this yet. Can one of you please test it
and confirm it fixes the issue?

Thanks,
Saravana

drivers/base/core.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 5373ddd029f6..ccb2ce11f5b5 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -306,16 +306,10 @@ static struct attribute *devlink_attrs[] = {
 };
 ATTRIBUTE_GROUPS(devlink);

-static void devlink_dev_release(struct device *dev)
-{
-	kfree(to_devlink(dev));
-}
-
 static struct class devlink_class = {
 	.name = "devlink",
 	.owner = THIS_MODULE,
 	.dev_groups = devlink_groups,
-	.dev_release = devlink_dev_release,
 };

 static int devlink_add_symlinks(struct device *dev,
@@ -737,7 +731,7 @@ static void device_link_free(struct device_link *link)

 	put_device(link->consumer);
 	put_device(link->supplier);
-	device_unregister(&link->link_dev);
+	kfree(link);
 }

 #ifdef CONFIG_SRCU
@@ -756,6 +750,7 @@ static void __device_link_del(struct kref *kref)
 	if (link->flags & DL_FLAG_PM_RUNTIME)
 		pm_runtime_drop_link(link->consumer);

+	device_unregister(&link->link_dev);
 	list_del_rcu(&link->s_node);
 	list_del_rcu(&link->c_node);
 	call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
@@ -771,6 +766,7 @@ static void __device_link_del(struct kref *kref)
 	if (link->flags & DL_FLAG_PM_RUNTIME)
 		pm_runtime_drop_link(link->consumer);

+	device_unregister(&link->link_dev);
 	list_del(&link->s_node);
 	list_del(&link->c_node);
 	device_link_free(link);
--
2.28.0.rc0.105.gf9edc3c819-goog
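
The ordering change described in the commit message can be illustrated with a small userspace sketch. This is not kernel code: every name below (fake_link, fake_unregister, run_deferred, delete_link) is a hypothetical stand-in, and the "atomic context" is modeled with a plain flag. The sketch only assumes what the report above shows, namely that the unregister step may sleep and that the deferred callback runs with preemption disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static bool in_atomic_ctx;       /* models "preemption disabled" */
static int sleep_in_atomic_hits; /* times a sleeping call ran while atomic */

struct fake_link {
	bool registered;
};

/* Models a might-sleep check: complain if called in atomic context. */
static void fake_might_sleep(void)
{
	if (in_atomic_ctx) {
		sleep_in_atomic_hits++;
		fprintf(stderr, "BUG: sleeping function called from invalid context\n");
	}
}

/* Models the unregister step, which may take a mutex and therefore sleep. */
static void fake_unregister(struct fake_link *link)
{
	fake_might_sleep();
	link->registered = false;
}

typedef void (*cb_fn)(struct fake_link *);

/* Models the deferred callback being invoked in atomic context. */
static void run_deferred(cb_fn cb, struct fake_link *link)
{
	in_atomic_ctx = true;
	cb(link);
	in_atomic_ctx = false;
}

/* Ordering that produced the report: unregister inside the callback. */
static void cb_unregister_then_free(struct fake_link *link)
{
	fake_unregister(link);
	free(link);
}

/* Ordering the patch aims for: the callback only frees memory. */
static void cb_free_only(struct fake_link *link)
{
	free(link);
}

/* Returns how many sleep-in-atomic hits the chosen ordering caused. */
static int delete_link(bool unregister_early)
{
	struct fake_link *link = calloc(1, sizeof(*link));
	int before = sleep_in_atomic_hits;

	if (!link)
		abort();
	link->registered = true;

	if (unregister_early) {
		fake_unregister(link); /* still in process context */
		run_deferred(cb_free_only, link);
	} else {
		run_deferred(cb_unregister_then_free, link);
	}
	return sleep_in_atomic_hits - before;
}
```

Calling delete_link(false) models the behavior that triggered the report, and delete_link(true) models the reordering in the patch, where the sleeping step happens before the callback is queued.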


2020-07-16 05:32:02

by Guenter Roeck

Subject: Re: [PATCH v1] driver core: Fix scheduling while atomic warnings during device link deletion

On 7/15/20 10:08 PM, Saravana Kannan wrote:
> Marek and Guenter reported that commit 287905e68dd2 ("driver core:
> Expose device link details in sysfs") caused sleeping/scheduling while
> atomic warnings.
>
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 12, name: kworker/0:1
> 2 locks held by kworker/0:1/12:
> #0: ee8074a8 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x174/0x7dc
> #1: ee921f20 ((work_completion)(&sdp->work)){+.+.}-{0:0}, at: process_one_work+0x174/0x7dc
> Preemption disabled at:
> [<c01b10f0>] srcu_invoke_callbacks+0xc0/0x154
> ----- 8< ----- SNIP
> [<c064590c>] (device_del) from [<c0645c9c>] (device_unregister+0x24/0x64)
> [<c0645c9c>] (device_unregister) from [<c01b10fc>] (srcu_invoke_callbacks+0xcc/0x154)
> [<c01b10fc>] (srcu_invoke_callbacks) from [<c01493c4>] (process_one_work+0x234/0x7dc)
> [<c01493c4>] (process_one_work) from [<c01499b0>] (worker_thread+0x44/0x51c)
> [<c01499b0>] (worker_thread) from [<c0150bf4>] (kthread+0x158/0x1a0)
> [<c0150bf4>] (kthread) from [<c0100114>] (ret_from_fork+0x14/0x20)
> Exception stack(0xee921fb0 to 0xee921ff8)
>
> This was caused by the device link device being released in the context
> of srcu_invoke_callbacks(). There is no need to wait till the RCU
> callback to release the device link device. So release the device
> earlier and revert the RCU callback code to what it was before
> commit 287905e68dd2 ("driver core: Expose device link details in sysfs")
>
> Fixes: 287905e68dd2 ("driver core: Expose device link details in sysfs")
> Reported-by: Marek Szyprowski <[email protected]>
> Reported-by: Guenter Roeck <[email protected]>
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> Marek and Guenter,
>
> I haven't had a chance to test this yet. Can one of you please test it
> and confirm it fixes the issue?
>

With this patch applied, the original warning is gone, but I get lots
of other warnings.

WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0xa4
Device 'regulators:regulator@0:50038000.ethernet' does not have a release() function, it is broken and must be fixed.

WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0xa4
Device '53f9c000.gpio:50038000.ethernet' does not have a release() function, it is broken and must be fixed.

WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0xa4
Device '50030000.tscadc:50030400.tcq' does not have a release() function, it is broken and must be fixed.

and so on. I don't know if this is caused by this patch or by
some other patch in -next.

Guenter

> Thanks,
> Saravana
>
> drivers/base/core.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 5373ddd029f6..ccb2ce11f5b5 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -306,16 +306,10 @@ static struct attribute *devlink_attrs[] = {
>  };
>  ATTRIBUTE_GROUPS(devlink);
>
> -static void devlink_dev_release(struct device *dev)
> -{
> -	kfree(to_devlink(dev));
> -}
> -
>  static struct class devlink_class = {
>  	.name = "devlink",
>  	.owner = THIS_MODULE,
>  	.dev_groups = devlink_groups,
> -	.dev_release = devlink_dev_release,
>  };
>
>  static int devlink_add_symlinks(struct device *dev,
> @@ -737,7 +731,7 @@ static void device_link_free(struct device_link *link)
>
>  	put_device(link->consumer);
>  	put_device(link->supplier);
> -	device_unregister(&link->link_dev);
> +	kfree(link);
>  }
>
>  #ifdef CONFIG_SRCU
> @@ -756,6 +750,7 @@ static void __device_link_del(struct kref *kref)
>  	if (link->flags & DL_FLAG_PM_RUNTIME)
>  		pm_runtime_drop_link(link->consumer);
>
> +	device_unregister(&link->link_dev);
>  	list_del_rcu(&link->s_node);
>  	list_del_rcu(&link->c_node);
>  	call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> @@ -771,6 +766,7 @@ static void __device_link_del(struct kref *kref)
>  	if (link->flags & DL_FLAG_PM_RUNTIME)
>  		pm_runtime_drop_link(link->consumer);
>
> +	device_unregister(&link->link_dev);
>  	list_del(&link->s_node);
>  	list_del(&link->c_node);
>  	device_link_free(link);
>

2020-07-16 05:51:54

by Marek Szyprowski

Subject: Re: [PATCH v1] driver core: Fix scheduling while atomic warnings during device link deletion

Hi

On 16.07.2020 07:30, Guenter Roeck wrote:
> On 7/15/20 10:08 PM, Saravana Kannan wrote:
>> Marek and Guenter reported that commit 287905e68dd2 ("driver core:
>> Expose device link details in sysfs") caused sleeping/scheduling while
>> atomic warnings.
>>
>> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
>> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 12, name: kworker/0:1
>> 2 locks held by kworker/0:1/12:
>> #0: ee8074a8 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x174/0x7dc
>> #1: ee921f20 ((work_completion)(&sdp->work)){+.+.}-{0:0}, at: process_one_work+0x174/0x7dc
>> Preemption disabled at:
>> [<c01b10f0>] srcu_invoke_callbacks+0xc0/0x154
>> ----- 8< ----- SNIP
>> [<c064590c>] (device_del) from [<c0645c9c>] (device_unregister+0x24/0x64)
>> [<c0645c9c>] (device_unregister) from [<c01b10fc>] (srcu_invoke_callbacks+0xcc/0x154)
>> [<c01b10fc>] (srcu_invoke_callbacks) from [<c01493c4>] (process_one_work+0x234/0x7dc)
>> [<c01493c4>] (process_one_work) from [<c01499b0>] (worker_thread+0x44/0x51c)
>> [<c01499b0>] (worker_thread) from [<c0150bf4>] (kthread+0x158/0x1a0)
>> [<c0150bf4>] (kthread) from [<c0100114>] (ret_from_fork+0x14/0x20)
>> Exception stack(0xee921fb0 to 0xee921ff8)
>>
>> This was caused by the device link device being released in the context
>> of srcu_invoke_callbacks(). There is no need to wait till the RCU
>> callback to release the device link device. So release the device
>> earlier and revert the RCU callback code to what it was before
>> commit 287905e68dd2 ("driver core: Expose device link details in sysfs")
>>
>> Fixes: 287905e68dd2 ("driver core: Expose device link details in sysfs")
>> Reported-by: Marek Szyprowski <[email protected]>
>> Reported-by: Guenter Roeck <[email protected]>
>> Signed-off-by: Saravana Kannan <[email protected]>
>> ---
>> Marek and Guenter,
>>
>> I haven't had a chance to test this yet. Can one of you please test it
>> and confirm it fixes the issue?
>>
> With this patch applied, the original warning is gone, but I get lots
> of other warnings.
>
> WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0xa4
> Device 'regulators:regulator@0:50038000.ethernet' does not have a release() function, it is broken and must be fixed.
>
> WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0xa4
> Device '53f9c000.gpio:50038000.ethernet' does not have a release() function, it is broken and must be fixed.
>
> WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0xa4
> Device '50030000.tscadc:50030400.tcq' does not have a release() function, it is broken and must be fixed.

I confirm that I also get such warnings for every platform device in the
system with this patch applied to linux next-20200715:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0x98
Device '10023c40.power-domain:13620000.sysmmu' does not have a release() function, it is broken and must be fixed. See Documentation/core-api/kobject.rst.
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.0-rc5-next-20200715-00002-g0f637964c4b0 #1270
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c011184c>] (unwind_backtrace) from [<c010d250>] (show_stack+0x10/0x14)
[<c010d250>] (show_stack) from [<c051b8fc>] (dump_stack+0xbc/0xe8)
[<c051b8fc>] (dump_stack) from [<c0126ed8>] (__warn+0xf0/0x108)
[<c0126ed8>] (__warn) from [<c0126f64>] (warn_slowpath_fmt+0x74/0xb8)
[<c0126f64>] (warn_slowpath_fmt) from [<c064a2a0>] (device_release+0x94/0x98)
[<c064a2a0>] (device_release) from [<c0522178>] (kobject_put+0x104/0x288)
[<c0522178>] (kobject_put) from [<c064b45c>] (__device_link_del+0x38/0xac)
[<c064b45c>] (__device_link_del) from [<c064c1f0>] (device_links_driver_bound+0x260/0x26c)
[<c064c1f0>] (device_links_driver_bound) from [<c0650af0>] (driver_bound+0x5c/0x110)
[<c0650af0>] (driver_bound) from [<c0651038>] (really_probe+0x2d4/0x4fc)
[<c0651038>] (really_probe) from [<c06513c8>] (driver_probe_device+0x78/0x1fc)
[<c06513c8>] (driver_probe_device) from [<c064ee00>] (bus_for_each_drv+0x74/0xb8)
[<c064ee00>] (bus_for_each_drv) from [<c0650cc4>] (__device_attach+0xd4/0x16c)
[<c0650cc4>] (__device_attach) from [<c064fdc4>] (bus_probe_device+0x88/0x90)
[<c064fdc4>] (bus_probe_device) from [<c064c604>] (fw_devlink_resume+0xa0/0x134)
[<c064c604>] (fw_devlink_resume) from [<c102bfd4>] (of_platform_default_populate_init+0xa8/0xc0)
[<c102bfd4>] (of_platform_default_populate_init) from [<c0102378>] (do_one_initcall+0x8c/0x424)
[<c0102378>] (do_one_initcall) from [<c1001158>] (kernel_init_freeable+0x190/0x204)
[<c1001158>] (kernel_init_freeable) from [<c0ac05d0>] (kernel_init+0x8/0x118)
[<c0ac05d0>] (kernel_init) from [<c0100114>] (ret_from_fork+0x14/0x20)
Exception stack(0xef0dffb0 to 0xef0dfff8)
ffa0:                                     00000000 00000000 00000000 00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
irq event stamp: 40543
hardirqs last  enabled at (40551): [<c019d624>] console_unlock+0x430/0x6cc
hardirqs last disabled at (40568): [<c019d348>] console_unlock+0x154/0x6cc
softirqs last  enabled at (40584): [<c010174c>] __do_softirq+0x50c/0x608
softirqs last disabled at (40595): [<c0130218>] irq_exit+0x168/0x16c
---[ end trace 1d4780a89f63483a ]---

> and so on. I don't know if this is caused by this patch or by
> some other patch in -next.

This is caused by patch 287905e68dd2 ("driver core: Expose device link
details in sysfs"). If you revert it, the warning will go away.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
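
The "does not have a release() function" warning quoted above fires when the last reference to a device is dropped and no release callback can be found for it. A simplified userspace model of that lookup follows; it is a sketch under stated assumptions, not the code in drivers/base/core.c. All names (fake_device, fake_class, fake_put_device, drop_last_ref) are hypothetical, and only two lookup levels are modeled: a per-device callback and a class-level dev_release callback, the field the patch removed from devlink_class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct fake_device;
typedef void (*release_fn)(struct fake_device *);

/* Hypothetical stand-in for a device class with an optional release hook. */
struct fake_class {
	release_fn dev_release;
};

/* Hypothetical stand-in for a refcounted device. */
struct fake_device {
	const char *name;
	int refcount;
	release_fn release;           /* per-device callback, may be NULL */
	const struct fake_class *cls; /* class-level fallback, may be NULL */
};

static int released_count;
static int missing_release_warnings;

/* Example release callback; a real one would free the containing object. */
static void count_release(struct fake_device *dev)
{
	(void)dev;
	released_count++;
}

/* Models the lookup done once the last reference is gone. */
static void fake_device_release(struct fake_device *dev)
{
	if (dev->release) {
		dev->release(dev);
	} else if (dev->cls && dev->cls->dev_release) {
		dev->cls->dev_release(dev);
	} else {
		missing_release_warnings++;
		fprintf(stderr, "Device '%s' does not have a release() function\n",
			dev->name);
	}
}

/* Models dropping one reference. */
static void fake_put_device(struct fake_device *dev)
{
	if (--dev->refcount == 0)
		fake_device_release(dev);
}

/* Drops the only reference; returns how many warnings that produced. */
static int drop_last_ref(const struct fake_class *cls)
{
	struct fake_device dev = {
		.name = "supplier:consumer",
		.refcount = 1,
		.release = NULL,
		.cls = cls,
	};
	int before = missing_release_warnings;

	fake_put_device(&dev);
	return missing_release_warnings - before;
}
```

In this model, a class that still provides dev_release lets the final put complete silently, while a class without it reproduces the kind of warning reported above for each device whose last reference is dropped.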

2020-07-16 18:29:14

by Saravana Kannan

Subject: Re: [PATCH v1] driver core: Fix scheduling while atomic warnings during device link deletion

On Wed, Jul 15, 2020 at 10:48 PM Marek Szyprowski
<[email protected]> wrote:
>
> Hi
>
> On 16.07.2020 07:30, Guenter Roeck wrote:
> > On 7/15/20 10:08 PM, Saravana Kannan wrote:
> >> Marek and Guenter reported that commit 287905e68dd2 ("driver core:
> >> Expose device link details in sysfs") caused sleeping/scheduling while
> >> atomic warnings.
> >>
> >> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
> >> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 12, name: kworker/0:1
> >> 2 locks held by kworker/0:1/12:
> >> #0: ee8074a8 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x174/0x7dc
> >> #1: ee921f20 ((work_completion)(&sdp->work)){+.+.}-{0:0}, at: process_one_work+0x174/0x7dc
> >> Preemption disabled at:
> >> [<c01b10f0>] srcu_invoke_callbacks+0xc0/0x154
> >> ----- 8< ----- SNIP
> >> [<c064590c>] (device_del) from [<c0645c9c>] (device_unregister+0x24/0x64)
> >> [<c0645c9c>] (device_unregister) from [<c01b10fc>] (srcu_invoke_callbacks+0xcc/0x154)
> >> [<c01b10fc>] (srcu_invoke_callbacks) from [<c01493c4>] (process_one_work+0x234/0x7dc)
> >> [<c01493c4>] (process_one_work) from [<c01499b0>] (worker_thread+0x44/0x51c)
> >> [<c01499b0>] (worker_thread) from [<c0150bf4>] (kthread+0x158/0x1a0)
> >> [<c0150bf4>] (kthread) from [<c0100114>] (ret_from_fork+0x14/0x20)
> >> Exception stack(0xee921fb0 to 0xee921ff8)
> >>
> >> This was caused by the device link device being released in the context
> >> of srcu_invoke_callbacks(). There is no need to wait till the RCU
> >> callback to release the device link device. So release the device
> >> earlier and revert the RCU callback code to what it was before
> >> commit 287905e68dd2 ("driver core: Expose device link details in sysfs")
> >>
> >> Fixes: 287905e68dd2 ("driver core: Expose device link details in sysfs")
> >> Reported-by: Marek Szyprowski <[email protected]>
> >> Reported-by: Guenter Roeck <[email protected]>
> >> Signed-off-by: Saravana Kannan <[email protected]>
> >> ---
> >> Marek and Guenter,
> >>
> >> I haven't had a chance to test this yet. Can one of you please test it
> >> and confirm it fixes the issue?
> >>
> > With this patch applied, the original warning is gone, but I get lots
> > of other warnings.
> >
> > WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0xa4
> > Device 'regulators:regulator@0:50038000.ethernet' does not have a release() function, it is broken and must be fixed.
> >
> > WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0xa4
> > Device '53f9c000.gpio:50038000.ethernet' does not have a release() function, it is broken and must be fixed.
> >
> > WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0xa4
> > Device '50030000.tscadc:50030400.tcq' does not have a release() function, it is broken and must be fixed.
>
> I confirm that I also get such warnings for every platform device in the
> system with this patch applied to linux next-20200715:

Sigh... I should refrain from late night coding. I'll send a fix in a few hours.

-Saravana