2021-04-26 17:10:51

by Andy Shevchenko

Subject: Sleeping in atomic context on device release due to device links

Hi!

Is the issue below already fixed somewhere (v5.12 still seems to have it)?
Or did I miss something?

[ 186.439095] BUG: sleeping function called from invalid context at drivers/gpio/gpiolib.c:1952
[ 186.451666] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 119, name: kworker/0:2
[ 186.463885] 2 locks held by kworker/0:2/119:
[ 186.470831] #0: ffff985d8110d338 ((wq_completion)rcu_gp){....}-{0:0}, at: process_one_work+0x1bc/0x4b0
[ 186.484458] #1: ffffb1a2c0367e70 ((work_completion)(&sdp->work)){....}-{0:0}, at: process_one_work+0x1bc/0x4b0
[ 186.498732] CPU: 0 PID: 119 Comm: kworker/0:2 Not tainted 5.12.0-rc8+ #168
[ 186.508301] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
[ 186.521000] Workqueue: rcu_gp srcu_invoke_callbacks
[ 186.528515] Call Trace:
[ 186.532288] dump_stack+0x69/0x8e
[ 186.536964] ___might_sleep.cold+0x95/0xa2
[ 186.543606] gpiod_free_commit+0x25/0x170
[ 186.550163] gpiod_put+0x19/0x40
[ 186.554728] cleanup+0x1b/0x30 [spi_pxa2xx_platform]
[ 186.562246] spidev_release+0x24/0x50
[ 186.567243] device_release+0x34/0x90
[ 186.572228] kobject_put+0x86/0x1d0
[ 186.577035] __device_link_free_srcu+0x47/0x70
[ 186.583942] srcu_invoke_callbacks+0xc8/0x170
[ 186.590720] process_one_work+0x24d/0x4b0
[ 186.597118] worker_thread+0x55/0x3c0
[ 186.602030] ? rescuer_thread+0x390/0x390
[ 186.608373] kthread+0x137/0x150
[ 186.612834] ? __kthread_bind_mask+0x60/0x60
[ 186.619446] ret_from_fork+0x22/0x30


--
With Best Regards,
Andy Shevchenko


2021-04-26 21:27:05

by Saravana Kannan

Subject: Re: Sleeping in atomic context on device release due to device links

On Mon, Apr 26, 2021 at 10:08 AM Andy Shevchenko wrote:
>
> Hi!
>
> Is the issue below already fixed somewhere (v5.12 still seems to have it)?
> Or did I miss something?
>
> [ 186.439095] BUG: sleeping function called from invalid context at drivers/gpio/gpiolib.c:1952
> [ 186.451666] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 119, name: kworker/0:2
> [ 186.463885] 2 locks held by kworker/0:2/119:
> [ 186.470831] #0: ffff985d8110d338 ((wq_completion)rcu_gp){....}-{0:0}, at: process_one_work+0x1bc/0x4b0
> [ 186.484458] #1: ffffb1a2c0367e70 ((work_completion)(&sdp->work)){....}-{0:0}, at: process_one_work+0x1bc/0x4b0
> [ 186.498732] CPU: 0 PID: 119 Comm: kworker/0:2 Not tainted 5.12.0-rc8+ #168
> [ 186.508301] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
> [ 186.521000] Workqueue: rcu_gp srcu_invoke_callbacks
> [ 186.528515] Call Trace:
> [ 186.532288] dump_stack+0x69/0x8e
> [ 186.536964] ___might_sleep.cold+0x95/0xa2
> [ 186.543606] gpiod_free_commit+0x25/0x170
> [ 186.550163] gpiod_put+0x19/0x40
> [ 186.554728] cleanup+0x1b/0x30 [spi_pxa2xx_platform]
> [ 186.562246] spidev_release+0x24/0x50
> [ 186.567243] device_release+0x34/0x90
> [ 186.572228] kobject_put+0x86/0x1d0
> [ 186.577035] __device_link_free_srcu+0x47/0x70
> [ 186.583942] srcu_invoke_callbacks+0xc8/0x170
> [ 186.590720] process_one_work+0x24d/0x4b0
> [ 186.597118] worker_thread+0x55/0x3c0
> [ 186.602030] ? rescuer_thread+0x390/0x390
> [ 186.608373] kthread+0x137/0x150
> [ 186.612834] ? __kthread_bind_mask+0x60/0x60
> [ 186.619446] ret_from_fork+0x22/0x30
>

This took a few hours to debug, but it looks like an SPI framework bug.
The device link code is merely exposing it.

Basically, calling the SPI controller's cleanup op from the device's
release op is wrong for many reasons. I'll send a patch for SPI later.

-Saravana
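
To make the failure mode in the trace concrete (device_release ->
spidev_release -> cleanup -> gpiod_put -> gpiod_free_commit, which may
sleep), here is a minimal userspace sketch. It is not kernel code: every
model_* name, the in_atomic_ctx flag, and run_release_model() are
hypothetical stand-ins invented for illustration, and the real
might_sleep() and put_device() behave in more detail than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins only; none of this is the kernel's implementation. */
static bool in_atomic_ctx;     /* models in_atomic() returning 1 */
static int sleep_violations;   /* counts "sleeping function called from invalid context" */

/* Models might_sleep(): complain if a possibly-sleeping path runs in atomic context. */
static void model_might_sleep(const char *caller)
{
    if (in_atomic_ctx) {
        sleep_violations++;
        printf("BUG (model): %s may sleep, called in atomic context\n", caller);
    }
}

/* Models a controller cleanup hook that releases a GPIO and may sleep. */
static void model_controller_cleanup(void)
{
    model_might_sleep("model_controller_cleanup");
}

struct model_device {
    int refcount;
    void (*release)(struct model_device *dev);
};

/* The problematic pattern: the release op calls the controller cleanup. */
static void model_release_with_cleanup(struct model_device *dev)
{
    (void)dev;
    model_controller_cleanup();
}

/* Models put_device(): dropping the last reference runs the release op. */
static void model_put_device(struct model_device *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0)
        dev->release(dev);
}

/* Models a callback that runs in atomic context and drops the last reference. */
static void model_atomic_callback(struct model_device *dev)
{
    in_atomic_ctx = true;
    model_put_device(dev);
    in_atomic_ctx = false;
}

/* Runs the scenario once and returns the number of violations observed. */
int run_release_model(void)
{
    struct model_device dev = { .refcount = 1, .release = model_release_with_cleanup };

    sleep_violations = 0;
    model_atomic_callback(&dev);
    return sleep_violations;
}
```

Calling run_release_model() from a small main() reports one violation in
this model, because the release op, and therefore the cleanup, runs
wherever the last reference happens to be dropped.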

2021-04-28 20:52:27

by Saravana Kannan

Subject: Re: Sleeping in atomic context on device release due to device links

On Mon, Apr 26, 2021 at 10:08 AM Andy Shevchenko wrote:
>
> Hi!
>
> Is the issue below already fixed somewhere (v5.12 still seems to have it)?
> Or did I miss something?
>
> [ 186.439095] BUG: sleeping function called from invalid context at drivers/gpio/gpiolib.c:1952
> [ 186.451666] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 119, name: kworker/0:2
> [ 186.463885] 2 locks held by kworker/0:2/119:
> [ 186.470831] #0: ffff985d8110d338 ((wq_completion)rcu_gp){....}-{0:0}, at: process_one_work+0x1bc/0x4b0
> [ 186.484458] #1: ffffb1a2c0367e70 ((work_completion)(&sdp->work)){....}-{0:0}, at: process_one_work+0x1bc/0x4b0
> [ 186.498732] CPU: 0 PID: 119 Comm: kworker/0:2 Not tainted 5.12.0-rc8+ #168
> [ 186.508301] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
> [ 186.521000] Workqueue: rcu_gp srcu_invoke_callbacks
> [ 186.528515] Call Trace:
> [ 186.532288] dump_stack+0x69/0x8e
> [ 186.536964] ___might_sleep.cold+0x95/0xa2
> [ 186.543606] gpiod_free_commit+0x25/0x170
> [ 186.550163] gpiod_put+0x19/0x40
> [ 186.554728] cleanup+0x1b/0x30 [spi_pxa2xx_platform]
> [ 186.562246] spidev_release+0x24/0x50
> [ 186.567243] device_release+0x34/0x90
> [ 186.572228] kobject_put+0x86/0x1d0
> [ 186.577035] __device_link_free_srcu+0x47/0x70
> [ 186.583942] srcu_invoke_callbacks+0xc8/0x170

Hi Rafael,

So it looks like put_device() is not guaranteed to be safe to call from
atomic context, while srcu_invoke_callbacks() runs in atomic context. I
haven't dug into why device links need the SRCU implementation, or which
work has to stay in the SRCU callback and which could be done earlier,
in the actual caller's context.

Can you please look into this and give your thoughts?

-Saravana

> [ 186.590720] process_one_work+0x24d/0x4b0
> [ 186.597118] worker_thread+0x55/0x3c0
> [ 186.602030] ? rescuer_thread+0x390/0x390
> [ 186.608373] kthread+0x137/0x150
> [ 186.612834] ? __kthread_bind_mask+0x60/0x60
> [ 186.619446] ret_from_fork+0x22/0x30
>
>
> --
> With Best Regards,
> Andy Shevchenko
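
The question raised above, what has to happen in the SRCU callback versus
in a context that may sleep, can be illustrated with a small userspace
sketch. One possible direction (an assumption for illustration, not
something the thread states as the fix) is for the atomic callback to
only queue work, and for the final reference drop to happen later in a
context that may sleep. All model_* names and run_deferred_model() are
hypothetical and do not correspond to kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins only; none of this is the kernel's implementation. */
static bool in_atomic_ctx;     /* models in_atomic() */
static int sleep_violations;   /* possibly-sleeping path hit while atomic */
static int releases_done;      /* number of release paths that have run */

struct model_device {
    int refcount;
};

/* Models put_device() whose release path may sleep. */
static void model_put_device(struct model_device *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0) {
        if (in_atomic_ctx)
            sleep_violations++;   /* would trigger the BUG splat */
        releases_done++;
    }
}

/* A minimal pending-work list standing in for a workqueue. */
struct model_work {
    struct model_device *dev;
    struct model_work *next;
};

static struct model_work *pending;

static void model_queue_work(struct model_work *work)
{
    work->next = pending;
    pending = work;
}

/* Models the worker: runs in a context that is allowed to sleep. */
static void model_run_workqueue(void)
{
    in_atomic_ctx = false;
    while (pending) {
        struct model_work *work = pending;

        pending = work->next;
        model_put_device(work->dev);
    }
}

/* The atomic callback only queues work; it never drops the reference itself. */
static void model_atomic_callback_deferred(struct model_work *work)
{
    in_atomic_ctx = true;
    model_queue_work(work);
    in_atomic_ctx = false;
}

/* Runs the scenario once and returns the number of violations observed. */
int run_deferred_model(void)
{
    struct model_device dev = { .refcount = 1 };
    struct model_work work = { .dev = &dev, .next = NULL };

    sleep_violations = 0;
    releases_done = 0;
    model_atomic_callback_deferred(&work);
    assert(releases_done == 0);   /* nothing released while atomic */
    model_run_workqueue();
    return sleep_violations;
}
```

In this model the release path still runs exactly once, but only after
the atomic section has ended, so no violation is recorded. Whether that
split is appropriate for device links depends on what the SRCU callback
actually has to guarantee, which is the open question in the message
above.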