2016-12-10 00:15:35

by Vladimir Zapolskiy

[permalink] [raw]
Subject: [PATCH] driver core: flush async calls before testing driver removal

If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
positives are reported for ATA controller drivers, because ATA port
probes are done asynchronously, and the same problem may also touch
other asynchronously probed drivers.

To reduce the rate of false reports on boot call async_synchronize_full()
before attempting to remove a driver, the same is done in delete_module()
syscall for all possible drivers and in __device_release_driver() function
for asynchronously probed drivers.

Fixes: bea5b158ff0d ("driver core: add test of driver remove calls during probe")
Suggested-by: Tejun Heo <[email protected]>
Signed-off-by: Vladimir Zapolskiy <[email protected]>
---
Some time ago the issue was discussed on the linux-ide mailing list, see

https://www.spinics.net/lists/linux-ide/msg53481.html

drivers/base/dd.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index d76cd97..a4feecf 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -384,6 +384,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
if (test_remove) {
test_remove = false;

+ async_synchronize_full();
+
if (dev->bus->remove)
dev->bus->remove(dev);
else if (drv->remove)
--
2.10.2


2016-12-10 01:59:15

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [PATCH] driver core: flush async calls before testing driver removal

On Fri, Dec 9, 2016 at 4:15 PM, Vladimir Zapolskiy <[email protected]> wrote:
> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
> positives are reported for ATA controller drivers, because ATA port
> probes are done asynchronously, and the same problem may also touch
> other asynchronously probed drivers.
>
> To reduce the rate of false reports on boot call async_synchronize_full()
> before attempting to remove a driver, the same is done in delete_module()
> syscall for all possible drivers and in __device_release_driver() function
> for asynchronously probed drivers.

I'd say CONFIG_DEBUG_TEST_DRIVER_REMOVE did what it was supposed to do
and uncovered a big in ATA drivers. Since driver core did not
asynchronously scheduled those actions it should not wait for their
completion either, but either ATA core or drivers should wait for
probing to complete before allowing remove() methods to run.

>
> Fixes: bea5b158ff0d ("driver core: add test of driver remove calls during probe")
> Suggested-by: Tejun Heo <[email protected]>
> Signed-off-by: Vladimir Zapolskiy <[email protected]>
> ---
> Some time ago the issue was discussed on the linux-ide mailing list, see
>
> https://www.spinics.net/lists/linux-ide/msg53481.html
>
> drivers/base/dd.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index d76cd97..a4feecf 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -384,6 +384,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> if (test_remove) {
> test_remove = false;
>
> + async_synchronize_full();
> +
> if (dev->bus->remove)
> dev->bus->remove(dev);
> else if (drv->remove)
> --
> 2.10.2
>

Thanks.

--
Dmitry

2016-12-10 07:32:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] driver core: flush async calls before testing driver removal

On Sat, Dec 10, 2016 at 02:15:19AM +0200, Vladimir Zapolskiy wrote:
> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
> positives are reported for ATA controller drivers, because ATA port
> probes are done asynchronously, and the same problem may also touch
> other asynchronously probed drivers.
>
> To reduce the rate of false reports on boot call async_synchronize_full()
> before attempting to remove a driver, the same is done in delete_module()
> syscall for all possible drivers and in __device_release_driver() function
> for asynchronously probed drivers.

__device_release_driver() already calls this function, why call it
again?

thanks,

greg k-h

2016-12-10 12:38:44

by Vladimir Zapolskiy

[permalink] [raw]
Subject: Re: [PATCH] driver core: flush async calls before testing driver removal

Hello Greg,

On 12/10/2016 09:32 AM, Greg Kroah-Hartman wrote:
> On Sat, Dec 10, 2016 at 02:15:19AM +0200, Vladimir Zapolskiy wrote:
>> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
>> positives are reported for ATA controller drivers, because ATA port
>> probes are done asynchronously, and the same problem may also touch
>> other asynchronously probed drivers.
>>
>> To reduce the rate of false reports on boot call async_synchronize_full()
>> before attempting to remove a driver, the same is done in delete_module()
>> syscall for all possible drivers and in __device_release_driver() function
>> for asynchronously probed drivers.
>
> __device_release_driver() already calls this function, why call it
> again?
>

__device_release_driver() is not called on test removal of drivers, if
CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled.

This opens a possibility to races like one I've discovered:

https://www.spinics.net/lists/linux-ide/msg53473.html

Next, async_synchronize_full() from __device_release_driver() is not
called in case of removal of ATA controller drivers, because
driver_allows_async_probing(drv) return value is false.

--
With best wishes,
Vladimir

2016-12-10 12:57:31

by Vladimir Zapolskiy

[permalink] [raw]
Subject: Re: [PATCH] driver core: flush async calls before testing driver removal

Hello Dmitry,

On 12/10/2016 03:59 AM, Dmitry Torokhov wrote:
> On Fri, Dec 9, 2016 at 4:15 PM, Vladimir Zapolskiy <[email protected]> wrote:
>> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
>> positives are reported for ATA controller drivers, because ATA port
>> probes are done asynchronously, and the same problem may also touch
>> other asynchronously probed drivers.
>>
>> To reduce the rate of false reports on boot call async_synchronize_full()
>> before attempting to remove a driver, the same is done in delete_module()
>> syscall for all possible drivers and in __device_release_driver() function
>> for asynchronously probed drivers.
>
> I'd say CONFIG_DEBUG_TEST_DRIVER_REMOVE did what it was supposed to do
> and uncovered a big in ATA drivers. Since driver core did not
> asynchronously scheduled those actions it should not wait for their
> completion either, but either ATA core or drivers should wait for
> probing to complete before allowing remove() methods to run.
>

can you please share the idea why?

I haven't managed to find any problems with ATA subsystem and drivers,
my fuzz testing by inserting delays to postpone scheduled execution
of async_port_probe() don't show any problems also.

So I believe the problem is with the test itself.

--
With best wishes,
Vladimir

2016-12-10 13:04:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] driver core: flush async calls before testing driver removal

On Sat, Dec 10, 2016 at 02:38:41PM +0200, Vladimir Zapolskiy wrote:
> Hello Greg,
>
> On 12/10/2016 09:32 AM, Greg Kroah-Hartman wrote:
> > On Sat, Dec 10, 2016 at 02:15:19AM +0200, Vladimir Zapolskiy wrote:
> >> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
> >> positives are reported for ATA controller drivers, because ATA port
> >> probes are done asynchronously, and the same problem may also touch
> >> other asynchronously probed drivers.
> >>
> >> To reduce the rate of false reports on boot call async_synchronize_full()
> >> before attempting to remove a driver, the same is done in delete_module()
> >> syscall for all possible drivers and in __device_release_driver() function
> >> for asynchronously probed drivers.
> >
> > __device_release_driver() already calls this function, why call it
> > again?
> >
>
> __device_release_driver() is not called on test removal of drivers, if
> CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled.
>
> This opens a possibility to races like one I've discovered:
>
> https://www.spinics.net/lists/linux-ide/msg53473.html
>
> Next, async_synchronize_full() from __device_release_driver() is not
> called in case of removal of ATA controller drivers, because
> driver_allows_async_probing(drv) return value is false.

Hm, how does this not also get hit if you unbind/bind/unbind/bind/etc.
from userspace as well? I don't think this is a
CONFIG_DEBUG_TEST_DRIVER_REMOVE issue, but just that this option finds
the problem corner cases as you are finding out :)

thanks,

greg k-h

2016-12-11 01:44:41

by Vladimir Zapolskiy

[permalink] [raw]
Subject: Re: [PATCH] driver core: flush async calls before testing driver removal

Hello Greg,

I'm adding Tejun to the list of addressees.

On 12/10/2016 03:04 PM, Greg Kroah-Hartman wrote:
> On Sat, Dec 10, 2016 at 02:38:41PM +0200, Vladimir Zapolskiy wrote:
>> Hello Greg,
>>
>> On 12/10/2016 09:32 AM, Greg Kroah-Hartman wrote:
>>> On Sat, Dec 10, 2016 at 02:15:19AM +0200, Vladimir Zapolskiy wrote:
>>>> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
>>>> positives are reported for ATA controller drivers, because ATA port
>>>> probes are done asynchronously, and the same problem may also touch
>>>> other asynchronously probed drivers.
>>>>
>>>> To reduce the rate of false reports on boot call async_synchronize_full()
>>>> before attempting to remove a driver, the same is done in delete_module()
>>>> syscall for all possible drivers and in __device_release_driver() function
>>>> for asynchronously probed drivers.
>>>
>>> __device_release_driver() already calls this function, why call it
>>> again?
>>>
>>
>> __device_release_driver() is not called on test removal of drivers, if
>> CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled.
>>
>> This opens a possibility to races like one I've discovered:
>>
>> https://www.spinics.net/lists/linux-ide/msg53473.html
>>
>> Next, async_synchronize_full() from __device_release_driver() is not
>> called in case of removal of ATA controller drivers, because
>> driver_allows_async_probing(drv) return value is false.
>
> Hm, how does this not also get hit if you unbind/bind/unbind/bind/etc.
> from userspace as well? I don't think this is a
> CONFIG_DEBUG_TEST_DRIVER_REMOVE issue, but just that this option finds
> the problem corner cases as you are finding out :)
>

and you are right, I managed to reproduce exactly the same race as before
running the unmodified kernel built from Torvald's branch head:

root@imx6q-sabresd:~# zcat /proc/config.gz | grep TEST_DRIVER_REMOVE
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set

root@imx6q-sabresd:~# uname -a
Linux imx6q-sabresd 4.9.0-rc8-00134-g0451698 #23 SMP Sun Dec 11 03:37:30 EET 2016 armv7l GNU/Linux

root@imx6q-sabresd:~# echo 2200000.sata > /sys/bus/platform/drivers/ahci-imx/unbind ; echo 2200000.sata > /sys/bus/platform/drivers/ahci-imx/bind; echo 2200000.sata > /sys/bus/platform/drivers/ahci-imx/unbind
ahci-imx 2200000.sata: fsl,transmit-level-mV not specified, using 00000024
ahci-imx 2200000.sata: fsl,transmit-boost-mdB not specified, using 00000480
ahci-imx 2200000.sata: fsl,transmit-atten-16ths not specified, using 00002000
ahci-imx 2200000.sata: fsl,receive-eq-mdB not specified, using 05000000
ahci-imx 2200000.sata: SSS flag set, parallel bus scan disabled
ahci-imx 2200000.sata: AHCI 0001.0300 32 slots 1 ports 3 Gbps 0x1 impl platform mode
ahci-imx 2200000.sata: flags: ncq sntf stag pm led clo only pmp pio slum part ccc apst
scsi host0: ahci-imx
ata3: SATA max UDMA/133 mmio [mem 0x02200000-0x02203fff] port 0x100 irq 72
ata3: SATA link down (SStatus 0 SControl 300)
ahci-imx 2200000.sata: no device found, disabling link.
ahci-imx 2200000.sata: pass .hotplug=1 to enable hotplug
------------[ cut here ]------------
WARNING: CPU: 2 PID: 375 at drivers/ata/libata-core.c:6482 ata_port_detach+0x130/0x140
Modules linked in: ahci_imx dw_hdmi_ahb_audio evbug
CPU: 2 PID: 375 Comm: sh Not tainted 4.9.0-rc8-00134-g0451698 #23
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<>] (dump_backtrace) from [<>] (show_stack+0x20/0x24)
[<>] (show_stack) from [<>] (dump_stack+0xb4/0xe8)
[<>] (dump_stack) from [<>] (__warn+0xe4/0x110)
[<>] (__warn) from [<>] (warn_slowpath_null+0x30/0x38)
[<>] (warn_slowpath_null) from [<>] (ata_port_detach+0x130/0x140)
[<>] (ata_port_detach) from [<>] (ata_platform_remove_one+0x34/0x4c)
[<>] (ata_platform_remove_one) from [<>] (platform_drv_remove+0x34/0x4c)
[<>] (platform_drv_remove) from [<>] (__device_release_driver+0x98/0x134)
[<>] (__device_release_driver) from [<>] (device_release_driver+0x30/0x3c)
[<>] (device_release_driver) from [<>] (unbind_store+0x88/0x108)
[<>] (unbind_store) from [<>] (drv_attr_store+0x30/0x3c)
[<>] (drv_attr_store) from [<>] (sysfs_kf_write+0x58/0x5c)
[<>] (sysfs_kf_write) from [<>] (kernfs_fop_write+0x108/0x21c)
[<>] (kernfs_fop_write) from [<>] (__vfs_write+0x3c/0x128)
[<>] (__vfs_write) from [<>] (vfs_write+0xb0/0x178)
[<>] (vfs_write) from [<>] (SyS_write+0x4c/0xa0)
[<>] (SyS_write) from [<>] (ret_fast_syscall+0x0/0x1c)
---[ end trace 457071853738c95e ]---

Tejun, will you look at the problem again?

--
With best wishes,
Vladimir

2016-12-12 17:50:43

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] driver core: flush async calls before testing driver removal

Hello,

On Sun, Dec 11, 2016 at 03:44:36AM +0200, Vladimir Zapolskiy wrote:
> On 12/10/2016 03:04 PM, Greg Kroah-Hartman wrote:
> > Hm, how does this not also get hit if you unbind/bind/unbind/bind/etc.
> > from userspace as well? I don't think this is a
> > CONFIG_DEBUG_TEST_DRIVER_REMOVE issue, but just that this option finds
> > the problem corner cases as you are finding out :)
> >
>
> and you are right, I managed to reproduce exactly the same race as before
> running the unmodified kernel built from Torvald's branch head:

Ah, you're right, so this means we need to add flush to all async
probing drivers. Will do so for libata shortly.

Thanks!

--
tejun

2016-12-12 18:34:08

by Rob Herring

[permalink] [raw]
Subject: Re: [PATCH] driver core: flush async calls before testing driver removal

On Mon, Dec 12, 2016 at 11:50 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Sun, Dec 11, 2016 at 03:44:36AM +0200, Vladimir Zapolskiy wrote:
>> On 12/10/2016 03:04 PM, Greg Kroah-Hartman wrote:
>> > Hm, how does this not also get hit if you unbind/bind/unbind/bind/etc.
>> > from userspace as well? I don't think this is a
>> > CONFIG_DEBUG_TEST_DRIVER_REMOVE issue, but just that this option finds
>> > the problem corner cases as you are finding out :)
>> >
>>
>> and you are right, I managed to reproduce exactly the same race as before
>> running the unmodified kernel built from Torvald's branch head:
>
> Ah, you're right, so this means we need to add flush to all async
> probing drivers. Will do so for libata shortly.

Maybe I'm confused, but don't you need this for all drivers? You need
sync the async SCSI scanning to the driver remove regardless of async
probe. The driver core synchronization is only for synchronizing the
remove with probe AIUI.

Rob

2016-12-12 18:46:12

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] driver core: flush async calls before testing driver removal

Hello,

On Mon, Dec 12, 2016 at 12:33:36PM -0600, Rob Herring wrote:
> Maybe I'm confused, but don't you need this for all drivers? You need
> sync the async SCSI scanning to the driver remove regardless of async
> probe. The driver core synchronization is only for synchronizing the
> remove with probe AIUI.

Heh, I'm not quite following what you mean. Can you please elaborate?
Also, on the second thought, it probably would be better to flush
async calls before unbind. It's fragile to require indivdiual drivers
to do that and I can't think of benefits of doing so. It's not like
the unbind / unload paths are hot in any way.

Thanks.

--
tejun