2016-10-10 13:17:39

by Laszlo Ersek

[permalink] [raw]
Subject: CONFIG_DEBUG_TEST_DRIVER_REMOVE causes unremovable drivers to bind devices twice

Hi,

Greg asked me to stick to email with this bug report, so I'm reposting
the original kernel bugzilla report to personal addresses, and lkml.

Thanks,
Laszlo

https://bugzilla.kernel.org/show_bug.cgi?id=177021

Bug ID: 177021
Summary: [driver core] CONFIG_DEBUG_TEST_DRIVER_REMOVE causes
unremovable drivers to bind devices twice
Product: Drivers
Version: 2.5
Kernel Version: v4.8-2283-ga3443cd (4.9.0-0.rc0.git2.1.fc26.aarch64)
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Other
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected], [email protected]
Regression: No

CONFIG_DEBUG_TEST_DRIVER_REMOVE was added in the following commit:

> commit bea5b158ff0da9c7246ff391f754f5f38e34577a
> Author: Rob Herring <[email protected]>
> Date: Thu Aug 11 10:20:58 2016 -0500
>
> driver core: add test of driver remove calls during probe
>
> In recent discussions on ksummit-discuss[1], it was suggested to do a
> sequence of probe, remove, probe for testing driver remove paths. This
> adds a kconfig option for said test.
>
> [1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2016-August/003459.html
>
> Suggested-by: Arnd Bergmann <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Signed-off-by: Rob Herring <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 98504ec99c7d..fdf44cac08e6 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -212,6 +212,16 @@ config DEBUG_DEVRES
>
> If you are unsure about this, Say N here.
>
> +config DEBUG_TEST_DRIVER_REMOVE
> + bool "Test driver remove calls during probe"
> + depends on DEBUG_KERNEL
> + help
> + Say Y here if you want the Driver core to test driver remove functions
> + by calling probe, remove, probe. This tests the remove path without
> + having to unbind the driver or unload the driver module.
> +
> + If you are unsure about this, say N here.
> +
> config SYS_HYPERVISOR
> bool
> default n
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 16688f50729c..4910e6db2a34 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -328,68 +328,89 @@ static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
> static int really_probe(struct device *dev, struct device_driver *drv)
> {
> int ret = -EPROBE_DEFER;
> int local_trigger_count = atomic_read(&deferred_trigger_count);
> + bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE);
>
> if (defer_all_probes) {
> /*
> * Value of defer_all_probes can be set only by
> * device_defer_all_probes_enable() which, in turn, will call
> * wait_for_device_probe() right after that to avoid any races.
> */
> dev_dbg(dev, "Driver %s force probe deferral\n", drv->name);
> driver_deferred_probe_add(dev);
> return ret;
> }
>
> atomic_inc(&probe_count);
> pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
> drv->bus->name, __func__, drv->name, dev_name(dev));
> WARN_ON(!list_empty(&dev->devres_head));
>
> +re_probe:
> dev->driver = drv;
>
> /* If using pinctrl, bind pins now before probing */
> ret = pinctrl_bind_pins(dev);
> if (ret)
> goto pinctrl_bind_failed;
>
> if (driver_sysfs_add(dev)) {
> printk(KERN_ERR "%s: driver_sysfs_add(%s) failed\n",
> __func__, dev_name(dev));
> goto probe_failed;
> }
>
> if (dev->pm_domain && dev->pm_domain->activate) {
> ret = dev->pm_domain->activate(dev);
> if (ret)
> goto probe_failed;
> }
>
> /*
> * Ensure devices are listed in devices_kset in correct order
> * It's important to move Dev to the end of devices_kset before
> * calling .probe, because it could be recursive and parent Dev
> * should always go first
> */
> devices_kset_move_last(dev);
>
> if (dev->bus->probe) {
> ret = dev->bus->probe(dev);
> if (ret)
> goto probe_failed;
> } else if (drv->probe) {
> ret = drv->probe(dev);
> if (ret)
> goto probe_failed;
> }
>
> + if (test_remove) {
> + test_remove = false;
> +
> + if (dev->bus && dev->bus->remove)
> + dev->bus->remove(dev);
> + else if (drv->remove)
> + drv->remove(dev);
> +
> + devres_release_all(dev);
> + driver_sysfs_remove(dev);
> + dev->driver = NULL;
> + dev_set_drvdata(dev, NULL);
> + if (dev->pm_domain && dev->pm_domain->dismiss)
> + dev->pm_domain->dismiss(dev);
> + pm_runtime_reinit(dev);
> +
> + goto re_probe;
> + }
> +
> pinctrl_init_done(dev);
>
> if (dev->pm_domain && dev->pm_domain->sync)
> dev->pm_domain->sync(dev);
>
> driver_bound(dev);
> ret = 1;
> pr_debug("bus: '%s': %s: bound device %s to driver %s\n",
> drv->bus->name, __func__, dev_name(dev), drv->name);
> goto done;

If none of the remove() callbacks are non-NULL -- that is, when the
device cannot be un-bound --, then none of the remove() callbacks are
called. That's good.

However:
- other resources are nonetheless released (from under the live device),
and
- the original device -- which is still bound -- is re-probed. This
leads to crashes such as the following:

> OF: PCI: host bridge /pcie@10000000 ranges: <------------ first probing
> OF: PCI: IO 0x3eff0000..0x3effffff -> 0x00000000
> OF: PCI: MEM 0x10000000..0x3efeffff -> 0x10000000
> OF: PCI: MEM 0x8000000000..0xffffffffff -> 0x8000000000
> pci-host-generic 3f000000.pcie: ECAM at [mem 0x3f000000-0x3fffffff] for [bus 00-0f]
> pci-host-generic 3f000000.pcie: PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [bus 00-0f]
> pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
> pci_bus 0000:00: root bus resource [mem 0x10000000-0x3efeffff]
> pci_bus 0000:00: root bus resource [mem 0x8000000000-0xffffffffff]
> pci 0000:00:02.0: BAR 0: assigned [mem 0x8000000000-0x8000003fff 64bit]
> pci 0000:00:03.0: BAR 4: assigned [mem 0x8000004000-0x8000007fff 64bit pref]
> pci 0000:00:01.0: BAR 1: assigned [mem 0x10000000-0x10000fff]
> pci 0000:00:03.0: BAR 1: assigned [mem 0x10001000-0x10001fff]
> pci 0000:00:01.0: BAR 0: assigned [io 0x1000-0x103f]
> pci 0000:00:02.0: enabling device (0000 -> 0002)
> OF: PCI: host bridge /pcie@10000000 ranges: <------------ second probing
> OF: PCI: IO 0x3eff0000..0x3effffff -> 0x00000000
> OF: PCI: MEM 0x10000000..0x3efeffff -> 0x10000000
> OF: PCI: MEM 0x8000000000..0xffffffffff -> 0x8000000000
> ------------[ cut here ]------------
> kernel BUG at lib/ioremap.c:64!
> Internal error: Oops - BUG: 0 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-0.rc0.git2.1.fc26.aarch64 #1
> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> task: fffffe00e402e400 task.stack: fffffe00e8028000
> PC is at ioremap_page_range+0x168/0x2f8
> LR is at pci_remap_iospace+0x68/0x88
> pc : [<fffffc0008487c48>] lr : [<fffffc00084fc6e0>] pstate: 80000145
> sp : fffffe00e802ba80
> x29: fffffe00e802ba80 x28: 00000200c01f0000
> x27: 0140000000000800 x26: fffffdff7ee10000
> x25: 0400000000000001 x24: 00e8000000000707
> x23: fffffc0009ff7fd8 x22: fffffdff7ee10000
> x21: fffffdff7ee00000 x20: fffffdff7ee0ffff
> x19: 000000003eff0000 x18: 0000000000000010
> x17: 0000000000000000 x16: 0000000000000000
> x15: 0000000000000006 x14: fffffc0089cc42f7
> x13: fffffc0009cc4305 x12: 00000000000000ed
> x11: 0000000000000006 x10: 00000000000000ee
> x9 : 0000000000000001 x8 : fffffe00e8028000
> x7 : fffffc0009002e68 x6 : 00000200c01f0000
> x5 : fffffe000104f700 x4 : 000000000000ffff
> x3 : fffffc0008ee5eb0 x2 : 0000000040000000
> x1 : 0000000041040000 x0 : 00e800003eff0707
>
> Process swapper/0 (pid: 1, stack limit = 0xfffffe00e8028020)
> Stack: (0xfffffe00e802ba80 to 0xfffffe00e802c000)
> ba80: fffffe00e802bb00 fffffc00084fc6e0 fffffe00f205c980 000000003eff0000
> baa0: 0000000000000000 fffffc0008f8d908 fffffe00f2050d00 fffffc0008c251c8
> bac0: fffffe00e802bb90 0000000000000000 fffffe00f205c980 0000000000000000
> bae0: fffffe00e802bb20 fffffc000851f60c fffffdff7e801384 fffffe00e0105810
> bb00: fffffe00e802bb20 fffffc000851f6e4 fffffe00f205c000 fffffe00e0105810
> bb20: fffffe00e802bbe0 fffffc000851f7e4 fffffe00e0105800 fffffe00e0105810
> bb40: fffffc0008f8dab8 fffffc0009f88000 0000000000000000 0000000000000000
> bb60: fffffc0008d063c0 fffffc0008d86398 fffffc0008e520e0 fffffc000894d8cc
> bb80: fffffe00e802bbb0 fffffc00087b5bc0 fffffe00f2056b80 fffffe00f2053d00
> bba0: 000000003eff0000 fffffe00ffffd068 fffffe00e802bbe0 fffffc000851f7d8
> bbc0: fffffe00e0105800 fffffe00e0105810 fffffc0008f8dab8 fffffc0009f88000
> bbe0: fffffe00e802bc00 fffffc000862a760 00000000fffffffe fffffe00e0105810
> bc00: fffffe00e802bc30 fffffc0008627ac8 fffffe00e0105810 0000000000000000
> bc20: fffffc0008f8dae0 0000000000000000 fffffe00e802bc70 fffffc0008627e54
> bc40: fffffe00e0105810 fffffe00e0105870 fffffc0008f8dae0 0000000000000000
> bc60: fffffc0008fc4000 fffffc0008d20464 fffffe00e802bca0 fffffc00086252a8
> bc80: 0000000000000000 fffffc0008f8dae0 fffffc0008627d28 fffffc0008d86398
> bca0: fffffe00e802bce0 fffffc0008627130 fffffc0008f8dae0 fffffe00f3038900
> bcc0: fffffc0008fc5768 0000000000000000 fffffe00fa703930 fffffe00e81ce198
> bce0: fffffe00e802bd00 fffffc0008626b50 fffffc0008f8dae0 fffffe00f3038900
> bd00: fffffe00e802bd40 fffffc0008628ee0 fffffc0008f8dae0 0000000000000000
> bd20: 0000000000000000 0000000000000006 fffffc0009060000 0000000000000000
> bd40: fffffe00e802bd60 fffffc000862a684 fffffc0008f8dab8 0000000000000000
> bd60: fffffe00e802bd80 fffffc0008d5c5ac fffffc0008d5c594 fffffe00e8028000
> bd80: fffffe00e802bd90 fffffc0008083594 fffffe00e802be00 fffffc0008d20dec
> bda0: 0000000000000102 fffffc0009060000 fffffc0008d863a8 0000000000000006
> bdc0: fffffc0008e51c00 0000000000000000 fffffc0008d20dec fffffc0008bbe670
> bde0: 0000000000000000 0000000600000006 fffffc0008d20464 fffffc0008d063c0
> be00: fffffe00e802bea0 fffffc0008945568 fffffc0008945550 0000000000000000
> be20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> be40: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
> be60: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
> be80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bea0: 0000000000000000 fffffc0008083330 fffffc0008945550 0000000000000000
> bec0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bf00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bf20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bf40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bf60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bf80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bfa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bfc0: 0000000000000000 0000000000000005 0000000000000000 0000000000000000
> bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Call trace:
> Exception stack(0xfffffe00e802b8a0 to 0xfffffe00e802b9d0)
> b8a0: 000000003eff0000 0000040000000000 fffffe00e802ba80 fffffc0008487c48
> b8c0: 0000000080000145 000000000000003d fffffe00e802b960 fffffc0008137f44
> b8e0: 8b6ac059b5e8abc6 fffffe00e402ecd8 0000000000000000 0000000000000000
> b900: fffffe00e402e400 0000000000000000 0000000000000001 fffffc0009cc4000
> b920: 0000000000000002 fffffe00e402ecb0 fffffe00e0105bd0 0000000000000000
> b940: fffffe00e402e400 0000000000000000 00e800003eff0707 0000000041040000
> b960: 0000000040000000 fffffc0008ee5eb0 000000000000ffff fffffe000104f700
> b980: 00000200c01f0000 fffffc0009002e68 fffffe00e8028000 0000000000000001
> b9a0: 00000000000000ee 0000000000000006 00000000000000ed fffffc0009cc4305
> b9c0: fffffc0089cc42f7 0000000000000006
> [<fffffc0008487c48>] ioremap_page_range+0x168/0x2f8
> [<fffffc00084fc6e0>] pci_remap_iospace+0x68/0x88
> [<fffffc000851f6e4>] pci_host_common_probe+0x24c/0x318
> [<fffffc000851f7e4>] gen_pci_probe+0x34/0x40
> [<fffffc000862a760>] platform_drv_probe+0x60/0xc8
> [<fffffc0008627ac8>] driver_probe_device+0x240/0x4a0
> [<fffffc0008627e54>] __driver_attach+0x12c/0x130
> [<fffffc00086252a8>] bus_for_each_dev+0x70/0xb0
> [<fffffc0008627130>] driver_attach+0x30/0x40
> [<fffffc0008626b50>] bus_add_driver+0x200/0x2b8
> [<fffffc0008628ee0>] driver_register+0x68/0x100
> [<fffffc000862a684>] __platform_driver_register+0x54/0x60
> [<fffffc0008d5c5ac>] gen_pci_driver_init+0x18/0x20
> [<fffffc0008083594>] do_one_initcall+0x44/0x138
> [<fffffc0008d20dec>] kernel_init_freeable+0x23c/0x2dc
> [<fffffc0008945568>] kernel_init+0x18/0x110
> [<fffffc0008083330>] ret_from_fork+0x10/0x20
> Code: 54fff801 52800000 1400005e d503201f (d4210000)
> ---[ end trace bf4514e6b7f2008d ]---
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>
> Kernel Offset: disabled
> Memory Limit: none
> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

The "drivers/pci/host/pci-host-generic.c" driver has no remove method,
and its probe method works like this:

gen_pci_probe() [drivers/pci/host/pci-host-generic.c]
pci_host_common_probe() [drivers/pci/host/pci-host-common.c]
gen_pci_init() [drivers/pci/host/pci-host-common.c]
gen_pci_parse_request_of_pci_ranges() [drivers/pci/host/pci-host-common.c]
of_pci_get_host_bridge_resources() [drivers/of/of_pci.c]
pr_info(
"host bridge %s ranges:\n",
dev->full_name);
pci_remap_iospace() [drivers/pci/pci.c]

At the second probing attempt, the area is already mapped (the device is
still bound from the first probing), which triggers

BUG_ON(!pte_none(*pte));

in ioremap_pte_range() [lib/ioremap.c].

This is almost a regression because the kernel crashes with valid
drivers. It is not an error for a driver to not provide a remove()
callback, so in this instance CONFIG_DEBUG_TEST_DRIVER_REMOVE does not
expose a driver bug, it breaks with a valid driver. Not a regression for
the upstream kernel after all, because the Kconfig documentation
suggests N as default.

Proposed solution: if none of the remove() methods exist, or the
remove() method that exists fails, then don't release any resources, and
don't re-probe the device.

Thanks!


2016-10-10 13:34:14

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: CONFIG_DEBUG_TEST_DRIVER_REMOVE causes unremovable drivers to bind devices twice

On Mon, Oct 10, 2016 at 8:17 AM, Laszlo Ersek <[email protected]> wrote:
> Hi,
>
> Greg asked me to stick to email with this bug report, so I'm reposting
> the original kernel bugzilla report to personal addresses, and lkml.
>
> Thanks,
> Laszlo
>
> https://bugzilla.kernel.org/show_bug.cgi?id=177021
>
> Bug ID: 177021
> Summary: [driver core] CONFIG_DEBUG_TEST_DRIVER_REMOVE causes
> unremovable drivers to bind devices twice
> Product: Drivers
> Version: 2.5
> Kernel Version: v4.8-2283-ga3443cd (4.9.0-0.rc0.git2.1.fc26.aarch64)
> Hardware: All
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> Assignee: [email protected]
> Reporter: [email protected]
> CC: [email protected], [email protected]
> Regression: No
>
> CONFIG_DEBUG_TEST_DRIVER_REMOVE was added in the following commit:
>
>> commit bea5b158ff0da9c7246ff391f754f5f38e34577a
>> Author: Rob Herring <[email protected]>
>> Date: Thu Aug 11 10:20:58 2016 -0500
>>
>> driver core: add test of driver remove calls during probe

[...]

> This is almost a regression because the kernel crashes with valid
> drivers. It is not an error for a driver to not provide a remove()
> callback, so in this instance CONFIG_DEBUG_TEST_DRIVER_REMOVE does not
> expose a driver bug, it breaks with a valid driver. Not a regression for
> the upstream kernel after all, because the Kconfig documentation
> suggests N as default.
>
> Proposed solution: if none of the remove() methods exist, or the
> remove() method that exists fails, then don't release any resources, and
> don't re-probe the device.

I was thinking no remove method meant the driver didn't need to do any
explicit clean-up as all resources used devres, but I guess that's not
going to cover things like subsystem de-registration. I'll prepare a
fix.

Rob

2016-10-11 15:28:24

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: CONFIG_DEBUG_TEST_DRIVER_REMOVE causes unremovable drivers to bind devices twice

+Bjorn

On Mon, Oct 10, 2016 at 8:33 AM, Rob Herring <[email protected]> wrote:
> On Mon, Oct 10, 2016 at 8:17 AM, Laszlo Ersek <[email protected]> wrote:
>> Hi,
>>
>> Greg asked me to stick to email with this bug report, so I'm reposting
>> the original kernel bugzilla report to personal addresses, and lkml.
>>
>> Thanks,
>> Laszlo
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=177021
>>
>> Bug ID: 177021
>> Summary: [driver core] CONFIG_DEBUG_TEST_DRIVER_REMOVE causes
>> unremovable drivers to bind devices twice
>> Product: Drivers
>> Version: 2.5
>> Kernel Version: v4.8-2283-ga3443cd (4.9.0-0.rc0.git2.1.fc26.aarch64)
>> Hardware: All
>> OS: Linux
>> Tree: Mainline
>> Status: NEW
>> Severity: normal
>> Priority: P1
>> Component: Other
>> Assignee: [email protected]
>> Reporter: [email protected]
>> CC: [email protected], [email protected]
>> Regression: No
>>
>> CONFIG_DEBUG_TEST_DRIVER_REMOVE was added in the following commit:
>>
>>> commit bea5b158ff0da9c7246ff391f754f5f38e34577a
>>> Author: Rob Herring <[email protected]>
>>> Date: Thu Aug 11 10:20:58 2016 -0500
>>>
>>> driver core: add test of driver remove calls during probe
>
> [...]
>
>> This is almost a regression because the kernel crashes with valid
>> drivers. It is not an error for a driver to not provide a remove()
>> callback, so in this instance CONFIG_DEBUG_TEST_DRIVER_REMOVE does not
>> expose a driver bug, it breaks with a valid driver. Not a regression for
>> the upstream kernel after all, because the Kconfig documentation
>> suggests N as default.
>>
>> Proposed solution: if none of the remove() methods exist, or the
>> remove() method that exists fails, then don't release any resources, and
>> don't re-probe the device.
>
> I was thinking no remove method meant the driver didn't need to do any
> explicit clean-up as all resources used devres, but I guess that's not
> going to cover things like subsystem de-registration. I'll prepare a
> fix.

Looking at this some more, I think this should just be keyed off of
suppress_bind_attr. If userspace provides bind/unbind for a driver,
then remove and re-probe should work even if the driver doesn't have a
remove function.

Either the generic PCI host needs to set suppress_bind_attr like many
of the ARM-based PCI host drivers already do or the remove path should
be fixed to support this. Getting PCI hosts to be hot plug-able is a
goal (or maybe supported now?).

Rob