2021-01-08 01:18:55

by David Collins

Subject: [PATCH] regulator: core: avoid regulator_resolve_supply() race condition

The final step in regulator_register() is to call
regulator_resolve_supply() for each registered regulator
(including the one in the process of being registered). The
regulator_resolve_supply() function first checks if rdev->supply
is NULL, then it performs various steps to try to find the supply.
If successful, rdev->supply is set inside of set_supply().

This procedure can encounter a race condition if two concurrent
tasks call regulator_register() near to each other on separate CPUs
and one of the regulators has rdev->supply_name specified. There
is currently nothing guaranteeing atomicity between the rdev->supply
check and set steps. Thus, both tasks can observe rdev->supply==NULL
in their regulator_resolve_supply() calls. This then results in
both creating a struct regulator for the supply. One ends up
actually stored in rdev->supply and the other is lost (though still
present in the supply's consumer_list).

Here is a kernel log snippet showing the issue:

[ 12.421768] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
[ 12.425854] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
[ 12.429064] debugfs: Directory 'regulator.4-SUPPLY' with parent
'17a00000.rsc:rpmh-regulator-gfxlvl-pm8350_s5_level'
already present!

Avoid this race condition by holding the rdev->mutex lock inside
of regulator_resolve_supply() while checking and setting
rdev->supply.

Signed-off-by: David Collins <[email protected]>
---
drivers/regulator/core.c | 39 ++++++++++++++++++++++++++++-----------
1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index fee9241..3ae5ccd 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1813,23 +1813,34 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 {
 	struct regulator_dev *r;
 	struct device *dev = rdev->dev.parent;
-	int ret;
+	int ret = 0;

 	/* No supply to resolve? */
 	if (!rdev->supply_name)
 		return 0;

-	/* Supply already resolved? */
+	/* Supply already resolved? (fast-path without locking contention) */
 	if (rdev->supply)
 		return 0;

+	/*
+	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
+	 * between rdev->supply null check and setting rdev->supply in
+	 * set_supply() from concurrent tasks.
+	 */
+	regulator_lock(rdev);
+
+	/* Supply just resolved by a concurrent task? */
+	if (rdev->supply)
+		goto out;
+
 	r = regulator_dev_lookup(dev, rdev->supply_name);
 	if (IS_ERR(r)) {
 		ret = PTR_ERR(r);

 		/* Did the lookup explicitly defer for us? */
 		if (ret == -EPROBE_DEFER)
-			return ret;
+			goto out;

 		if (have_full_constraints()) {
 			r = dummy_regulator_rdev;
@@ -1837,15 +1848,18 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		} else {
 			dev_err(dev, "Failed to resolve %s-supply for %s\n",
 				rdev->supply_name, rdev->desc->name);
-			return -EPROBE_DEFER;
+			ret = -EPROBE_DEFER;
+			goto out;
 		}
 	}

 	if (r == rdev) {
 		dev_err(dev, "Supply for %s (%s) resolved to itself\n",
 			rdev->desc->name, rdev->supply_name);
-		if (!have_full_constraints())
-			return -EINVAL;
+		if (!have_full_constraints()) {
+			ret = -EINVAL;
+			goto out;
+		}
 		r = dummy_regulator_rdev;
 		get_device(&r->dev);
 	}
@@ -1859,7 +1873,8 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 	if (r->dev.parent && r->dev.parent != rdev->dev.parent) {
 		if (!device_is_bound(r->dev.parent)) {
 			put_device(&r->dev);
-			return -EPROBE_DEFER;
+			ret = -EPROBE_DEFER;
+			goto out;
 		}
 	}

@@ -1867,13 +1882,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 	ret = regulator_resolve_supply(r);
 	if (ret < 0) {
 		put_device(&r->dev);
-		return ret;
+		goto out;
 	}

 	ret = set_supply(rdev, r);
 	if (ret < 0) {
 		put_device(&r->dev);
-		return ret;
+		goto out;
 	}

 	/*
@@ -1886,11 +1901,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		if (ret < 0) {
 			_regulator_put(rdev->supply);
 			rdev->supply = NULL;
-			return ret;
+			goto out;
 		}
 	}

-	return 0;
+out:
+	regulator_unlock(rdev);
+	return ret;
 }

 /* Internal regulator request function */
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
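
The check, lock, recheck sequence that the patch above adds to
regulator_resolve_supply() can be illustrated outside the kernel. The
following is only a userspace sketch under stated assumptions: the
names fake_rdev, fake_supply and fake_resolve_supply() are hypothetical
stand-ins, a pthread mutex plays the role of rdev->mutex, and a counter
replaces the creation of a struct regulator. It is not the kernel code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's structures. */
struct fake_supply {
	int id;
};

struct fake_rdev {
	pthread_mutex_t lock;                 /* plays the role of rdev->mutex */
	_Atomic(struct fake_supply *) supply; /* plays the role of rdev->supply */
	int creations;                        /* number of supplies created */
};

static struct fake_supply the_supply = { .id = 1 };

/* Returns 0 on success, following the 0 / negative errno convention. */
static int fake_resolve_supply(struct fake_rdev *rdev)
{
	int ret = 0;

	/* Fast path: already resolved, so no lock is taken. */
	if (atomic_load(&rdev->supply))
		return 0;

	pthread_mutex_lock(&rdev->lock);

	/* Recheck under the lock: a concurrent task may have resolved it. */
	if (atomic_load(&rdev->supply))
		goto out;

	/* Callers are serialized here, so the supply is created only once. */
	rdev->creations++;
	atomic_store(&rdev->supply, &the_supply);
out:
	pthread_mutex_unlock(&rdev->lock);
	return ret;
}
```

Calling fake_resolve_supply() any number of times on the same
fake_rdev leaves creations at 1, which is the property the duplicated
"supplied by" log lines show to be missing before the patch.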


2021-01-11 16:32:18

by Mark Brown

Subject: Re: [PATCH] regulator: core: avoid regulator_resolve_supply() race condition

On Thu, 7 Jan 2021 17:16:02 -0800, David Collins wrote:
> The final step in regulator_register() is to call
> regulator_resolve_supply() for each registered regulator
> (including the one in the process of being registered). The
> regulator_resolve_supply() function first checks if rdev->supply
> is NULL, then it performs various steps to try to find the supply.
> If successful, rdev->supply is set inside of set_supply().
>
> [...]

Applied to

https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next

Thanks!

[1/1] regulator: core: avoid regulator_resolve_supply() race condition
commit: eaa7995c529b54d68d97a30f6344cc6ca2f214a7

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

2021-01-13 04:07:47

by Marek Szyprowski

Subject: Re: [PATCH] regulator: core: avoid regulator_resolve_supply() race condition

Hi,

On 08.01.2021 02:16, David Collins wrote:
> The final step in regulator_register() is to call
> regulator_resolve_supply() for each registered regulator
> (including the one in the process of being registered). The
> regulator_resolve_supply() function first checks if rdev->supply
> is NULL, then it performs various steps to try to find the supply.
> If successful, rdev->supply is set inside of set_supply().
>
> This procedure can encounter a race condition if two concurrent
> tasks call regulator_register() near to each other on separate CPUs
> and one of the regulators has rdev->supply_name specified. There
> is currently nothing guaranteeing atomicity between the rdev->supply
> check and set steps. Thus, both tasks can observe rdev->supply==NULL
> in their regulator_resolve_supply() calls. This then results in
> both creating a struct regulator for the supply. One ends up
> actually stored in rdev->supply and the other is lost (though still
> present in the supply's consumer_list).
>
> Here is a kernel log snippet showing the issue:
>
> [ 12.421768] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
> [ 12.425854] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
> [ 12.429064] debugfs: Directory 'regulator.4-SUPPLY' with parent
> '17a00000.rsc:rpmh-regulator-gfxlvl-pm8350_s5_level'
> already present!
>
> Avoid this race condition by holding the rdev->mutex lock inside
> of regulator_resolve_supply() while checking and setting
> rdev->supply.
>
> Signed-off-by: David Collins <[email protected]>

This patch landed in linux next-20210112 as commit eaa7995c529b
("regulator: core: avoid regulator_resolve_supply() race condition"). I
found that it triggers the following lockdep warning during DWC3
driver registration on some Exynos-based boards (this log is from a
Samsung Exynos5420-based Peach-Pit board):

======================================================
WARNING: possible circular locking dependency detected
5.11.0-rc1-00008-geaa7995c529b #10095 Not tainted
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
c12e1b80 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x4c/0x2b0

but task is already holding lock:
df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_resolve_supply+0x44/0x318

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (regulator_ww_class_mutex){+.+.}-{3:3}:
       ww_mutex_lock+0x48/0x88
       regulator_lock_recursive+0x84/0x1f4
       regulator_lock_dependent+0x184/0x2b0
       regulator_enable+0x30/0xe4
       dwc3_exynos_probe+0x17c/0x2c0
       platform_probe+0x80/0xc0
       really_probe+0x1c4/0x4e4
       driver_probe_device+0x78/0x1d8
       device_driver_attach+0x58/0x60
       __driver_attach+0xfc/0x160
       bus_for_each_dev+0x6c/0xb8
       bus_add_driver+0x170/0x20c
       driver_register+0x78/0x10c
       do_one_initcall+0x88/0x438
       kernel_init_freeable+0x18c/0x1dc
       kernel_init+0x8/0x118
       ret_from_fork+0x14/0x38
       0x0

-> #1 (regulator_ww_class_acquire){+.+.}-{0:0}:
       regulator_enable+0x30/0xe4
       dwc3_exynos_probe+0x17c/0x2c0
       platform_probe+0x80/0xc0
       really_probe+0x1c4/0x4e4
       driver_probe_device+0x78/0x1d8
       device_driver_attach+0x58/0x60
       __driver_attach+0xfc/0x160
       bus_for_each_dev+0x6c/0xb8
       bus_add_driver+0x170/0x20c
       driver_register+0x78/0x10c
       do_one_initcall+0x88/0x438
       kernel_init_freeable+0x18c/0x1dc
       kernel_init+0x8/0x118
       ret_from_fork+0x14/0x38
       0x0

-> #0 (regulator_list_mutex){+.+.}-{3:3}:
       lock_acquire+0x2e4/0x5dc
       __mutex_lock+0xa4/0xb60
       mutex_lock_nested+0x1c/0x24
       regulator_lock_dependent+0x4c/0x2b0
       regulator_enable+0x30/0xe4
       regulator_resolve_supply+0x1cc/0x318
       regulator_register_resolve_supply+0x14/0x78
       class_for_each_device+0x68/0xe8
       regulator_register+0xa2c/0xc9c
       devm_regulator_register+0x40/0x70
       tps65090_regulator_probe+0x150/0x648
       platform_probe+0x80/0xc0
       really_probe+0x1c4/0x4e4
       driver_probe_device+0x78/0x1d8
       bus_for_each_drv+0x78/0xbc
       __device_attach+0xe8/0x180
       bus_probe_device+0x88/0x90
       device_add+0x4c4/0x7e8
       platform_device_add+0x120/0x25c
       mfd_add_devices+0x580/0x60c
       tps65090_i2c_probe+0xb8/0x184
       i2c_device_probe+0x234/0x2a4
       really_probe+0x1c4/0x4e4
       driver_probe_device+0x78/0x1d8
       bus_for_each_drv+0x78/0xbc
       __device_attach+0xe8/0x180
       bus_probe_device+0x88/0x90
       device_add+0x4c4/0x7e8
       i2c_new_client_device+0x15c/0x27c
       of_i2c_register_devices+0x114/0x184
       i2c_register_adapter+0x1d8/0x6dc
       ec_i2c_probe+0xc8/0x124
       platform_probe+0x80/0xc0
       really_probe+0x1c4/0x4e4
       driver_probe_device+0x78/0x1d8
       bus_for_each_drv+0x78/0xbc
       __device_attach+0xe8/0x180
       bus_probe_device+0x88/0x90
       device_add+0x4c4/0x7e8
       of_platform_device_create_pdata+0x90/0xc8
       of_platform_bus_create+0x1a0/0x4ec
       of_platform_populate+0x88/0x120
       devm_of_platform_populate+0x40/0x80
       cros_ec_register+0x174/0x308
       cros_ec_spi_probe+0x16c/0x1ec
       spi_probe+0x88/0xac
       really_probe+0x1c4/0x4e4
       driver_probe_device+0x78/0x1d8
       device_driver_attach+0x58/0x60
       __driver_attach+0xfc/0x160
       bus_for_each_dev+0x6c/0xb8
       bus_add_driver+0x170/0x20c
       driver_register+0x78/0x10c
       do_one_initcall+0x88/0x438
       kernel_init_freeable+0x18c/0x1dc
       kernel_init+0x8/0x118
       ret_from_fork+0x14/0x38
       0x0

other info that might help us debug this:

Chain exists of:
  regulator_list_mutex --> regulator_ww_class_acquire --> regulator_ww_class_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(regulator_ww_class_mutex);
                               lock(regulator_ww_class_acquire);
                               lock(regulator_ww_class_mutex);
  lock(regulator_list_mutex);

 *** DEADLOCK ***

5 locks held by swapper/0/1:
 #0: dfb6e4c8 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x60
 #1: c1fedcd8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
 #2: df53a4e8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
 #3: df5224d8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
 #4: df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_resolve_supply+0x44/0x318

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc1-00008-geaa7995c529b #10095
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c01116e8>] (unwind_backtrace) from [<c010cf58>] (show_stack+0x10/0x14)
[<c010cf58>] (show_stack) from [<c0b38ffc>] (dump_stack+0xa4/0xc4)
[<c0b38ffc>] (dump_stack) from [<c0193458>] (check_noncircular+0x14c/0x164)
[<c0193458>] (check_noncircular) from [<c0196b90>] (__lock_acquire+0x1830/0x31cc)
[<c0196b90>] (__lock_acquire) from [<c01991e4>] (lock_acquire+0x2e4/0x5dc)
[<c01991e4>] (lock_acquire) from [<c0b4043c>] (__mutex_lock+0xa4/0xb60)
[<c0b4043c>] (__mutex_lock) from [<c0b40f14>] (mutex_lock_nested+0x1c/0x24)
[<c0b40f14>] (mutex_lock_nested) from [<c05ccd94>] (regulator_lock_dependent+0x4c/0x2b0)
[<c05ccd94>] (regulator_lock_dependent) from [<c05d220c>] (regulator_enable+0x30/0xe4)
[<c05d220c>] (regulator_enable) from [<c05d248c>] (regulator_resolve_supply+0x1cc/0x318)
[<c05d248c>] (regulator_resolve_supply) from [<c05d2974>] (regulator_register_resolve_supply+0x14/0x78)
[<c05d2974>] (regulator_register_resolve_supply) from [<c06a3000>] (class_for_each_device+0x68/0xe8)
[<c06a3000>] (class_for_each_device) from [<c05d3e20>] (regulator_register+0xa2c/0xc9c)
[<c05d3e20>] (regulator_register) from [<c05d5c70>] (devm_regulator_register+0x40/0x70)
[<c05d5c70>] (devm_regulator_register) from [<c05dea58>] (tps65090_regulator_probe+0x150/0x648)
[<c05dea58>] (tps65090_regulator_probe) from [<c06a3fe8>] (platform_probe+0x80/0xc0)
[<c06a3fe8>] (platform_probe) from [<c06a1114>] (really_probe+0x1c4/0x4e4)
[<c06a1114>] (really_probe) from [<c06a14ac>] (driver_probe_device+0x78/0x1d8)
[<c06a14ac>] (driver_probe_device) from [<c069f1a4>] (bus_for_each_drv+0x78/0xbc)
[<c069f1a4>] (bus_for_each_drv) from [<c06a0eb0>] (__device_attach+0xe8/0x180)
[<c06a0eb0>] (__device_attach) from [<c069ff50>] (bus_probe_device+0x88/0x90)
[<c069ff50>] (bus_probe_device) from [<c069dbac>] (device_add+0x4c4/0x7e8)
[<c069dbac>] (device_add) from [<c06a3bac>] (platform_device_add+0x120/0x25c)
[<c06a3bac>] (platform_device_add) from [<c06d5c7c>] (mfd_add_devices+0x580/0x60c)
[<c06d5c7c>] (mfd_add_devices) from [<c06d80e8>] (tps65090_i2c_probe+0xb8/0x184)
[<c06d80e8>] (tps65090_i2c_probe) from [<c0822520>] (i2c_device_probe+0x234/0x2a4)
[<c0822520>] (i2c_device_probe) from [<c06a1114>] (really_probe+0x1c4/0x4e4)
[<c06a1114>] (really_probe) from [<c06a14ac>] (driver_probe_device+0x78/0x1d8)
[<c06a14ac>] (driver_probe_device) from [<c069f1a4>] (bus_for_each_drv+0x78/0xbc)
[<c069f1a4>] (bus_for_each_drv) from [<c06a0eb0>] (__device_attach+0xe8/0x180)
[<c06a0eb0>] (__device_attach) from [<c069ff50>] (bus_probe_device+0x88/0x90)
[<c069ff50>] (bus_probe_device) from [<c069dbac>] (device_add+0x4c4/0x7e8)
[<c069dbac>] (device_add) from [<c0824aec>] (i2c_new_client_device+0x15c/0x27c)
[<c0824aec>] (i2c_new_client_device) from [<c08285e0>] (of_i2c_register_devices+0x114/0x184)
[<c08285e0>] (of_i2c_register_devices) from [<c08254b8>] (i2c_register_adapter+0x1d8/0x6dc)
[<c08254b8>] (i2c_register_adapter) from [<c082dd1c>] (ec_i2c_probe+0xc8/0x124)
[<c082dd1c>] (ec_i2c_probe) from [<c06a3fe8>] (platform_probe+0x80/0xc0)
[<c06a3fe8>] (platform_probe) from [<c06a1114>] (really_probe+0x1c4/0x4e4)
[<c06a1114>] (really_probe) from [<c06a14ac>] (driver_probe_device+0x78/0x1d8)
[<c06a14ac>] (driver_probe_device) from [<c069f1a4>] (bus_for_each_drv+0x78/0xbc)
[<c069f1a4>] (bus_for_each_drv) from [<c06a0eb0>] (__device_attach+0xe8/0x180)
[<c06a0eb0>] (__device_attach) from [<c069ff50>] (bus_probe_device+0x88/0x90)
[<c069ff50>] (bus_probe_device) from [<c069dbac>] (device_add+0x4c4/0x7e8)
[<c069dbac>] (device_add) from [<c08b140c>] (of_platform_device_create_pdata+0x90/0xc8)
[<c08b140c>] (of_platform_device_create_pdata) from [<c08b15f0>] (of_platform_bus_create+0x1a0/0x4ec)
[<c08b15f0>] (of_platform_bus_create) from [<c08b1af0>] (of_platform_populate+0x88/0x120)
[<c08b1af0>] (of_platform_populate) from [<c08b1bdc>] (devm_of_platform_populate+0x40/0x80)
[<c08b1bdc>] (devm_of_platform_populate) from [<c08b72fc>] (cros_ec_register+0x174/0x308)
[<c08b72fc>] (cros_ec_register) from [<c08b868c>] (cros_ec_spi_probe+0x16c/0x1ec)
[<c08b868c>] (cros_ec_spi_probe) from [<c071b2f4>] (spi_probe+0x88/0xac)
[<c071b2f4>] (spi_probe) from [<c06a1114>] (really_probe+0x1c4/0x4e4)
[<c06a1114>] (really_probe) from [<c06a14ac>] (driver_probe_device+0x78/0x1d8)
[<c06a14ac>] (driver_probe_device) from [<c06a19c4>] (device_driver_attach+0x58/0x60)
[<c06a19c4>] (device_driver_attach) from [<c06a1ac8>] (__driver_attach+0xfc/0x160)
[<c06a1ac8>] (__driver_attach) from [<c069f0cc>] (bus_for_each_dev+0x6c/0xb8)
[<c069f0cc>] (bus_for_each_dev) from [<c06a0204>] (bus_add_driver+0x170/0x20c)
[<c06a0204>] (bus_add_driver) from [<c06a2968>] (driver_register+0x78/0x10c)
[<c06a2968>] (driver_register) from [<c0102428>] (do_one_initcall+0x88/0x438)
[<c0102428>] (do_one_initcall) from [<c1101104>] (kernel_init_freeable+0x18c/0x1dc)
[<c1101104>] (kernel_init_freeable) from [<c0b3c65c>] (kernel_init+0x8/0x118)
[<c0b3c65c>] (kernel_init) from [<c010011c>] (ret_from_fork+0x14/0x38)
Exception stack(0xc1ce3fb0 to 0xc1ce3ff8)
3fa0:                                     00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000

I haven't yet analyzed whether this warning is a real issue or just a
false positive. If you have any hints or comments, let me know.

> ---
> drivers/regulator/core.c | 39 ++++++++++++++++++++++++++++-----------
> 1 file changed, 28 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index fee9241..3ae5ccd 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -1813,23 +1813,34 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
> {
> struct regulator_dev *r;
> struct device *dev = rdev->dev.parent;
> - int ret;
> + int ret = 0;
>
> /* No supply to resolve? */
> if (!rdev->supply_name)
> return 0;
>
> - /* Supply already resolved? */
> + /* Supply already resolved? (fast-path without locking contention) */
> if (rdev->supply)
> return 0;
>
> + /*
> + * Recheck rdev->supply with rdev->mutex lock held to avoid a race
> + * between rdev->supply null check and setting rdev->supply in
> + * set_supply() from concurrent tasks.
> + */
> + regulator_lock(rdev);
> +
> + /* Supply just resolved by a concurrent task? */
> + if (rdev->supply)
> + goto out;
> +
> r = regulator_dev_lookup(dev, rdev->supply_name);
> if (IS_ERR(r)) {
> ret = PTR_ERR(r);
>
> /* Did the lookup explicitly defer for us? */
> if (ret == -EPROBE_DEFER)
> - return ret;
> + goto out;
>
> if (have_full_constraints()) {
> r = dummy_regulator_rdev;
> @@ -1837,15 +1848,18 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
> } else {
> dev_err(dev, "Failed to resolve %s-supply for %s\n",
> rdev->supply_name, rdev->desc->name);
> - return -EPROBE_DEFER;
> + ret = -EPROBE_DEFER;
> + goto out;
> }
> }
>
> if (r == rdev) {
> dev_err(dev, "Supply for %s (%s) resolved to itself\n",
> rdev->desc->name, rdev->supply_name);
> - if (!have_full_constraints())
> - return -EINVAL;
> + if (!have_full_constraints()) {
> + ret = -EINVAL;
> + goto out;
> + }
> r = dummy_regulator_rdev;
> get_device(&r->dev);
> }
> @@ -1859,7 +1873,8 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
> if (r->dev.parent && r->dev.parent != rdev->dev.parent) {
> if (!device_is_bound(r->dev.parent)) {
> put_device(&r->dev);
> - return -EPROBE_DEFER;
> + ret = -EPROBE_DEFER;
> + goto out;
> }
> }
>
> @@ -1867,13 +1882,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
> ret = regulator_resolve_supply(r);
> if (ret < 0) {
> put_device(&r->dev);
> - return ret;
> + goto out;
> }
>
> ret = set_supply(rdev, r);
> if (ret < 0) {
> put_device(&r->dev);
> - return ret;
> + goto out;
> }
>
> /*
> @@ -1886,11 +1901,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
> if (ret < 0) {
> _regulator_put(rdev->supply);
> rdev->supply = NULL;
> - return ret;
> + goto out;
> }
> }
>
> - return 0;
> +out:
> + regulator_unlock(rdev);
> + return ret;
> }
>
> /* Internal regulator request function */

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
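
The "possible circular locking dependency" report above follows from an
ordering rule: once lock class B has been taken while class A was held,
taking A while holding B (directly or through a chain) closes a cycle.
The sketch below is a toy model of that bookkeeping, far simpler than
the kernel's lockdep. The enum values and the functions reachable() and
record_dependency() are hypothetical names chosen for illustration; only
the three lock class names come from the report.

```c
#include <stdbool.h>

/* Lock classes named in the report; the enum itself is illustrative. */
enum lock_class {
	LIST_MUTEX,   /* regulator_list_mutex */
	WW_ACQUIRE,   /* regulator_ww_class_acquire */
	WW_MUTEX,     /* regulator_ww_class_mutex */
	NR_CLASSES
};

/* dep[a][b] is true once class b has been taken while class a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "class 'taken' acquired while holding class 'held'".
 * Returns false, without recording the edge, if 'held' is already
 * reachable from 'taken', because the new edge would close a cycle.
 */
static bool record_dependency(int held, int taken)
{
	bool seen[NR_CLASSES] = { false };

	if (reachable(taken, held, seen))
		return false;
	dep[held][taken] = true;
	return true;
}
```

Recording LIST_MUTEX then WW_ACQUIRE, and WW_ACQUIRE then WW_MUTEX,
models the existing regulator_enable() chain. A later attempt to record
WW_MUTEX then LIST_MUTEX, which corresponds to regulator_enable() being
called from regulator_resolve_supply() with the rdev lock held, is
rejected by this model.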

2021-01-18 20:55:31

by Mark Brown

Subject: Re: [PATCH] regulator: core: avoid regulator_resolve_supply() race condition

On Tue, Jan 12, 2021 at 10:34:19PM +0100, Marek Szyprowski wrote:

> ======================================================
> WARNING: possible circular locking dependency detected
> 5.11.0-rc1-00008-geaa7995c529b #10095 Not tainted
> ------------------------------------------------------
> swapper/0/1 is trying to acquire lock:
> c12e1b80 (regulator_list_mutex){+.+.}-{3:3}, at:
> regulator_lock_dependent+0x4c/0x2b0

If you're sending backtraces or other enormous reports like this please
run them through addr2line first so that things are a bit more legible.

> but task is already holding lock:
> df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at:
> regulator_resolve_supply+0x44/0x318
>
> which lock already depends on the new lock.

Does this help (completely untested):

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 3ae5ccd9277d..7d1422b00974 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1823,17 +1823,6 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 	if (rdev->supply)
 		return 0;

-	/*
-	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
-	 * between rdev->supply null check and setting rdev->supply in
-	 * set_supply() from concurrent tasks.
-	 */
-	regulator_lock(rdev);
-
-	/* Supply just resolved by a concurrent task? */
-	if (rdev->supply)
-		goto out;
-
 	r = regulator_dev_lookup(dev, rdev->supply_name);
 	if (IS_ERR(r)) {
 		ret = PTR_ERR(r);
@@ -1885,10 +1874,23 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		goto out;
 	}

+	/*
+	 * Recheck rdev->supply with rdev->mutex lock held to avoid a race
+	 * between rdev->supply null check and setting rdev->supply in
+	 * set_supply() from concurrent tasks.
+	 */
+	regulator_lock(rdev);
+
+	/* Supply just resolved by a concurrent task? */
+	if (rdev->supply) {
+		put_device(&r->dev);
+		goto out_rdev_lock;
+	}
+
 	ret = set_supply(rdev, r);
 	if (ret < 0) {
 		put_device(&r->dev);
-		goto out;
+		goto out_rdev_lock;
 	}

 	/*
@@ -1901,12 +1903,13 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
 		if (ret < 0) {
 			_regulator_put(rdev->supply);
 			rdev->supply = NULL;
-			goto out;
+			goto out_rdev_lock;
 		}
 	}

-out:
+out_rdev_lock:
 	regulator_unlock(rdev);
+out:
 	return ret;
 }


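
The follow-up diff above narrows the locked region: the supply lookup
runs without the rdev lock, and only the recheck and the assignment are
done under it, with the loser of the race dropping the reference it
obtained. A minimal userspace sketch of that shape follows. The names
fake_dev, fake_rdev, fake_lookup(), fake_put() and fake_commit_supply()
are hypothetical, and a plain counter stands in for the device
reference count handled by get_device()/put_device().

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's structures or helpers. */
struct fake_dev {
	int refcount;               /* stands in for the device refcount */
};

struct fake_rdev {
	pthread_mutex_t lock;       /* plays the role of rdev->mutex */
	struct fake_dev *supply;    /* plays the role of rdev->supply */
};

/* Lookup takes a reference, as regulator_dev_lookup() does on success. */
static struct fake_dev *fake_lookup(struct fake_dev *d)
{
	d->refcount++;
	return d;
}

static void fake_put(struct fake_dev *d)
{
	d->refcount--;
}

/*
 * Locked part only: recheck, then either keep the winner's supply and
 * drop our reference, or install ours. The lookup happened earlier,
 * outside the lock.
 */
static int fake_commit_supply(struct fake_rdev *rdev, struct fake_dev *r)
{
	int ret = 0;

	pthread_mutex_lock(&rdev->lock);

	/* Supply just resolved by a concurrent task? */
	if (rdev->supply) {
		fake_put(r);
		goto out_rdev_lock;
	}

	rdev->supply = r;

out_rdev_lock:
	pthread_mutex_unlock(&rdev->lock);
	return ret;
}
```

If two callers each look up the same device and then commit, the
second commit releases its extra reference, so exactly one reference
remains held on behalf of rdev->supply. Note that in the diff above the
enable step after set_supply() still runs before regulator_unlock().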

2021-01-19 04:37:55

by Naresh Kamboju

Subject: Re: [PATCH] regulator: core: avoid regulator_resolve_supply() race condition

Hi,

On Wed, 13 Jan 2021 at 03:21, Marek Szyprowski <[email protected]> wrote:
>
> Hi,
>

<trim>

>
> This patch landed in linux next-20210112 as commit eaa7995c529b
> ("regulator: core: avoid regulator_resolve_supply() race condition"). I
> found that it triggers the following lockdep warning during DWC3
> driver registration on some Exynos-based boards (this log is from a
> Samsung Exynos5420-based Peach-Pit board):
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.11.0-rc1-00008-geaa7995c529b #10095 Not tainted
> ------------------------------------------------------
> swapper/0/1 is trying to acquire lock:
> c12e1b80 (regulator_list_mutex){+.+.}-{3:3}, at:
> regulator_lock_dependent+0x4c/0x2b0
>
> but task is already holding lock:
> df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at:
> regulator_resolve_supply+0x44/0x318

LKFT testing also found this lockdep warning on
arm64 - hi6220-hikey while booting.

[ 0.635532] WARNING: possible recursive locking detected
[ 0.635558] 5.11.0-rc3-next-20210118 #1 Not tainted
[ 0.635585] --------------------------------------------
[ 0.635611] swapper/0/1 is trying to acquire lock:
[ 0.635636] ffff000000a13158 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_lock_recursive+0x9c/0x1e8
[ 0.635721]
[ 0.635721] but task is already holding lock:
[ 0.635749] ffff000000a13958 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_resolve_supply+0x70/0x2f0
[ 0.635817]
[ 0.635817] other info that might help us debug this:
[ 0.635847] Possible unsafe locking scenario:
[ 0.635847]
[ 0.635875] CPU0
[ 0.635892] ----
[ 0.635909] lock(regulator_ww_class_mutex);
[ 0.635942] lock(regulator_ww_class_mutex);
[ 0.635974]
[ 0.635974] *** DEADLOCK ***
[ 0.635974]
[ 0.636002] May be due to missing lock nesting notation
[ 0.636002]
[ 0.636033] 4 locks held by swapper/0/1:
[ 0.636057] #0: ffff000000a02988 (&dev->mutex){....}-{3:3}, at: __device_driver_lock+0x38/0x70
[ 0.636131] #1: ffff000000a13958 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_resolve_supply+0x70/0x2f0
[ 0.636205] #2: ffff800012b102c0 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x5c/0x290
[ 0.636280] #3: ffff8000137e3918 (regulator_ww_class_acquire){+.+.}-{0:0}, at: regulator_enable+0x40/0xe0
[ 0.636352]
[ 0.636352] stack backtrace:
[ 0.636378] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc3-next-20210118 #1
[ 0.636415] Hardware name: HiKey Development Board (DT)
[ 0.636443] Call trace:
[ 0.636460] dump_backtrace+0x0/0x1f0
[ 0.636490] show_stack+0x2c/0x80
[ 0.636516] dump_stack+0xf8/0x160
[ 0.636543] __lock_acquire+0xa3c/0x1718
[ 0.636571] lock_acquire+0x3d8/0x4f0
[ 0.636596] __ww_mutex_lock.constprop.14+0xbc/0xf68
[ 0.636628] ww_mutex_lock+0x6c/0x3e8
[ 0.636653] regulator_lock_recursive+0x9c/0x1e8
[ 0.636683] regulator_lock_dependent+0x198/0x290
[ 0.636713] regulator_enable+0x40/0xe0
[ 0.636739] regulator_resolve_supply+0x1e8/0x2f0
[ 0.636767] regulator_register_resolve_supply+0x24/0x80
[ 0.636797] class_for_each_device+0x78/0xf8
[ 0.636825] regulator_register+0x840/0xbb0
[ 0.636851] devm_regulator_register+0x50/0xa8
[ 0.636879] reg_fixed_voltage_probe+0x224/0x410
[ 0.636908] platform_probe+0x6c/0xd8
[ 0.636932] really_probe+0x2b8/0x520
[ 0.636960] driver_probe_device+0xf4/0x168
[ 0.636988] device_driver_attach+0x74/0x98
[ 0.637014] __driver_attach+0xc4/0x178
[ 0.637039] bus_for_each_dev+0x84/0xd8
[ 0.637066] driver_attach+0x30/0x40
[ 0.637092] bus_add_driver+0x170/0x258
[ 0.637119] driver_register+0x64/0x118
[ 0.637144] __platform_driver_register+0x34/0x40
[ 0.637172] regulator_fixed_voltage_init+0x20/0x28
[ 0.637205] do_one_initcall+0x94/0x4a0
[ 0.637231] kernel_init_freeable+0x2f0/0x344
[ 0.637261] kernel_init+0x18/0x120

Reported-by: Naresh Kamboju <[email protected]>

Full boot log here:
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20210118/testrun/3771538/suite/linux-log-parser/test/check-kernel-warning-2159912/log

metadata:
git branch: master
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git describe: next-20210112
kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/hikey/lkft/linux-next/935/config


--
Linaro LKFT
https://lkft.linaro.org

2021-01-21 09:46:06

by Marek Szyprowski

Subject: Re: [PATCH] regulator: core: avoid regulator_resolve_supply() race condition

Hi Mark,

On 18.01.2021 21:49, Mark Brown wrote:
> On Tue, Jan 12, 2021 at 10:34:19PM +0100, Marek Szyprowski wrote:
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 5.11.0-rc1-00008-geaa7995c529b #10095 Not tainted
>> ------------------------------------------------------
>> swapper/0/1 is trying to acquire lock:
>> c12e1b80 (regulator_list_mutex){+.+.}-{3:3}, at:
>> regulator_lock_dependent+0x4c/0x2b0
> If you're sending backtraces or other enormous reports like this please
> run them through addr2line first so that things are a bit more legible.

Well, I had little time to look into that issue, so I just copy-pasted
the kernel log in the hope that it would be useful. The trace is really
long, but the function call stack is IMHO readable.

If you need more details about any specific trace, just ask. I don't
know of any good method for processing raw kernel logs with addr2line
while keeping things readable.

>> but task is already holding lock:
>> df7190c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at:
>> regulator_resolve_supply+0x44/0x318
>>
>> which lock already depends on the new lock.
> Does this help (completely untested):

Sadly, no. I get the same warning:

======================================================
WARNING: possible circular locking dependency detected
5.11.0-rc3-next-20210118-00005-g56a65ff7ca8b #10162 Not tainted
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
c12e1e40 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x4c/0x2b4

but task is already holding lock:
df4fe8c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_resolve_supply+0x98/0x320

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (regulator_ww_class_mutex){+.+.}-{3:3}:
       ww_mutex_lock+0x48/0x88
       regulator_lock_recursive+0x84/0x1f4
       regulator_lock_dependent+0x188/0x2b4
       regulator_enable+0x30/0xe4
       dwc3_exynos_probe+0x17c/0x2c0
       platform_probe+0x80/0xc0
       really_probe+0x1d4/0x4ec
       driver_probe_device+0x78/0x1d8
       device_driver_attach+0x58/0x60
       __driver_attach+0xfc/0x160
       bus_for_each_dev+0x6c/0xb8
       bus_add_driver+0x170/0x20c
       driver_register+0x78/0x10c
       do_one_initcall+0x88/0x438
       kernel_init_freeable+0x190/0x1e0
       kernel_init+0x8/0x118
       ret_from_fork+0x14/0x38
       0x0

-> #1 (regulator_ww_class_acquire){+.+.}-{0:0}:
       regulator_enable+0x30/0xe4
       dwc3_exynos_probe+0x17c/0x2c0
       platform_probe+0x80/0xc0
       really_probe+0x1d4/0x4ec
       driver_probe_device+0x78/0x1d8
       device_driver_attach+0x58/0x60
       __driver_attach+0xfc/0x160
       bus_for_each_dev+0x6c/0xb8
       bus_add_driver+0x170/0x20c
       driver_register+0x78/0x10c
       do_one_initcall+0x88/0x438
       kernel_init_freeable+0x190/0x1e0
       kernel_init+0x8/0x118
       ret_from_fork+0x14/0x38
       0x0

-> #0 (regulator_list_mutex){+.+.}-{3:3}:
       lock_acquire+0x314/0x5d0
       __mutex_lock+0xa4/0xb60
       mutex_lock_nested+0x1c/0x24
       regulator_lock_dependent+0x4c/0x2b4
       regulator_enable+0x30/0xe4
       regulator_resolve_supply+0x1d0/0x320
       regulator_register_resolve_supply+0x14/0x78
       class_for_each_device+0x68/0xe8
       regulator_register+0xa30/0xca0
       devm_regulator_register+0x40/0x70
       tps65090_regulator_probe+0x150/0x648
       platform_probe+0x80/0xc0
       really_probe+0x1d4/0x4ec
       driver_probe_device+0x78/0x1d8
       bus_for_each_drv+0x78/0xbc
       __device_attach+0xe8/0x180
       bus_probe_device+0x88/0x90
       device_add+0x4c8/0x7ec
       platform_device_add+0x120/0x25c
       mfd_add_devices+0x580/0x60c
       tps65090_i2c_probe+0xb8/0x184
       i2c_device_probe+0x234/0x2a4
       really_probe+0x1d4/0x4ec
       driver_probe_device+0x78/0x1d8
       bus_for_each_drv+0x78/0xbc
       __device_attach+0xe8/0x180
       bus_probe_device+0x88/0x90
       device_add+0x4c8/0x7ec
       i2c_new_client_device+0x15c/0x27c
       of_i2c_register_devices+0x114/0x184
       i2c_register_adapter+0x1d8/0x6dc
       ec_i2c_probe+0xc8/0x124
       platform_probe+0x80/0xc0
       really_probe+0x1d4/0x4ec
       driver_probe_device+0x78/0x1d8
       bus_for_each_drv+0x78/0xbc
       __device_attach+0xe8/0x180
       bus_probe_device+0x88/0x90
       device_add+0x4c8/0x7ec
       of_platform_device_create_pdata+0x90/0xc8
       of_platform_bus_create+0x1a0/0x4ec
       of_platform_populate+0x88/0x120
       devm_of_platform_populate+0x40/0x80
       cros_ec_register+0x174/0x308
       cros_ec_spi_probe+0x16c/0x1ec
       spi_probe+0x88/0xac
       really_probe+0x1d4/0x4ec
       driver_probe_device+0x78/0x1d8
       device_driver_attach+0x58/0x60
       __driver_attach+0xfc/0x160
       bus_for_each_dev+0x6c/0xb8
       bus_add_driver+0x170/0x20c
       driver_register+0x78/0x10c
       do_one_initcall+0x88/0x438
       kernel_init_freeable+0x190/0x1e0
       kernel_init+0x8/0x118
       ret_from_fork+0x14/0x38
       0x0

other info that might help us debug this:

Chain exists of:
  regulator_list_mutex --> regulator_ww_class_acquire --> regulator_ww_class_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(regulator_ww_class_mutex);
                               lock(regulator_ww_class_acquire);
                               lock(regulator_ww_class_mutex);
  lock(regulator_list_mutex);

 *** DEADLOCK ***

5 locks held by swapper/0/1:
 #0: dfbef0c8 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x60
 #1: df4f84d8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
 #2: df4f98e8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
 #3: df509cd8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x34/0x180
 #4: df4fe8c0 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_resolve_supply+0x98/0x320

stack backtrace:
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc3-next-20210118-00005-g56a65ff7ca8b #10162
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c01116e8>] (unwind_backtrace) from [<c010cf58>] (show_stack+0x10/0x14)
[<c010cf58>] (show_stack) from [<c0b443c0>] (dump_stack+0xa4/0xc4)
[<c0b443c0>] (dump_stack) from [<c01932e0>] (check_noncircular+0x14c/0x164)
[<c01932e0>] (check_noncircular) from [<c0196a08>] (__lock_acquire+0x181c/0x3204)
[<c0196a08>] (__lock_acquire) from [<c01990cc>] (lock_acquire+0x314/0x5d0)
[<c01990cc>] (lock_acquire) from [<c0b4bd54>] (__mutex_lock+0xa4/0xb60)
[<c0b4bd54>] (__mutex_lock) from [<c0b4c82c>] (mutex_lock_nested+0x1c/0x24)
[<c0b4c82c>] (mutex_lock_nested) from [<c05d4544>] (regulator_lock_dependent+0x4c/0x2b4)
[<c05d4544>] (regulator_lock_dependent) from [<c05d99c0>] (regulator_enable+0x30/0xe4)
[<c05d99c0>] (regulator_enable) from [<c05d9c44>] (regulator_resolve_supply+0x1d0/0x320)
[<c05d9c44>] (regulator_resolve_supply) from [<c05da130>] (regulator_register_resolve_supply+0x14/0x78)
[<c05da130>] (regulator_register_resolve_supply) from [<c06aba80>] (class_for_each_device+0x68/0xe8)
[<c06aba80>] (class_for_each_device) from [<c05db5e0>] (regulator_register+0xa30/0xca0)
[<c05db5e0>] (regulator_register) from [<c05dd430>] (devm_regulator_register+0x40/0x70)
[<c05dd430>] (devm_regulator_register) from [<c05e6218>] (tps65090_regulator_probe+0x150/0x648)
[<c05e6218>] (tps65090_regulator_probe) from [<c06aca70>] (platform_probe+0x80/0xc0)
[<c06aca70>] (platform_probe) from [<c06a9b9c>] (really_probe+0x1d4/0x4ec)
[<c06a9b9c>] (really_probe) from [<c06a9f2c>] (driver_probe_device+0x78/0x1d8)
[<c06a9f2c>] (driver_probe_device) from [<c06a7c24>] (bus_for_each_drv+0x78/0xbc)
[<c06a7c24>] (bus_for_each_drv) from [<c06a9928>] (__device_attach+0xe8/0x180)
[<c06a9928>] (__device_attach) from [<c06a89d0>] (bus_probe_device+0x88/0x90)
[<c06a89d0>] (bus_probe_device) from [<c06a662c>] (device_add+0x4c8/0x7ec)
[<c06a662c>] (device_add) from [<c06ac634>] (platform_device_add+0x120/0x25c)
[<c06ac634>] (platform_device_add) from [<c06de87c>] (mfd_add_devices+0x580/0x60c)
[<c06de87c>] (mfd_add_devices) from [<c06e0ce8>] (tps65090_i2c_probe+0xb8/0x184)
[<c06e0ce8>] (tps65090_i2c_probe) from [<c082d2b8>] (i2c_device_probe+0x234/0x2a4)
[<c082d2b8>] (i2c_device_probe) from [<c06a9b9c>] (really_probe+0x1d4/0x4ec)
[<c06a9b9c>] (really_probe) from [<c06a9f2c>] (driver_probe_device+0x78/0x1d8)
[<c06a9f2c>] (driver_probe_device) from [<c06a7c24>] (bus_for_each_drv+0x78/0xbc)
[<c06a7c24>] (bus_for_each_drv) from [<c06a9928>] (__device_attach+0xe8/0x180)
[<c06a9928>] (__device_attach) from [<c06a89d0>] (bus_probe_device+0x88/0x90)
[<c06a89d0>] (bus_probe_device) from [<c06a662c>] (device_add+0x4c8/0x7ec)
[<c06a662c>] (device_add) from [<c082f884>] (i2c_new_client_device+0x15c/0x27c)
[<c082f884>] (i2c_new_client_device) from [<c08332dc>] (of_i2c_register_devices+0x114/0x184)
[<c08332dc>] (of_i2c_register_devices) from [<c0830250>] (i2c_register_adapter+0x1d8/0x6dc)
[<c0830250>] (i2c_register_adapter) from [<c0838a1c>] (ec_i2c_probe+0xc8/0x124)
[<c0838a1c>] (ec_i2c_probe) from [<c06aca70>] (platform_probe+0x80/0xc0)
[<c06aca70>] (platform_probe) from [<c06a9b9c>] (really_probe+0x1d4/0x4ec)
[<c06a9b9c>] (really_probe) from [<c06a9f2c>] (driver_probe_device+0x78/0x1d8)
[<c06a9f2c>] (driver_probe_device) from [<c06a7c24>] (bus_for_each_drv+0x78/0xbc)
[<c06a7c24>] (bus_for_each_drv) from [<c06a9928>] (__device_attach+0xe8/0x180)
[<c06a9928>] (__device_attach) from [<c06a89d0>] (bus_probe_device+0x88/0x90)
[<c06a89d0>] (bus_probe_device) from [<c06a662c>] (device_add+0x4c8/0x7ec)
[<c06a662c>] (device_add) from [<c08bba20>] (of_platform_device_create_pdata+0x90/0xc8)
[<c08bba20>] (of_platform_device_create_pdata) from [<c08bbc04>] (of_platform_bus_create+0x1a0/0x4ec)
[<c08bbc04>] (of_platform_bus_create) from [<c08bc104>] (of_platform_populate+0x88/0x120)
[<c08bc104>] (of_platform_populate) from [<c08bc1f0>] (devm_of_platform_populate+0x40/0x80)
[<c08bc1f0>] (devm_of_platform_populate) from [<c08c1910>] (cros_ec_register+0x174/0x308)
[<c08c1910>] (cros_ec_register) from [<c08c2ca0>] (cros_ec_spi_probe+0x16c/0x1ec)
[<c08c2ca0>] (cros_ec_spi_probe) from [<c07240fc>] (spi_probe+0x88/0xac)
[<c07240fc>] (spi_probe) from [<c06a9b9c>] (really_probe+0x1d4/0x4ec)
[<c06a9b9c>] (really_probe) from [<c06a9f2c>] (driver_probe_device+0x78/0x1d8)
[<c06a9f2c>] (driver_probe_device) from [<c06aa444>] (device_driver_attach+0x58/0x60)
[<c06aa444>] (device_driver_attach) from [<c06aa548>] (__driver_attach+0xfc/0x160)
[<c06aa548>] (__driver_attach) from [<c06a7b4c>] (bus_for_each_dev+0x6c/0xb8)
[<c06a7b4c>] (bus_for_each_dev) from [<c06a8c84>] (bus_add_driver+0x170/0x20c)
[<c06a8c84>] (bus_add_driver) from [<c06ab3e8>] (driver_register+0x78/0x10c)
[<c06ab3e8>] (driver_register) from [<c0102428>] (do_one_initcall+0x88/0x438)
[<c0102428>] (do_one_initcall) from [<c11010d4>] (kernel_init_freeable+0x190/0x1e0)
[<c11010d4>] (kernel_init_freeable) from [<c0b47db0>] (kernel_init+0x8/0x118)
[<c0b47db0>] (kernel_init) from [<c010011c>] (ret_from_fork+0x14/0x38)
Exception stack(0xc1ce3fb0 to 0xc1ce3ff8)

Best regards

--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-01-21 15:49:33

by Mark Brown

Subject: Re: [PATCH] regulator: core: avoid regulator_resolve_supply() race condition

On Thu, Jan 21, 2021 at 10:41:59AM +0100, Marek Szyprowski wrote:
> On 18.01.2021 21:49, Mark Brown wrote:

> > Does this help (completely untested):

> Sadly nope. I get same warning:

Try this instead:

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 3ae5ccd9277d..31503776dbd7 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1823,17 +1823,6 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
if (rdev->supply)
return 0;

- /*
- * Recheck rdev->supply with rdev->mutex lock held to avoid a race
- * between rdev->supply null check and setting rdev->supply in
- * set_supply() from concurrent tasks.
- */
- regulator_lock(rdev);
-
- /* Supply just resolved by a concurrent task? */
- if (rdev->supply)
- goto out;
-
r = regulator_dev_lookup(dev, rdev->supply_name);
if (IS_ERR(r)) {
ret = PTR_ERR(r);
@@ -1885,12 +1874,29 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
goto out;
}

+ /*
+ * Recheck rdev->supply with rdev->mutex lock held to avoid a race
+ * between rdev->supply null check and setting rdev->supply in
+ * set_supply() from concurrent tasks.
+ */
+ regulator_lock(rdev);
+
+ /* Supply just resolved by a concurrent task? */
+ if (rdev->supply) {
+ regulator_unlock(rdev);
+ put_device(&r->dev);
+ return ret;
+ }
+
ret = set_supply(rdev, r);
if (ret < 0) {
+ regulator_unlock(rdev);
put_device(&r->dev);
- goto out;
+ return ret;
}

+ regulator_unlock(rdev);
+
/*
* In set_machine_constraints() we may have turned this regulator on
* but we couldn't propagate to the supply if it hadn't been resolved
@@ -1901,12 +1907,11 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
if (ret < 0) {
_regulator_put(rdev->supply);
rdev->supply = NULL;
- goto out;
+ goto out_rdev_lock;
}
}

out:
- regulator_unlock(rdev);
return ret;
}



2021-01-21 20:33:33

by Marek Szyprowski

Subject: Re: [PATCH] regulator: core: avoid regulator_resolve_supply() race condition

Hi Mark,

On 21.01.2021 16:44, Mark Brown wrote:
> On Thu, Jan 21, 2021 at 10:41:59AM +0100, Marek Szyprowski wrote:
>> On 18.01.2021 21:49, Mark Brown wrote:
>>> Does this help (completely untested):
>> Sadly nope. I get same warning:
> Try this instead:
>
> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index 3ae5ccd9277d..31503776dbd7 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -1823,17 +1823,6 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
> if (rdev->supply)
> return 0;
>
> - /*
> - * Recheck rdev->supply with rdev->mutex lock held to avoid a race
> - * between rdev->supply null check and setting rdev->supply in
> - * set_supply() from concurrent tasks.
> - */
> - regulator_lock(rdev);
> -
> - /* Supply just resolved by a concurrent task? */
> - if (rdev->supply)
> - goto out;
> -
> r = regulator_dev_lookup(dev, rdev->supply_name);
> if (IS_ERR(r)) {
> ret = PTR_ERR(r);
> @@ -1885,12 +1874,29 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
> goto out;
> }
>
> + /*
> + * Recheck rdev->supply with rdev->mutex lock held to avoid a race
> + * between rdev->supply null check and setting rdev->supply in
> + * set_supply() from concurrent tasks.
> + */
> + regulator_lock(rdev);
> +
> + /* Supply just resolved by a concurrent task? */
> + if (rdev->supply) {
> + regulator_unlock(rdev);
> + put_device(&r->dev);
> + return ret;
> + }
> +
> ret = set_supply(rdev, r);
> if (ret < 0) {
> + regulator_unlock(rdev);
> put_device(&r->dev);
> - goto out;
> + return ret;
> }
>
> + regulator_unlock(rdev);
> +
> /*
> * In set_machine_constraints() we may have turned this regulator on
> * but we couldn't propagate to the supply if it hadn't been resolved
> @@ -1901,12 +1907,11 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
> if (ret < 0) {
> _regulator_put(rdev->supply);
> rdev->supply = NULL;
> - goto out;
> + goto out_rdev_lock;

drivers/regulator/core.c:1910:4: error: label ‘out_rdev_lock’ used but not defined

> }
> }
>
> out:
> - regulator_unlock(rdev);
> return ret;
> }
>

It looks like this finally fixes the locking issue, once the above goto is
removed completely to fix the build. Feel free to add:

Reported-by: Marek Szyprowski <[email protected]>

Tested-by: Marek Szyprowski <[email protected]>

Best regards

--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-01-23 01:59:13

by David Collins

Subject: Re: [PATCH] regulator: core: avoid regulator_resolve_supply() race condition

Hello Mark,

On 1/21/21 12:30 PM, Marek Szyprowski wrote:
> Hi Mark,
>
> On 21.01.2021 16:44, Mark Brown wrote:
>> On Thu, Jan 21, 2021 at 10:41:59AM +0100, Marek Szyprowski wrote:
>>> On 18.01.2021 21:49, Mark Brown wrote:
>>>> Does this help (completely untested):
>>> Sadly nope. I get same warning:
>> Try this instead:
>>
>> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
>> index 3ae5ccd9277d..31503776dbd7 100644
>> --- a/drivers/regulator/core.c
>> +++ b/drivers/regulator/core.c
>> @@ -1823,17 +1823,6 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>> if (rdev->supply)
>> return 0;
>>
>> - /*
>> - * Recheck rdev->supply with rdev->mutex lock held to avoid a race
>> - * between rdev->supply null check and setting rdev->supply in
>> - * set_supply() from concurrent tasks.
>> - */
>> - regulator_lock(rdev);
>> -
>> - /* Supply just resolved by a concurrent task? */
>> - if (rdev->supply)
>> - goto out;
>> -
>> r = regulator_dev_lookup(dev, rdev->supply_name);
>> if (IS_ERR(r)) {
>> ret = PTR_ERR(r);
>> @@ -1885,12 +1874,29 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>> goto out;
>> }
>>
>> + /*
>> + * Recheck rdev->supply with rdev->mutex lock held to avoid a race
>> + * between rdev->supply null check and setting rdev->supply in
>> + * set_supply() from concurrent tasks.
>> + */
>> + regulator_lock(rdev);
>> +
>> + /* Supply just resolved by a concurrent task? */
>> + if (rdev->supply) {
>> + regulator_unlock(rdev);
>> + put_device(&r->dev);
>> + return ret;
>> + }
>> +
>> ret = set_supply(rdev, r);
>> if (ret < 0) {
>> + regulator_unlock(rdev);
>> put_device(&r->dev);
>> - goto out;
>> + return ret;
>> }
>>
>> + regulator_unlock(rdev);
>> +
>> /*
>> * In set_machine_constraints() we may have turned this regulator on
>> * but we couldn't propagate to the supply if it hadn't been resolved
>> @@ -1901,12 +1907,11 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
>> if (ret < 0) {
>> _regulator_put(rdev->supply);
>> rdev->supply = NULL;
>> - goto out;
>> + goto out_rdev_lock;
>
> drivers/regulator/core.c:1910:4: error: label ‘out_rdev_lock’ used but
> not defined
>
>> }
>> }
>>
>> out:
>> - regulator_unlock(rdev);
>> return ret;
>> }
>>
>
> It looks like this finally fixes the locking issue, once the above goto is
> removed completely to fix the build. Feel free to add:
>
> Reported-by: Marek Szyprowski <[email protected]>
>
> Tested-by: Marek Szyprowski <[email protected]>

Thank you for making this fix. I'm sorry that I missed the potential
deadlock issue resulting from the regulator_enable() call inside
regulator_resolve_supply() with rdev->mutex locked. Your fix avoids the
deadlock while still ensuring that there isn't a set supply race
condition.
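
For readers following along, the locking shape described above can be
sketched in userspace C with pthreads. This is only an illustration under
stated assumptions, not the kernel code: struct fake_rdev, list_lock,
fake_enable(), fake_resolve_supply() and run_demo() are hypothetical
stand-ins, and plain pthread mutexes replace the kernel's ww_mutex
machinery. The point it tries to show is that the recheck and the
assignment happen under the per-device lock, and that the lock is dropped
before calling into a path that takes a list-level lock first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for regulator_list_mutex. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for struct regulator_dev; not the kernel type. */
struct fake_rdev {
    pthread_mutex_t lock;   /* plays the role of rdev->mutex */
    void *_Atomic supply;   /* plays the role of rdev->supply */
    int set_count;          /* times the supply was installed */
    int enable_count;       /* times the enable path ran */
};

static int fake_supply;     /* the object a lookup would return */

/* The enable path takes list_lock first, then the device lock. */
static void fake_enable(struct fake_rdev *rdev)
{
    pthread_mutex_lock(&list_lock);
    pthread_mutex_lock(&rdev->lock);
    rdev->enable_count++;
    pthread_mutex_unlock(&rdev->lock);
    pthread_mutex_unlock(&list_lock);
}

static void *fake_resolve_supply(void *arg)
{
    struct fake_rdev *rdev = arg;
    void *found;

    /* Unlocked early check, as at the top of the real function. */
    if (atomic_load(&rdev->supply))
        return NULL;

    found = &fake_supply;   /* lookup happens without the device lock */

    /* Recheck and set under the device lock so only one task wins. */
    pthread_mutex_lock(&rdev->lock);
    if (atomic_load(&rdev->supply)) {
        pthread_mutex_unlock(&rdev->lock);
        return NULL;
    }
    atomic_store(&rdev->supply, found);
    rdev->set_count++;
    pthread_mutex_unlock(&rdev->lock);

    /* Device lock is released here, so taking list_lock cannot invert. */
    fake_enable(rdev);
    return NULL;
}

/* Runs two concurrent resolvers; returns how often the supply was set. */
static int run_demo(void)
{
    struct fake_rdev rdev;
    pthread_t a, b;

    pthread_mutex_init(&rdev.lock, NULL);
    atomic_init(&rdev.supply, NULL);
    rdev.set_count = 0;
    rdev.enable_count = 0;

    pthread_create(&a, NULL, fake_resolve_supply, &rdev);
    pthread_create(&b, NULL, fake_resolve_supply, &rdev);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&rdev.lock);

    /* Only the task that installed the supply goes on to enable it. */
    assert(rdev.enable_count == rdev.set_count);
    return rdev.set_count;
}
```

Calling run_demo() from a small main() and building with -pthread should
report that the supply was installed once, whichever thread gets there
first. Moving the fake_enable() call inside the locked region would
reintroduce the ordering problem, since the device lock would then be held
while list_lock is requested.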

Take care,
David

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project