2020-08-09 22:29:42

by Michał Mirosław

Subject: regulator: deadlock vs memory reclaim

Hi guys,

Commit f8702f9e4aa7 ("regulator: core: Use ww_mutex for regulators locking")
from Nov 2018 tried to fix possible deadlocks when handling coupled
regulators. Unfortunately it introduced another possible deadlock,
as discovered by lockdep (see below), instead.

regulator_lock_dependent() starts by taking regulator_list_mutex. The
same mutex covers eg. regulator initialization, including memory allocations
that happen there. This will deadlock when you have a filesystem on eg. eMMC
(which uses a regulator to control module voltages) and you register
a new regulator (hotplug a device?) when under memory pressure.

There is also another problem with regulator_lock_dependent(): all the
w/w rollback stuff is useless: because of the outer lock, there can only
be one contender doing the multiple-lock-grabbing procedure. In this setup,
the procedure cannot detect other processes waiting on
regulator_lock_dependent() and it cannot signal (wound a transaction of)
current holders of locks taken by regulator_lock().

Basically, we have a BKL for regulator_enable() and we're using ww_mutex
as a recursive mutex with no deadlock prevention whatsoever. The locks
also seem to cover way too much (eg. initialization even before making the
regulator visible to the system).
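
For reference, the structure in question looks roughly like this (heavily
simplified and paraphrased from memory; not a verbatim copy of core.c):

static void regulator_lock_dependent(struct regulator_dev *rdev,
                                     struct ww_acquire_ctx *ww_ctx)
{
        struct regulator_dev *new_contended_rdev = NULL;
        struct regulator_dev *old_contended_rdev = NULL;
        int err;

        /* the outer "BKL" this mail is about */
        mutex_lock(&regulator_list_mutex);

        ww_acquire_init(ww_ctx, &regulator_ww_class);

        do {
                if (new_contended_rdev) {
                        /* back off: sleep on the lock we lost to */
                        ww_mutex_lock_slow(&new_contended_rdev->mutex, ww_ctx);
                        old_contended_rdev = new_contended_rdev;
                        old_contended_rdev->ref_cnt++;
                }

                /* grab rdev, its supplies and its coupled peers */
                err = regulator_lock_recursive(rdev, &new_contended_rdev,
                                               &old_contended_rdev, ww_ctx);

                if (old_contended_rdev)
                        regulator_unlock(old_contended_rdev);
        } while (err == -EDEADLK);

        ww_acquire_done(ww_ctx);

        mutex_unlock(&regulator_list_mutex);
}

With regulator_list_mutex held for the whole acquire phase, the -EDEADLK
backoff above can never race with another caller of
regulator_lock_dependent(), so the wait/wound machinery has nothing to do.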

To fix the regulator vs memory reclaim path I tried pushing all allocations
out of protected sections. After doing a few patches, though, I'm not sure
I'm going in the right direction. Your thoughts?

Best Regards
Michał Mirosław


======================================================
WARNING: possible circular locking dependency detected
5.7.13+ #537 Not tainted
------------------------------------------------------
kworker/u2:11/184 is trying to acquire lock:
c0e5d8d8 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x54/0x2c0

but task is already holding lock:
cca919dc (&sbi->write_io[i][j].io_rwsem){++++}-{3:3}, at: __submit_merged_write_cond+0xac/0x154

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&sbi->write_io[i][j].io_rwsem){++++}-{3:3}:
down_write+0x40/0x90
f2fs_submit_page_write+0x5c/0x660
do_write_page+0x60/0x12c
f2fs_outplace_write_data+0x64/0x134
f2fs_do_write_data_page+0x57c/0x9b4
f2fs_write_single_data_page+0x7f0/0x834
f2fs_write_data_page+0x74/0xac
shrink_page_list+0xaac/0x1010
shrink_inactive_list+0x200/0x43c
shrink_node+0x5ac/0x980
kswapd+0x444/0x930
kthread+0x168/0x16c
ret_from_fork+0x14/0x20
0x0

-> #1 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire.part.11+0x40/0x50
fs_reclaim_acquire+0x24/0x28
__kmalloc+0x54/0x218
regulator_register+0x878/0x15dc
dummy_regulator_probe+0x60/0xa8
platform_drv_probe+0x58/0xa8
really_probe+0x1d8/0x468
driver_probe_device+0x174/0x1d0
device_driver_attach+0x68/0x70
__driver_attach+0xa0/0x154
bus_for_each_dev+0x84/0xc4
driver_attach+0x2c/0x30
bus_add_driver+0x128/0x208
driver_register+0x84/0x118
__platform_driver_register+0x50/0x58
regulator_dummy_init+0x74/0x98
regulator_init+0xa0/0xc0
do_one_initcall+0x7c/0x274
kernel_init_freeable+0x180/0x1f0
kernel_init+0x18/0x120
ret_from_fork+0x14/0x20
0x0

-> #0 (regulator_list_mutex){+.+.}-{3:3}:
__lock_acquire+0x15e8/0x2ddc
lock_acquire+0xf4/0x410
__mutex_lock+0x8c/0x628
mutex_lock_nested+0x2c/0x34
regulator_lock_dependent+0x54/0x2c0
regulator_get_voltage+0x38/0xec
regulator_is_supported_voltage+0x44/0x104
mmc_regulator_set_voltage_if_supported+0x2c/0x68
mmc_regulator_set_vqmmc+0xf0/0x190
sdhci_start_signal_voltage_switch+0xf4/0x34c
sdhci_runtime_resume_host+0x88/0x26c
sdhci_at91_runtime_resume+0x3c/0x54
pm_generic_runtime_resume+0x3c/0x48
__rpm_callback+0x84/0x13c
rpm_callback+0x64/0x90
rpm_resume+0x5d4/0x740
__pm_runtime_resume+0x5c/0x74
__mmc_claim_host+0x1d8/0x228
mmc_get_card+0x38/0x3c
mmc_mq_queue_rq+0x270/0x278
__blk_mq_try_issue_directly+0x150/0x1dc
blk_mq_request_issue_directly+0x58/0x88
blk_mq_try_issue_list_directly+0x60/0xdc
blk_mq_sched_insert_requests+0x15c/0x1d8
blk_mq_flush_plug_list+0x1ac/0x278
blk_flush_plug_list+0xd8/0xf4
blk_mq_make_request+0x31c/0x6d0
generic_make_request+0xc8/0x318
submit_bio+0x44/0x180
__submit_bio+0x74/0x41c
__submit_merged_bio+0x6c/0x1cc
__submit_merged_write_cond+0xe0/0x154
f2fs_write_cache_pages+0x75c/0x7c0
f2fs_write_data_pages+0x350/0x3c0
do_writepages+0x54/0xf0
__writeback_single_inode+0x60/0x4cc
writeback_sb_inodes+0x200/0x4b4
__writeback_inodes_wb+0x70/0xbc
wb_writeback+0x2c0/0x3a0
wb_workfn+0x3cc/0x568
process_one_work+0x2c4/0x684
worker_thread+0x60/0x588
kthread+0x168/0x16c
ret_from_fork+0x14/0x20
0x0

other info that might help us debug this:

Chain exists of:
regulator_list_mutex --> fs_reclaim --> &sbi->write_io[i][j].io_rwsem

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sbi->write_io[i][j].io_rwsem);
                               lock(fs_reclaim);
                               lock(&sbi->write_io[i][j].io_rwsem);
  lock(regulator_list_mutex);

*** DEADLOCK ***

6 locks held by kworker/u2:11/184:
#0: cd703ea4 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x22c/0x684
#1: cd71fef0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x22c/0x684
#2: cca9787c (&type->s_umount_key#25){.+.+}-{3:3}, at: trylock_super+0x24/0x68
#3: cca96094 (&sbi->writepages){+.+.}-{3:3}, at: f2fs_write_data_pages+0x338/0x3c0
#4: cca919dc (&sbi->write_io[i][j].io_rwsem){++++}-{3:3}, at: __submit_merged_write_cond+0xac/0x154
#5: cb69cfe0 (hctx->srcu){....}-{0:0}, at: hctx_lock+0x60/0xb8

stack backtrace:
CPU: 0 PID: 184 Comm: kworker/u2:11 Not tainted 5.7.13+ #537
Hardware name: Atmel SAMA5
Workqueue: writeback wb_workfn (flush-179:0)
Backtrace:
[<c010d5a8>] (dump_backtrace) from [<c010d848>] (show_stack+0x20/0x24)
r7:c140ecb8 r6:c1326a98 r5:c1337b38 r4:c1327ff8
[<c010d828>] (show_stack) from [<c04cb068>] (dump_stack+0x24/0x28)
[<c04cb044>] (dump_stack) from [<c015f9b0>] (print_circular_bug+0x27c/0x2d8)
[<c015f734>] (print_circular_bug) from [<c015fb58>] (check_noncircular+0x14c/0x204)
r10:c14ca130 r9:cd564700 r8:00000000 r7:cd71f2e8 r6:cd5646c0 r5:cd564700
r4:c0e08a88
[<c015fa0c>] (check_noncircular) from [<c01628e4>] (__lock_acquire+0x15e8/0x2ddc)
r9:cd564700 r8:00000006 r7:cd564200 r6:c1327ff8 r5:cd5646c0 r4:00000005
[<c01612fc>] (__lock_acquire) from [<c0164a0c>] (lock_acquire+0xf4/0x410)
r10:00000000 r9:c0e5d8d8 r8:00000000 r7:cb532800 r6:c0ea2bc8 r5:ffffe000
r4:c0e08a88
[<c0164918>] (lock_acquire) from [<c098197c>] (__mutex_lock+0x8c/0x628)
r10:c14ca130 r9:c0158a08 r8:c0e08a88 r7:cb532800 r6:00000000 r5:00000000
r4:c0e5d8a4
[<c09818f0>] (__mutex_lock) from [<c0981f44>] (mutex_lock_nested+0x2c/0x34)
r10:ffffe000 r9:c0158a08 r8:c0e08a88 r7:cb532800 r6:c0e5d634 r5:00000000
r4:cd71f4ec
[<c0981f18>] (mutex_lock_nested) from [<c051c110>] (regulator_lock_dependent+0x54/0x2c0)
[<c051c0bc>] (regulator_lock_dependent) from [<c051c3b4>] (regulator_get_voltage+0x38/0xec)
r10:ffffe000 r9:c0158a08 r8:cb532800 r7:0036ee80 r6:002dc6c0 r5:cb5ff300
r4:c0e08a88
[<c051c37c>] (regulator_get_voltage) from [<c051e08c>] (regulator_is_supported_voltage+0x44/0x104)
r6:002dc6c0 r5:cb5ff300 r4:0036ee80
[<c051e048>] (regulator_is_supported_voltage) from [<c067774c>] (mmc_regulator_set_voltage_if_supported+0x2c/0x68)
r9:c0158a08 r8:00004087 r7:002dc6c0 r6:00325aa0 r5:cb5ff300 r4:0036ee80
[<c0677720>] (mmc_regulator_set_voltage_if_supported) from [<c0677878>] (mmc_regulator_set_vqmmc+0xf0/0x190)
r7:0036ee80 r6:00325aa0 r5:002dc6c0 r4:cd635000
[<c0677788>] (mmc_regulator_set_vqmmc) from [<c0683230>] (sdhci_start_signal_voltage_switch+0xf4/0x34c)
r7:c0ea1ce5 r6:cd63536c r5:00000000 r4:cd635000
[<c068313c>] (sdhci_start_signal_voltage_switch) from [<c0683c0c>] (sdhci_runtime_resume_host+0x88/0x26c)
r10:ffffe000 r9:c0158a08 r8:00004087 r7:cd63536c r6:cd635734 r5:cd635000
r4:cd635540 r3:c068313c
[<c0683b84>] (sdhci_runtime_resume_host) from [<c068873c>] (sdhci_at91_runtime_resume+0x3c/0x54)
r10:ffffe000 r9:c0158a08 r8:00000000 r7:c0558ac4 r6:cd635540 r5:cd55c810
r4:00000000
[<c0688700>] (sdhci_at91_runtime_resume) from [<c0558b00>] (pm_generic_runtime_resume+0x3c/0x48)
r7:c0558ac4 r6:04a08060 r5:cd55c91c r4:cd55c810
[<c0558ac4>] (pm_generic_runtime_resume) from [<c055bbfc>] (__rpm_callback+0x84/0x13c)
[<c055bb78>] (__rpm_callback) from [<c055bd18>] (rpm_callback+0x64/0x90)
r9:c0158a08 r8:00000004 r7:cd52911c r6:04a08060 r5:ffffe000 r4:cd55c810
[<c055bcb4>] (rpm_callback) from [<c055b898>] (rpm_resume+0x5d4/0x740)
r7:cd52911c r6:c0558ac4 r5:cd529010 r4:cd55c810
[<c055b2c4>] (rpm_resume) from [<c055ba60>] (__pm_runtime_resume+0x5c/0x74)
r10:cd635000 r9:ffffe000 r8:cb59640c r7:60070013 r6:00000004 r5:cd55c91c
r4:cd55c810
[<c055ba04>] (__pm_runtime_resume) from [<c0666bbc>] (__mmc_claim_host+0x1d8/0x228)
r7:00000000 r6:60070013 r5:cd635348 r4:00000000
[<c06669e4>] (__mmc_claim_host) from [<c0666c44>] (mmc_get_card+0x38/0x3c)
r10:00000001 r9:cd635000 r8:cb5964b8 r7:cb568000 r6:cb596410 r5:cb59640c
r4:cb568000
[<c0666c0c>] (mmc_get_card) from [<c067d188>] (mmc_mq_queue_rq+0x270/0x278)
r5:ccb24a40 r4:cb596408
[<c067cf18>] (mmc_mq_queue_rq) from [<c0444848>] (__blk_mq_try_issue_directly+0x150/0x1dc)
r10:c0e08a88 r9:00000021 r8:cd71f800 r7:00000001 r6:cb69ce00 r5:ccb24a40
r4:c0e08a88
[<c04446f8>] (__blk_mq_try_issue_directly) from [<c0445614>] (blk_mq_request_issue_directly+0x58/0x88)
r9:cd6bc680 r8:cb69ad78 r7:00000000 r6:ccb24a40 r5:cb69ce00 r4:c0e08a88
[<c04455bc>] (blk_mq_request_issue_directly) from [<c04456a4>] (blk_mq_try_issue_list_directly+0x60/0xdc)
r7:cb69ce00 r6:00000000 r5:ccb24a40 r4:cd71f8ac
[<c0445644>] (blk_mq_try_issue_list_directly) from [<c0449eac>] (blk_mq_sched_insert_requests+0x15c/0x1d8)
r7:00000000 r6:cd71f8ac r5:cb69a7b8 r4:cb69ce00
[<c0449d50>] (blk_mq_sched_insert_requests) from [<c04454f0>] (blk_mq_flush_plug_list+0x1ac/0x278)
r9:cd71f8a4 r8:00000000 r7:cd71fe00 r6:cd6bc680 r5:cb69ce00 r4:00000002
[<c0445344>] (blk_mq_flush_plug_list) from [<c043932c>] (blk_flush_plug_list+0xd8/0xf4)
r10:00000100 r9:00000122 r8:00000000 r7:cd71fe00 r6:c0e08a88 r5:cd71f8ec
r4:cd71fdf8
[<c0439254>] (blk_flush_plug_list) from [<c0444ca8>] (blk_mq_make_request+0x31c/0x6d0)
r10:c0ea52f8 r9:c0e08a88 r8:cd71fdf8 r7:c0e08a88 r6:00000000 r5:ccb24ec0
r4:cb69a7b8
[<c044498c>] (blk_mq_make_request) from [<c0437938>] (generic_make_request+0xc8/0x318)
r10:cd547500 r9:c0e08a88 r8:00000022 r7:00000000 r6:c0e1e1b4 r5:00000000
r4:cb69a7b8
[<c0437870>] (generic_make_request) from [<c0437bcc>] (submit_bio+0x44/0x180)
r10:00063800 r9:c0ea4db8 r8:00000a00 r7:00000000 r6:c0ea1b75 r5:cd547500
r4:c0e08a88
[<c0437b88>] (submit_bio) from [<c03c27c0>] (__submit_bio+0x74/0x41c)
r10:00063800 r9:c0ea4db8 r8:cca97800 r7:00000000 r6:c0ea1b75 r5:c0e08a88
r4:cd547500
[<c03c274c>] (__submit_bio) from [<c03c2bd4>] (__submit_merged_bio+0x6c/0x1cc)
r10:00063800 r9:c0ea4e00 r8:cca96000 r7:00000000 r6:00000000 r5:cd547500
r4:cca91938
[<c03c2b68>] (__submit_merged_bio) from [<c03c2e14>] (__submit_merged_write_cond+0xe0/0x154)
r10:00063800 r9:cca919a4 r8:cca96000 r7:00000000 r6:00000001 r5:00000000
r4:00000138 r3:cd564200
[<c03c2d34>] (__submit_merged_write_cond) from [<c03cc6a0>] (f2fs_write_cache_pages+0x75c/0x7c0)
r10:cc374208 r9:00000000 r8:fffff000 r7:cd71fd40 r6:00000006 r5:cc3740c0
r4:00000000
[<c03cbf44>] (f2fs_write_cache_pages) from [<c03cca54>] (f2fs_write_data_pages+0x350/0x3c0)
r10:ffffe000 r9:c0ea4cb0 r8:cca96060 r7:cc374208 r6:c0e08a88 r5:cd71fd40
r4:cc3740c0
[<c03cc704>] (f2fs_write_data_pages) from [<c01f6234>] (do_writepages+0x54/0xf0)
r10:ffffe000 r9:cc374208 r8:c01f3324 r7:c0e08a88 r6:cd71fd40 r5:cd71fd40
r4:cc374208
[<c01f61e0>] (do_writepages) from [<c0286014>] (__writeback_single_inode+0x60/0x4cc)
r8:cc3740c0 r7:c0ea3948 r6:cd71fd40 r5:c0ea1a63 r4:cc3740c0
[<c0285fb4>] (__writeback_single_inode) from [<c0286680>] (writeback_sb_inodes+0x200/0x4b4)
r10:ffffe000 r9:c0e08a88 r8:cc3740c0 r7:c0e1e3b0 r6:cd71fe78 r5:cb596048
r4:cc3741b8
[<c0286480>] (writeback_sb_inodes) from [<c02869a4>] (__writeback_inodes_wb+0x70/0xbc)
r10:cc265178 r9:c0e1e3b0 r8:cb59605c r7:cd71fe78 r6:00000001 r5:cb596048
r4:cca97800
[<c0286934>] (__writeback_inodes_wb) from [<c0286cb0>] (wb_writeback+0x2c0/0x3a0)
r10:cb596110 r9:c0bdf648 r8:c0bf40d8 r7:c0ea3948 r6:00001a63 r5:cd71fe78
r4:cb596048
[<c02869f0>] (wb_writeback) from [<c0288c2c>] (wb_workfn+0x3cc/0x568)
r10:cb596110 r9:cb596048 r8:c0ea3948 r7:00000000 r6:c0ea195b r5:00002a5b
r4:cb59613c
[<c0288860>] (wb_workfn) from [<c013f188>] (process_one_work+0x2c4/0x684)
r10:c0e08a88 r9:c0ea28f8 r8:cd445f00 r7:cd40de00 r6:c0ea195b r5:cd642680
r4:cb59613c
[<c013eec4>] (process_one_work) from [<c013f5a8>] (worker_thread+0x60/0x588)
r10:cd40de00 r9:c0e1e3b0 r8:cd40de38 r7:00000088 r6:cd40de00 r5:cd642694
r4:cd642680
[<c013f548>] (worker_thread) from [<c01465a0>] (kthread+0x168/0x16c)
r10:cd6f1e44 r9:c013f548 r8:cd642680 r7:cd71e000 r6:00000000 r5:cd65fc00
r4:cd69edc0
[<c0146438>] (kthread) from [<c0100114>] (ret_from_fork+0x14/0x20)
Exception stack(0xcd71ffb0 to 0xcd71fff8)
ffa0: 00000000 00000000 00000000 00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0146438
r4:cd69edc0


2020-08-10 00:13:21

by Dmitry Osipenko

Subject: Re: regulator: deadlock vs memory reclaim

10.08.2020 01:25, Michał Mirosław wrote:
> Hi guys,
>
> Commit f8702f9e4aa7 ("regulator: core: Use ww_mutex for regulators locking")
> from Nov 2018 tried to fix possible deadlocks when handling coupled
> regulators. Unfortunately it introduced another possible deadlock,
> as discovered by lockdep (see below), instead.
>
> regulator_lock_dependent() starts by taking regulator_list_mutex. The
> same mutex covers eg. regulator initialization, including memory allocations
> that happen there. This will deadlock when you have a filesystem on eg. eMMC
> (which uses a regulator to control module voltages) and you register
> a new regulator (hotplug a device?) when under memory pressure.
>
> There is also another problem with regulator_lock_dependent(): all the
> w/w rollback stuff is useless: because of the outer lock, there can only
> be one contendant doing multiple-lock-grabbing procedure. In this setup,
> the procedure cannot detect other processes waiting on
> regulator_lock_dependent() and it cannot signal (wound a transaction of)
> current holders of locks taken by regulator_lock().
>
> Basically, we have a BKL for regulator_enable() and we're using ww_mutex
> as a recursive mutex with no deadlock prevention whatsoever. The locks
> also seem to cover way too much (eg. initialization even before making the
> regulator visible to the system).
>
> To fix the regulator vs memory reclaim path I tried pushing all allocations
> out of protected sections. After doing a few patches, though, I'm not sure
> I'm going in the right direction. Your thoughts?

IIRC, taking the regulator_list_mutex within regulator_lock_dependent()
is needed in order to protect the case of decoupling regulators.

Perhaps moving the allocations out, or making them GFP_NOWAIT, would be
the easiest solution.
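
To illustrate the GFP_NOWAIT variant (a hypothetical allocation site, not
actual core.c code), any allocation made while regulator_list_mutex is held
would have to avoid direct reclaim:

        void *buf;

        mutex_lock(&regulator_list_mutex);

        /*
         * GFP_NOWAIT neither sleeps nor recurses into fs/io reclaim, so it
         * is safe under regulator_list_mutex - at the cost of being allowed
         * to fail under memory pressure.
         */
        buf = kzalloc(128, GFP_NOWAIT);         /* was GFP_KERNEL */
        if (!buf) {
                mutex_unlock(&regulator_list_mutex);
                return -ENOMEM;
        }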

2020-08-10 15:40:50

by Mark Brown

Subject: Re: regulator: deadlock vs memory reclaim

On Mon, Aug 10, 2020 at 12:25:37AM +0200, Michał Mirosław wrote:

> regulator_lock_dependent() starts by taking regulator_list_mutex. The
> same mutex covers eg. regulator initialization, including memory allocations
> that happen there. This will deadlock when you have a filesystem on eg. eMMC
> (which uses a regulator to control module voltages) and you register
> a new regulator (hotplug a device?) when under memory pressure.

OK, that's very much a corner case, it only applies in the case of
coupled regulators. The most obvious thing here would be to move the
allocations on registration out of the locked region, we really only
need this in the regulator_find_coupler() call I think. If the
regulator isn't coupled we don't need to take the lock at all.

> Basically, we have a BKL for regulator_enable() and we're using ww_mutex
> as a recursive mutex with no deadlock prevention whatsoever. The locks
> also seem to cover way too much (eg. initialization even before making the
> regulator visible to the system).

Could you be more specific about what you're looking at here? There's
nothing too obvious jumping out from the code here other than the bit
around the coupling allocation, otherwise it looks like we're locking
list walks.



2020-08-10 16:10:40

by Michał Mirosław

Subject: Re: regulator: deadlock vs memory reclaim

On Mon, Aug 10, 2020 at 04:39:28PM +0100, Mark Brown wrote:
On Mon, Aug 10, 2020 at 12:25:37AM +0200, Michał Mirosław wrote:
>
> > regulator_lock_dependent() starts by taking regulator_list_mutex. The
> > same mutex covers eg. regulator initialization, including memory allocations
> > that happen there. This will deadlock when you have a filesystem on eg. eMMC
> > (which uses a regulator to control module voltages) and you register
> > a new regulator (hotplug a device?) when under memory pressure.
>
> OK, that's very much a corner case, it only applies in the case of
> coupled regulators. The most obvious thing here would be to move the
> allocations on registration out of the locked region, we really only
> need this in the regulator_find_coupler() call I think. If the
> regulator isn't coupled we don't need to take the lock at all.

Currently, regulator_lock_dependent() is called by eg. regulator_enable() and
regulator_get_voltage(), so actually any regulator can deadlock this way.
I concur that the locking rules can (and need to) be relaxed.

> > Basically, we have a BKL for regulator_enable() and we're using ww_mutex
> > as a recursive mutex with no deadlock prevention whatsoever. The locks
> > also seem to cover way too much (eg. initialization even before making the
> > regulator visible to the system).
>
> Could you be more specific about what you're looking at here? There's
> nothing too obvious jumping out from the code here other than the bit
> around the coupling allocation, otherwise it looks like we're locking
> list walks.

When you look at the regulator API (regulator_enable() and friends),
their implementations all start with .._lock_dependent(), which takes
regulator_list_mutex around its work. This mutex is what makes the code
deadlock-prone vs memory allocations. I have a feeling that this lock is
a workaround for historical requirements (recursive locking of
regulator_dev) that may no longer be needed, or is just overly defensive
programming. Hence my other patches and this inquiry.

Best Regards,
Michał Mirosław

2020-08-10 17:35:15

by Mark Brown

Subject: Re: regulator: deadlock vs memory reclaim

On Mon, Aug 10, 2020 at 06:09:36PM +0200, Michał Mirosław wrote:
> On Mon, Aug 10, 2020 at 04:39:28PM +0100, Mark Brown wrote:
> > On Mon, Aug 10, 2020 at 12:25:37AM +0200, Michał Mirosław wrote:

> > > regulator_lock_dependent() starts by taking regulator_list_mutex. The
> > > same mutex covers eg. regulator initialization, including memory allocations
> > > that happen there. This will deadlock when you have a filesystem on eg. eMMC
> > > (which uses a regulator to control module voltages) and you register
> > > a new regulator (hotplug a device?) when under memory pressure.

> > OK, that's very much a corner case, it only applies in the case of
> > coupled regulators. The most obvious thing here would be to move the
> > allocations on registration out of the locked region, we really only
> > need this in the regulator_find_coupler() call I think. If the
> > regulator isn't coupled we don't need to take the lock at all.

> Currently, regulator_lock_dependent() is called by eg. regulator_enable() and
> regulator_get_voltage(), so actually any regulator can deadlock this way.

The initialization cases that are the trigger are only done for coupled
regulators though AFAICT, otherwise we're not doing allocations with the
lock held and should be able to progress.

> > > Basically, we have a BKL for regulator_enable() and we're using ww_mutex
> > > as a recursive mutex with no deadlock prevention whatsoever. The locks
> > > also seem to cover way too much (eg. initialization even before making the
> > > regulator visible to the system).

> > Could you be more specific about what you're looking at here? There's
> > nothing too obvious jumping out from the code here other than the bit
> > around the coupling allocation, otherwise it looks like we're locking
> > list walks.

> When you look at the regulator API (regulator_enable() and friends),
> their implementations all start with .._lock_dependent(), which takes
> regulator_list_mutex around its work. This mutex is what makes the code
> deadlock-prone vs memory allocations. I have a feeling that this lock is
> a workaround for historical requirements (recursive locking of
> regulator_dev) that may no longer be needed, or is just overly defensive
> programming. Hence my other patches and this inquiry.

We need to both walk up the tree for operations that involve the parent
and rope in any peers that are coupled; the idea is basically to avoid
changes to the coupling while we're trying to figure that out.
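
To spell out what that walk covers, something along these lines (a rough
sketch from memory, not the actual regulator_lock_recursive(); the -EDEADLK
backoff/retry and the recursive reference counting are omitted):

static int lock_dependents_sketch(struct regulator_dev *rdev,
                                  struct ww_acquire_ctx *ww_ctx)
{
        int i, ret;

        ret = ww_mutex_lock(&rdev->mutex, ww_ctx);
        if (ret)
                return ret;     /* the real code backs off and retries */

        /* rope in every coupled peer */
        for (i = 0; i < rdev->coupling_desc.n_coupled; i++) {
                struct regulator_dev *peer =
                        rdev->coupling_desc.coupled_rdevs[i];

                if (!peer || peer == rdev)
                        continue;

                ret = ww_mutex_lock(&peer->mutex, ww_ctx);
                if (ret)
                        return ret;
        }

        /* and walk up the supply tree */
        if (rdev->supply)
                ret = lock_dependents_sketch(rdev->supply->rdev, ww_ctx);

        return ret;
}

Holding regulator_list_mutex over this walk is what stops coupled_rdevs
and the supply pointers from changing underneath it.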



2020-08-10 19:26:53

by Michał Mirosław

Subject: Re: regulator: deadlock vs memory reclaim

On Mon, Aug 10, 2020 at 06:31:36PM +0100, Mark Brown wrote:
> On Mon, Aug 10, 2020 at 06:09:36PM +0200, Michał Mirosław wrote:
> > On Mon, Aug 10, 2020 at 04:39:28PM +0100, Mark Brown wrote:
> > > On Mon, Aug 10, 2020 at 12:25:37AM +0200, Michał Mirosław wrote:
> > > > regulator_lock_dependent() starts by taking regulator_list_mutex. The
> > > > same mutex covers eg. regulator initialization, including memory allocations
> > > > that happen there. This will deadlock when you have a filesystem on eg. eMMC
> > > > (which uses a regulator to control module voltages) and you register
> > > > a new regulator (hotplug a device?) when under memory pressure.
> > > OK, that's very much a corner case, it only applies in the case of
> > > coupled regulators. The most obvious thing here would be to move the
> > > allocations on registration out of the locked region, we really only
> > > need this in the regulator_find_coupler() call I think. If the
> > > regulator isn't coupled we don't need to take the lock at all.
> > Currently, regulator_lock_dependent() is called by eg. regulator_enable() and
> > regulator_get_voltage(), so actually any regulator can deadlock this way.
> The initialization cases that are the trigger are only done for coupled
> regulators though AFAICT, otherwise we're not doing allocations with the
> lock held and should be able to progress.

I caught a few lockdep complaints that suggest otherwise, but I'm still
looking into that.

> > > > Basically, we have a BKL for regulator_enable() and we're using ww_mutex
> > > > as a recursive mutex with no deadlock prevention whatsoever. The locks
> > > > also seem to cover way too much (eg. initialization even before making the
> > > > regulator visible to the system).
> > > Could you be more specific about what you're looking at here? There's
> > > nothing too obvious jumping out from the code here other than the bit
> > > around the coupling allocation, otherwise it looks like we're locking
> > > list walks.
> > When you look at the regulator API (regulator_enable() and friends),
> > their implementations all start with .._lock_dependent(), which takes
> > regulator_list_mutex around its work. This mutex is what makes the code
> > deadlock-prone vs memory allocations. I have a feeling that this lock is
> > a workaround for historical requirements (recursive locking of
> > regulator_dev) that may no longer be needed, or is just overly defensive
> > programming. Hence my other patches and this inquiry.
> We need to both walk up the tree for operations that involve the parent
> and rope in any peers that are coupled; the idea is basically to avoid
> changes to the coupling while we're trying to figure that out.

This part I understand, and it is what ww_mutex should take care of.
But there is another lock taken around the ww_mutex locking transaction
that causes the trouble. I would expect rdev->mutex to be enough to
guard against changes in the coupled-regulators list, but this might
need a fix in the registration phase to guarantee that (it probably
should do the same ww_mutex locking dance as .._lock_dependent() does
now).

Best Regards,
Michał Mirosław

2020-08-10 19:45:12

by Dmitry Osipenko

Subject: Re: regulator: deadlock vs memory reclaim

10.08.2020 22:25, Michał Mirosław wrote:
>>>>> regulator_lock_dependent() starts by taking regulator_list_mutex. The
>>>>> same mutex covers eg. regulator initialization, including memory allocations
>>>>> that happen there. This will deadlock when you have a filesystem on eg. eMMC
>>>>> (which uses a regulator to control module voltages) and you register
>>>>> a new regulator (hotplug a device?) when under memory pressure.
>>>> OK, that's very much a corner case, it only applies in the case of
>>>> coupled regulators. The most obvious thing here would be to move the
>>>> allocations on registration out of the locked region, we really only
>>>> need this in the regulator_find_coupler() call I think. If the
>>>> regulator isn't coupled we don't need to take the lock at all.
>>> Currently, regulator_lock_dependent() is called by eg. regulator_enable() and
>>> regulator_get_voltage(), so actually any regulator can deadlock this way.
>> The initialization cases that are the trigger are only done for coupled
>> regulators though AFAICT, otherwise we're not doing allocations with the
>> lock held and should be able to progress.
>
> I caught a few lockdep complaints that suggest otherwise, but I'm still
> looking into that.

The problem looks obvious to me. regulator_init_coupling() is protected
by the list_mutex, and regulator_lock_dependent() is also protected by
the list_mutex. Hence, if the offending reclaim happens from within
init_coupling(), there is a lockup.

1. mutex_lock(&regulator_list_mutex);

2. regulator_init_coupling()

3. kzalloc()

4. reclaim ...

5. regulator_get_voltage()

6. regulator_lock_dependent()

7. mutex_lock(&regulator_list_mutex);

It should be enough just to keep regulator_find_coupler() under the
lock, or even to remove the locking around init_coupling() completely.
I think it's better to keep find_coupler() protected.

Michał, does this fix your problem?

--- >8 ---
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 75ff7c563c5d..513f95c6f837 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -5040,7 +5040,10 @@ static int regulator_init_coupling(struct regulator_dev *rdev)
         if (!of_check_coupling_data(rdev))
                 return -EPERM;
 
+        mutex_lock(&regulator_list_mutex);
         rdev->coupling_desc.coupler = regulator_find_coupler(rdev);
+        mutex_unlock(&regulator_list_mutex);
+
         if (IS_ERR(rdev->coupling_desc.coupler)) {
                 err = PTR_ERR(rdev->coupling_desc.coupler);
                 rdev_err(rdev, "failed to get coupler: %d\n", err);
@@ -5248,9 +5251,7 @@ regulator_register(const struct regulator_desc *regulator_desc,
         if (ret < 0)
                 goto wash;
 
-        mutex_lock(&regulator_list_mutex);
         ret = regulator_init_coupling(rdev);
-        mutex_unlock(&regulator_list_mutex);
         if (ret < 0)
                 goto wash;
 

--- >8 ---

2020-08-10 19:53:21

by Mark Brown

Subject: Re: regulator: deadlock vs memory reclaim

On Mon, Aug 10, 2020 at 10:41:54PM +0300, Dmitry Osipenko wrote:
> 10.08.2020 22:25, Michał Mirosław wrote:

> >> The initialization cases that are the trigger are only done for coupled
> >> regulators though AFAICT, otherwise we're not doing allocations with the
> >> lock held and should be able to progress.

> > I caught a few lockdep complaints that suggest otherwise, but I'm still
> > looking into that.

> The problem looks obvious to me. regulator_init_coupling() is protected
> by the list_mutex, and regulator_lock_dependent() is also protected by
> the list_mutex. Hence, if the offending reclaim happens from within
> init_coupling(), there is a lockup.

We may also have problems if I/O triggers allocations for some reason,
though that's also going to be a limited set of cases. Might be what
lockdep was showing though.

> It should be enough just to keep regulator_find_coupler() under the
> lock, or even to remove the locking around init_coupling() completely.
> I think it's better to keep find_coupler() protected.

> Michał, does this fix your problem?

That was the sort of thing I was thinking about here - it should at
least be an improvement if nothing else.



2020-08-10 20:19:26

by Dmitry Osipenko

Subject: Re: regulator: deadlock vs memory reclaim

10.08.2020 23:09, Michał Mirosław wrote:
> At first I also thought so, but there's more. Below is a lockdep
> complaint with your patch applied. I did a similar patch and then two more
> (following) and that is still not enough (sysfs/debugfs do allocations,
> too).

Then it should be good to move the locking for init_coupling() like I
suggested and use GFP_NOWAIT for the two other cases. It all could be a
single small patch. Could you please check whether GFP_NOWAIT helps?

2020-08-10 20:22:50

by Michał Mirosław

Subject: Re: regulator: deadlock vs memory reclaim

On Mon, Aug 10, 2020 at 11:15:28PM +0300, Dmitry Osipenko wrote:
> 10.08.2020 23:09, Michał Mirosław wrote:
> > At first I also thought so, but there's more. Below is a lockdep
> > complaint with your patch applied. I did a similar patch and then two more
> > (following) and that is still not enough (sysfs/debugfs do allocations,
> > too).
> Then it should be good to move the locking for init_coupling() like I
> suggested and use GFP_NOWAIT for the two other cases. It all could be a
> single small patch. Could you please check whether GFP_NOWAIT helps?

This would be equivalent to my patches. Problem with sysfs and debugfs
remains as they don't have the option of GFP_NOWAIT. This needs to be
moved outside of the locks.
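
The shape I have in mind is a create-then-publish split, roughly like this
(just a sketch of the idea, not an actual patch; the names are only for
illustration):

        struct dentry *dir;

        /* phase 1: everything that allocates and can sleep, unlocked */
        dir = debugfs_create_dir(rname, debugfs_root);

        /* phase 2: publish under the lock, with no allocations inside */
        mutex_lock(&regulator_list_mutex);
        rdev->debugfs = dir;
        /* ... link the regulator into the lists ... */
        mutex_unlock(&regulator_list_mutex);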

Best Regards,
Michał Mirosław

2020-08-10 20:24:50

by Dmitry Osipenko

Subject: Re: regulator: deadlock vs memory reclaim

10.08.2020 23:18, Michał Mirosław wrote:
> On Mon, Aug 10, 2020 at 11:15:28PM +0300, Dmitry Osipenko wrote:
>> 10.08.2020 23:09, Michał Mirosław wrote:
>>> At first I also thought so, but there's more. Below is a lockdep
>>> complaint with your patch applied. I did a similar patch and then two more
>>> (following) and that is still not enough (sysfs/debugfs do allocations,
>>> too).
>> Then it should be good to move the locking for init_coupling() like I
>> suggested and use GFP_NOWAIT for the two other cases. It all could be a
>> single small patch. Could you please check whether GFP_NOWAIT helps?
>
> This would be equivalent to my patches. Problem with sysfs and debugfs
> remains as they don't have the option of GFP_NOWAIT. This needs to be
> moved outside of the locks.

Ah okay, you meant the debugfs core. I see now, thanks.

2020-08-10 20:57:36

by Dmitry Osipenko

Subject: Re: regulator: deadlock vs memory reclaim

10.08.2020 23:21, Dmitry Osipenko wrote:
> 10.08.2020 23:18, Michał Mirosław wrote:
>> On Mon, Aug 10, 2020 at 11:15:28PM +0300, Dmitry Osipenko wrote:
>>> 10.08.2020 23:09, Michał Mirosław wrote:
>>>> At first I also thought so, but there's more. Below is a lockdep
>>>> complaint with your patch applied. I did a similar patch and then two more
>>>> (following) and that is still not enough (sysfs/debugfs do allocations,
>>>> too).
>>> Then it should be good to move the locking for init_coupling() like I
>>> suggested and use GFP_NOWAIT for the two other cases. It all could be a
>>> single small patch. Could you please check whether GFP_NOWAIT helps?
>>
>> This would be equivalent to my patches. Problem with sysfs and debugfs
>> remains as they don't have the option of GFP_NOWAIT. This needs to be
>> moved outside of the locks.
>
> Ah okay, you meant the debugfs core. I see now, thanks.
>

This indeed needs a more fundamental solution.

It's not obvious how to fix it... we can probably remove taking the
list_mutex from lock_dependent(), but this still won't help the case of
memory reclaim, because reclaim may end up touching the already locked
regulator. IIUC, the case of memory reclaim under a regulator lock was
always dangerous and happened to work by chance before, correct?

2020-08-10 21:24:41

by Dmitry Osipenko

Subject: Re: regulator: deadlock vs memory reclaim

10.08.2020 23:56, Dmitry Osipenko wrote:
> 10.08.2020 23:21, Dmitry Osipenko wrote:
>> 10.08.2020 23:18, Michał Mirosław wrote:
>>> On Mon, Aug 10, 2020 at 11:15:28PM +0300, Dmitry Osipenko wrote:
>>>> 10.08.2020 23:09, Michał Mirosław wrote:
>>>>> At first I also thought so, but there's more. Below is a lockdep
>>>>> complaint with your patch applied. I did a similar patch and then two more
>>>>> (following) and that is still not enough (sysfs/debugfs do allocations,
>>>>> too).
>>>> Then it should be good to move the locking for init_coupling() like I
>>>> suggested and use GFP_NOWAIT for the two other cases. It all could be a
>>>> single small patch. Could you please check whether GFP_NOWAIT helps?
>>>
>>> This would be equivalent to my patches. Problem with sysfs and debugfs
>>> remains as they don't have the option of GFP_NOWAIT. This needs to be
>>> moved outside of the locks.
>>
>> Ah okay, you meant the debugfs core. I see now, thanks.
>>
>
> This indeed needs a more fundamental solution.
>
> It's not obvious how to fix it... we can probably remove taking the
> list_mutex from lock_dependent(), but this still won't help the case of
> memory reclaim, because reclaim may end up touching the already locked
> regulator. IIUC, the case of memory reclaim under a regulator lock was
> always dangerous and happened to work by chance before, correct?
>

And as Mark mentioned before, this situation may potentially arise from
other paths as well.

2020-08-11 00:08:35

by Michał Mirosław

Subject: Re: regulator: deadlock vs memory reclaim

On Mon, Aug 10, 2020 at 11:56:13PM +0300, Dmitry Osipenko wrote:
> 10.08.2020 23:21, Dmitry Osipenko wrote:
> > 10.08.2020 23:18, Michał Mirosław wrote:
> >> On Mon, Aug 10, 2020 at 11:15:28PM +0300, Dmitry Osipenko wrote:
> >>> 10.08.2020 23:09, Michał Mirosław wrote:
> >>>> At first I also thought so, but there's more. Below is a lockdep
> >>>> complaint with your patch applied. I did a similar patch and then two more
> >>>> (following) and that is still not enough (sysfs/debugfs do allocations,
> >>>> too).
> >>> Then it should be good to move the locking for init_coupling() like I
> >>> suggested and use GFP_NOWAIT for the two other cases. It all could be a
> >>> single small patch. Could you please check whether GFP_NOWAIT helps?
> >>
> >> This would be equivalent to my patches. Problem with sysfs and debugfs
> >> remains as they don't have the option of GFP_NOWAIT. This needs to be
> >> moved outside of the locks.
> >
> > Ah okay, you meant the debugfs core. I see now, thanks.
> >
>
> This indeed needs a more fundamental solution.
>
> It's not obvious how to fix it... we can probably remove taking the
> list_mutex from lock_dependent(), but this still won't help the case of
> memory reclaim, because reclaim may end up touching the already locked
> regulator. IIUC, the case of memory reclaim under a regulator lock was
> always dangerous and happened to work by chance before, correct?

I just noticed that locking in regulator_resolve_coupling() is bogus.
This all holds up because regulator_list_mutex is held during the call.
Feel free to test a patch below.

I'm working my way to push allocations outside of the locks, but the
coupling-related locking will need to be fixed regardless.

Best Regards,
Michał Mirosław

---->8<----

[PATCH] regulator: remove superfluous lock in regulator_resolve_coupling()

The code modifies rdev, but locks c_rdev instead. The bug remains:
stored c_rdev could be freed just after unlock anyway. This doesn't blow
up because regulator_list_mutex taken outside holds it together.

Signed-off-by: Michał Mirosław <[email protected]>
---
drivers/regulator/core.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 94f9225869da..e519bc9a860d 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -4859,13 +4859,9 @@ static void regulator_resolve_coupling(struct regulator_dev *rdev)
                         return;
                 }
 
-                regulator_lock(c_rdev);
-
                 c_desc->coupled_rdevs[i] = c_rdev;
                 c_desc->n_resolved++;
 
-                regulator_unlock(c_rdev);
-
                 regulator_resolve_coupling(c_rdev);
         }
 }
--
2.20.1

2020-08-11 15:46:14

by Dmitry Osipenko

Subject: Re: regulator: deadlock vs memory reclaim

11.08.2020 03:07, Michał Mirosław wrote:
...
> I just noticed that locking in regulator_resolve_coupling() is bogus.
> This all holds up because regulator_list_mutex is held during the call.
> Feel free to test a patch below.
>
> I'm working my way to push allocations outside of the locks, but the
> coupling-related locking will need to be fixed regardless.
>
> Best Regards,
> Michał Mirosław
>
> ---->8<----
>
> [PATCH] regulator: remove superfluous lock in regulator_resolve_coupling()
>
> The code modifies rdev, but locks c_rdev instead. The bug remains:
> stored c_rdev could be freed just after unlock anyway. This doesn't blow
> up because regulator_list_mutex taken outside holds it together.
>
> Signed-off-by: Michał Mirosław <[email protected]>
> ---
> drivers/regulator/core.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index 94f9225869da..e519bc9a860d 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -4859,13 +4859,9 @@ static void regulator_resolve_coupling(struct regulator_dev *rdev)
>                          return;
>                  }
> 
> -                regulator_lock(c_rdev);
> -
>                  c_desc->coupled_rdevs[i] = c_rdev;
>                  c_desc->n_resolved++;
> 
> -                regulator_unlock(c_rdev);
> -
>                  regulator_resolve_coupling(c_rdev);
>          }
>  }
>

The change looks like a good cleanup to me, thanks. I think the c_rdev
locking was accidentally left over from some older version of the patch
that introduced the coupling support. There shouldn't be any real bug in
this code.

IIRC, at some point I changed the code to disallow consumers from getting
a partially coupled regulator and then protected resolve_coupling() with
the list_mutex, but it seems I missed removing that c_rdev locking. Hence
there shouldn't be a need to lock regulators individually during the
resolve, because nothing should touch the coupled regulators until all
the coupling has been resolved.