2022-01-18 03:03:59

by Tetsuo Handa

[permalink] [raw]
Subject: Re: regression: LTP isofs testcase failure - bisection fingers "loop: make autoclear operation asynchronous"

On 2022/01/18 0:34, Mike Galbraith wrote:
> Greetings,
>
> LTP's isofs testcase began failing this cycle, and bisected as below,
> which a revert then confirmed. Full testcase output attached.
>

Thanks for reporting.
Discussion is continued at https://lkml.kernel.org/r/[email protected] .


2022-01-19 15:52:08

by Mike Galbraith

[permalink] [raw]
Subject: Re: regression: LTP isofs testcase failure - bisection fingers "loop: make autoclear operation asynchronous"

On Tue, 2022-01-18 at 06:57 +0900, Tetsuo Handa wrote:
> On 2022/01/18 0:34, Mike Galbraith wrote:
> > Greetings,
> >
> > LTP's isofs testcase began failing this cycle, and bisected as below,
> > which a revert then confirmed.  Full testcase output attached.
> >
>
> Thanks for reporting.
> Discussion is continued at https://lkml.kernel.org/r/[email protected] .

FWIW, simply immediately flushing that shiny new rundown_work turned
LTP's thumb back to upward pointing.

---
drivers/block/loop.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1197,6 +1197,7 @@ static void loop_schedule_rundown(struct
kobject_get(&bdev->bd_device.kobj);
INIT_WORK(&lo->rundown_work, loop_rundown_workfn);
queue_work(system_long_wq, &lo->rundown_work);
+ flush_work(&lo->rundown_work);
}

static int loop_clr_fd(struct loop_device *lo)

2022-01-19 19:12:48

by Tetsuo Handa

[permalink] [raw]
Subject: Re: regression: LTP isofs testcase failure - bisection fingers "loop: make autoclear operation asynchronous"

On 2022/01/18 12:21, Mike Galbraith wrote:
> On Tue, 2022-01-18 at 06:57 +0900, Tetsuo Handa wrote:
>> On 2022/01/18 0:34, Mike Galbraith wrote:
>>> Greetings,
>>>
>>> LTP's isofs testcase began failing this cycle, and bisected as below,
>>> which a revert then confirmed. Full testcase output attached.
>>>
>>
>> Thanks for reporting.
>> Discussion is continued at https://lkml.kernel.org/r/[email protected] .
>
> FWIW, simply immediately flushing that shiny new rundown_work turned
> LTP's thumb back to upward pointing.

Unfortunately, that reintroduces the circular locking dependency problem
which my patch tried to solve.

[ 159.316295]
[ 159.316759] ======================================================
[ 159.318129] WARNING: possible circular locking dependency detected
[ 159.319446] 5.16.0+ #48 Not tainted
[ 159.320270] ------------------------------------------------------
[ 159.321660] kworker/0:1/11 is trying to acquire lock:
[ 159.322859] ffff888074368538 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x9c/0x890
[ 159.324744]
[ 159.324744] but task is already holding lock:
[ 159.326028] ffff888100847dd8 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}, at: process_one_work+0x3b5/0x600
[ 159.328411]
[ 159.328411] which lock already depends on the new lock.
[ 159.328411]
[ 159.330227]
[ 159.330227] the existing dependency chain (in reverse order) is:
[ 159.331831]
[ 159.331831] -> #7 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}:
[ 159.335509] lock_acquire+0xe9/0x230
[ 159.336497] __flush_work+0x5f/0xd0
[ 159.337425] blkdev_put+0x285/0x2e0
[ 159.338356] blkdev_close+0x33/0x40
[ 159.339253] __fput+0x1f7/0x3d0
[ 159.340230] task_work_run+0xba/0xf0
[ 159.341149] exit_to_user_mode_loop+0x144/0x160
[ 159.342295] exit_to_user_mode_prepare+0xbd/0x130
[ 159.343423] syscall_exit_to_user_mode+0x26/0x60
[ 159.344655] do_syscall_64+0x49/0x90
[ 159.345596] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 159.346814]
[ 159.346814] -> #6 (&disk->open_mutex){+.+.}-{3:3}:
[ 159.348188] lock_acquire+0xe9/0x230
[ 159.349115] __mutex_lock_common+0xf1/0x1340
[ 159.350285] mutex_lock_nested+0x17/0x20
[ 159.351345] blkdev_get_by_dev+0x106/0x490
[ 159.352506] swsusp_check+0x35/0x1a0
[ 159.353493] software_resume+0x5f/0x210
[ 159.354620] resume_store+0x7b/0xa0
[ 159.355690] kernfs_fop_write_iter+0x1c6/0x270
[ 159.356851] vfs_write+0x51e/0x5c0
[ 159.357835] ksys_write+0x90/0x110
[ 159.358757] do_syscall_64+0x3d/0x90
[ 159.359644] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 159.360853]
[ 159.360853] -> #5 (system_transition_mutex/1){+.+.}-{3:3}:
[ 159.362489] lock_acquire+0xe9/0x230
[ 159.363437] __mutex_lock_common+0xf1/0x1340
[ 159.364661] mutex_lock_nested+0x17/0x20
[ 159.365725] software_resume+0x4d/0x210
[ 159.366787] resume_store+0x7b/0xa0
[ 159.367806] kernfs_fop_write_iter+0x1c6/0x270
[ 159.368874] vfs_write+0x51e/0x5c0
[ 159.369782] ksys_write+0x90/0x110
[ 159.370698] do_syscall_64+0x3d/0x90
[ 159.371644] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 159.372832]
[ 159.372832] -> #4 (&of->mutex){+.+.}-{3:3}:
[ 159.374122] lock_acquire+0xe9/0x230
[ 159.375048] __mutex_lock_common+0xf1/0x1340
[ 159.376088] mutex_lock_nested+0x17/0x20
[ 159.377073] kernfs_seq_start+0x2f/0x190
[ 159.378202] seq_read_iter+0x1dc/0x6d0
[ 159.379351] vfs_read+0x50b/0x5a0
[ 159.380231] ksys_read+0x90/0x110
[ 159.381047] do_syscall_64+0x3d/0x90
[ 159.382012] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 159.383206]
[ 159.383206] -> #3 (&p->lock){+.+.}-{3:3}:
[ 159.384400] lock_acquire+0xe9/0x230
[ 159.385386] __mutex_lock_common+0xf1/0x1340
[ 159.386479] mutex_lock_nested+0x17/0x20
[ 159.387487] seq_read_iter+0x57/0x6d0
[ 159.388439] generic_file_splice_read+0x1fb/0x2b0
[ 159.389538] splice_direct_to_actor+0x223/0x4e0
[ 159.390610] do_splice_direct+0x112/0x180
[ 159.391587] do_sendfile+0x295/0x610
[ 159.392489] __se_sys_sendfile64+0xb0/0xe0
[ 159.393475] do_syscall_64+0x3d/0x90
[ 159.394533] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 159.397834]
[ 159.397834] -> #2 (sb_writers#7){.+.+}-{0:0}:
[ 159.399840] lock_acquire+0xe9/0x230
[ 159.401096] loop_process_work+0xaca/0xec0 [loop]
[ 159.402497] process_one_work+0x3ee/0x600
[ 159.403685] worker_thread+0x4d7/0x960
[ 159.404746] kthread+0x178/0x1a0
[ 159.405772] ret_from_fork+0x1f/0x30
[ 159.406858]
[ 159.406858] -> #1 ((work_completion)(&worker->work)){+.+.}-{0:0}:
[ 159.408601] lock_acquire+0xe9/0x230
[ 159.409506] process_one_work+0x3d3/0x600
[ 159.410461] worker_thread+0x4d7/0x960
[ 159.411531] kthread+0x178/0x1a0
[ 159.412387] ret_from_fork+0x1f/0x30
[ 159.413315]
[ 159.413315] -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
[ 159.414960] validate_chain+0x2364/0x3aa0
[ 159.415942] __lock_acquire+0xcd4/0x1050
[ 159.417021] lock_acquire+0xe9/0x230
[ 159.418016] flush_workqueue+0xb8/0x890
[ 159.419126] drain_workqueue+0xa4/0x1c0
[ 159.420105] destroy_workqueue+0x4e/0x600
[ 159.421103] __loop_clr_fd+0xed/0x440 [loop]
[ 159.422203] loop_rundown_workfn+0x3f/0xc0 [loop]
[ 159.423357] process_one_work+0x3ee/0x600
[ 159.424357] worker_thread+0x4d7/0x960
[ 159.425299] kthread+0x178/0x1a0
[ 159.426140] ret_from_fork+0x1f/0x30
[ 159.427100]
[ 159.427100] other info that might help us debug this:
[ 159.427100]
[ 159.429256] Chain exists of:
[ 159.429256] (wq_completion)loop0 --> &disk->open_mutex --> (work_completion)(&lo->rundown_work)
[ 159.429256]
[ 159.432130] Possible unsafe locking scenario:
[ 159.432130]
[ 159.433368] CPU0 CPU1
[ 159.434337] ---- ----
[ 159.435375] lock((work_completion)(&lo->rundown_work));
[ 159.436552] lock(&disk->open_mutex);
[ 159.437859] lock((work_completion)(&lo->rundown_work));
[ 159.439558] lock((wq_completion)loop0);
[ 159.440451]
[ 159.440451] *** DEADLOCK ***
[ 159.440451]
[ 159.441684] 2 locks held by kworker/0:1/11:
[ 159.442719] #0: ffff888100082138 ((wq_completion)events_long){+.+.}-{0:0}, at: process_one_work+0x383/0x600
[ 159.444841] #1: ffff888100847dd8 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}, at: process_one_work+0x3b5/0x600
[ 159.447171]
[ 159.447171] stack backtrace:
[ 159.448108] CPU: 0 PID: 11 Comm: kworker/0:1 Kdump: loaded Not tainted 5.16.0+ #48
[ 159.449757] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[ 159.451960] Workqueue: events_long loop_rundown_workfn [loop]
[ 159.453169] Call Trace:
[ 159.453784] <TASK>
[ 159.454353] dump_stack_lvl+0x8e/0xe1
[ 159.455197] check_noncircular+0x1c7/0x1e0
[ 159.456081] ? preempt_count_sub+0xf/0xc0
[ 159.457081] validate_chain+0x2364/0x3aa0
[ 159.457950] ? validate_chain+0x6f4/0x3aa0
[ 159.458868] ? mark_lock+0xbb/0x290
[ 159.459662] __lock_acquire+0xcd4/0x1050
[ 159.460545] lock_acquire+0xe9/0x230
[ 159.461356] ? flush_workqueue+0x9c/0x890
[ 159.462411] flush_workqueue+0xb8/0x890
[ 159.463269] ? flush_workqueue+0x9c/0x890
[ 159.464157] ? lock_release+0x382/0x400
[ 159.464969] ? check_chain_key+0x1ec/0x280
[ 159.465947] drain_workqueue+0xa4/0x1c0
[ 159.466845] destroy_workqueue+0x4e/0x600
[ 159.467855] ? init_wait_entry+0x60/0x60
[ 159.468858] __loop_clr_fd+0xed/0x440 [loop]
[ 159.469850] loop_rundown_workfn+0x3f/0xc0 [loop]
[ 159.470930] process_one_work+0x3ee/0x600
[ 159.471850] worker_thread+0x4d7/0x960
[ 159.472667] ? _raw_spin_unlock_irqrestore+0x3f/0xb0
[ 159.473786] ? rcu_lock_release+0x20/0x20
[ 159.476282] kthread+0x178/0x1a0
[ 159.477562] ? kthread_blkcg+0x50/0x50
[ 159.478691] ret_from_fork+0x1f/0x30
[ 159.479949] </TASK>

2022-01-21 20:04:55

by Mike Galbraith

[permalink] [raw]
Subject: Re: regression: LTP isofs testcase failure - bisection fingers "loop: make autoclear operation asynchronous"

On Tue, 2022-01-18 at 15:19 +0900, Tetsuo Handa wrote:
> On 2022/01/18 12:21, Mike Galbraith wrote:
> > On Tue, 2022-01-18 at 06:57 +0900, Tetsuo Handa wrote:
> > > On 2022/01/18 0:34, Mike Galbraith wrote:
> > > > Greetings,
> > > >
> > > > LTP's isofs testcase began failing this cycle, and bisected as below,
> > > > which a revert then confirmed.  Full testcase output attached.
> > > >
> > >
> > > Thanks for reporting.
> > > Discussion is continued at https://lkml.kernel.org/r/[email protected] .
> >
> > FWIW, simply immediately flushing that shiny new rundown_work turned
> > LTP's thumb back to upward pointing.
>
> Unfortunately, that reintroduces the circular locking dependency problem
> which my patch tried to solve.

Hm, lockdep was green (and is, just double checked out of curiosity)
with the LTP testcase (shrug).

Happily, and as previously noted, both patch options at the link you
provided restored the LTP testcase's happy camper status.

-Mike

2022-02-09 11:33:39

by Mike Galbraith

[permalink] [raw]
Subject: Re: regression: LTP isofs testcase failure - bisection fingers "loop: make autoclear operation asynchronous"

On Tue, 2022-01-18 at 06:57 +0900, Tetsuo Handa wrote:
> On 2022/01/18 0:34, Mike Galbraith wrote:
> > Greetings,
> >
> > LTP's isofs testcase began failing this cycle, and bisected as below,
> > which a revert then confirmed.  Full testcase output attached.
> >
>
> Thanks for reporting.
> Discussion is continued at https://lkml.kernel.org/r/[email protected] .

<legalese-disclaimer-font> ping