While running "mkfs -t ext4" on an arm64 Juno-r2 device connected to an SSD drive,
the following kernel warning was reported on the stable-rc 5.9.13-rc1 kernel.
Steps to reproduce:
------------------
# Boot the arm64 Juno-r2 device with stable-rc 5.9.13-rc1
# Connect the SSD drive
# Format the drive with an ext4 file system
mkfs -t ext4 <SSD-drive>
# The warning below is printed
Crash log:
--------------
Writing superblocks and filesystem accounting information: 0/895
[ 86.131095]
[ 86.132592] =====================================
[ 86.137300] WARNING: bad unlock balance detected!
[ 86.142012] 5.9.13-rc1 #1 Not tainted
[ 86.145675] -------------------------------------
[ 86.150384] mkfs.ext4/426 is trying to release lock (rcu_read_lock) at:
[ 86.157020] [<ffff80001063478c>] blk_queue_exit+0xcc/0x1b0
[ 86.162511] but there are no more locks to release!
[ 86.167392]
[ 86.167392] other info that might help us debug this:
[ 86.173929] no locks held by mkfs.ext4/426.
[ 86.178114]
[ 86.178114] stack backtrace:
[ 86.182478] CPU: 1 PID: 426 Comm: mkfs.ext4 Not tainted 5.9.13-rc1 #1
[ 86.188926] Hardware name: ARM Juno development board (r2) (DT)
[ 86.194853] Call trace:
[ 86.197302] dump_backtrace+0x0/0x1f8
[ 86.200967] show_stack+0x2c/0x38
[ 86.204287] dump_stack+0xec/0x158
[ 86.207694] print_unlock_imbalance_bug+0xec/0xf0
[ 86.212404] lock_release+0x300/0x388
[ 86.216070] blk_queue_exit+0xe0/0x1b0
[ 86.219822] blk_mq_submit_bio+0x250/0xa08
[ 86.223922] submit_bio_noacct+0x468/0x518
[ 86.228022] submit_bio+0x4c/0x230
[ 86.231429] submit_bh_wbc+0x17c/0x218
[ 86.235182] __block_write_full_page+0x210/0x648
[ 86.239805] block_write_full_page+0x8c/0x150
[ 86.244167] blkdev_writepage+0x30/0x40
[ 86.248008] __writepage+0x38/0xd8
[ 86.251412] write_cache_pages+0x1fc/0x590
[ 86.255513] generic_writepages+0x64/0xa0
[ 86.259526] blkdev_writepages+0x28/0x38
[ 86.263452] do_writepages+0x6c/0x138
[ 86.267118] __filemap_fdatawrite_range+0x10c/0x148
[ 86.272001] file_write_and_wait_range+0x6c/0xd0
[ 86.276623] blkdev_fsync+0x3c/0x68
[ 86.280113] vfs_fsync_range+0x4c/0x90
[ 86.283864] do_fsync+0x48/0x78
[ 86.287007] __arm64_sys_fsync+0x24/0x38
[ 86.290933] el0_svc_common.constprop.3+0x7c/0x198
[ 86.295729] do_el0_svc+0x34/0xa0
[ 86.299047] el0_sync_handler+0x16c/0x210
[ 86.303060] el0_sync+0x140/0x180
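For readers unfamiliar with this splat: lockdep keeps track of the locks each task currently holds (rcu_read_lock() is tracked as well, through a lockdep map), and it prints "bad unlock balance" when a release arrives while the task holds nothing. The C sketch below is a simplified user-space model of that check, not lockdep's code. The names struct model_task, model_lock_acquire(), model_lock_release() and lockdep_model_demo() are hypothetical, and real lockdep also verifies that the specific lock being released is one the task holds, which this model does not.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Simplified user-space model of lockdep's "bad unlock balance" check.
 * All names are hypothetical. Real lockdep keeps a per-task stack of held
 * locks and also checks that the lock being released is on that stack;
 * this model only counts how many locks are held.
 */
struct model_task {
	const char *comm;      /* task name, e.g. "mkfs.ext4" */
	int pid;
	int depth;             /* number of locks currently held */
	int imbalance_reports; /* bad unlocks detected so far */
};

/* Record that the task took a lock, e.g. entered rcu_read_lock(). */
static void model_lock_acquire(struct model_task *t)
{
	t->depth++;
}

/*
 * Record a release. Releasing while nothing is held is the condition
 * behind "but there are no more locks to release!".
 */
static void model_lock_release(struct model_task *t, const char *name)
{
	if (t->depth <= 0) {
		printf("WARNING: bad unlock balance detected!\n");
		printf("%s/%d is trying to release lock (%s)\n",
		       t->comm, t->pid, name);
		printf("but there are no more locks to release!\n");
		t->imbalance_reports++;
		return;
	}
	t->depth--;
}

/* Balanced pair first, then one stray release; returns reports seen. */
static int lockdep_model_demo(void)
{
	struct model_task t = { "mkfs.ext4", 426, 0, 0 };

	model_lock_acquire(&t);
	model_lock_release(&t, "rcu_read_lock");
	assert(t.depth == 0 && t.imbalance_reports == 0);

	model_lock_release(&t, "rcu_read_lock"); /* no matching acquire */
	assert(t.depth == 0 && t.imbalance_reports == 1);
	return t.imbalance_reports;
}
```

Calling lockdep_model_demo() prints exactly one warning, for the second, unmatched release; the balanced acquire/release pair is silent.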
Reported-by: Naresh Kamboju <[email protected]>
Full test log link:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.9.y/build/v5.9.12-47-g1372e1af58d4/testrun/3538037/suite/linux-log-parser/test/check-kernel-exception-2012808/log
metadata:
git branch: linux-5.9.y
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git commit: 1372e1af58d410676db7917cc3484ca22d471623
git describe: v5.9.12-47-g1372e1af58d4
make_kernelversion: 5.9.13-rc1
kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/juno/lkft/linux-stable-rc-5.9/47/config
--
Linaro LKFT
https://lkft.linaro.org
On Mon, Dec 07, 2020 at 11:17:29AM +0530, Naresh Kamboju wrote:
> While running "mkfs -t ext4" on arm64 juno-r2 device connected with SSD drive
> the following kernel warning reported on stable rc 5.9.13-rc1 kernel.
>
> Steps to reproduce:
> ------------------
> # boot arm64 Juno-r2 device with stable-rc 5.9.13-rc1.
> # Connect SSD drive
> # Format the file system ext4 type
> mkfs -t ext4 <SSD-drive>
> # you will notice this warning
Does it happen easily? Can you bisect?
> Crash log:
> --------------
> Writing superblocks and filesystem accounting information: 0/895
> [ 86.131095]
> [ 86.132592] =====================================
> [ 86.137300] WARNING: bad unlock balance detected!
> [ 86.142012] 5.9.13-rc1 #1 Not tainted
> [ 86.145675] -------------------------------------
> [ 86.150384] mkfs.ext4/426 is trying to release lock (rcu_read_lock) at:
> [ 86.157020] [<ffff80001063478c>] blk_queue_exit+0xcc/0x1b0
> [ 86.162511] but there are no more locks to release!
This really doesn't make much sense. blk_queue_exit() in 5.9.12 does:
	percpu_ref_put(&q->q_usage_counter);
(literally, that's the entire function)
percpu_ref_put() calls percpu_ref_put_many(ref, 1), which does:
	rcu_read_lock();
	if (__ref_is_percpu(ref, &percpu_count))
		this_cpu_sub(*percpu_count, nr);
	else if (unlikely(atomic_long_sub_and_test(nr, &ref->count)))
		ref->release(ref);
	rcu_read_unlock();
Unless ->release() has an unbalanced rcu_read_unlock(), there definitely
is a lock to release! Some archaeology says that ->release is
blk_queue_usage_counter_release(), which calls
	wake_up_all(&q->mq_freeze_wq);
which doesn't appear to use RCU at all. So this trace makes no sense,
and all I can do is ask you to bisect it.
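The argument above can be checked with a small user-space model. If the RCU read-side section in the put path is balanced on both branches, the only way to reach "no more locks to release" at the final rcu_read_unlock() is for ->release() to drop an extra lock itself. This is a sketch under simplifying assumptions: a plain nesting counter replaces RCU and lockdep, and a plain long replaces the per-CPU/atomic count. Every name in it (struct model_ref, model_ref_put_many(), model_queue_exit(), wake_only_release(), unbalanced_release(), put_model_demo()) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space model; none of these names are kernel APIs.
 * An int nesting counter stands in for RCU plus lockdep's tracking of
 * rcu_read_lock(), and a plain long stands in for the reference count.
 */
static int rcu_depth;   /* modelled rcu_read_lock() nesting */
static int bad_unlocks; /* modelled "bad unlock balance" reports */
static int wakeups;     /* modelled wake_up_all() calls */

static void model_rcu_read_lock(void)
{
	rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	if (rcu_depth == 0) {
		bad_unlocks++; /* nothing left to release: lockdep would warn */
		printf("bad unlock balance: no more locks to release\n");
		return;
	}
	rcu_depth--;
}

struct model_ref {
	bool percpu_mode;                       /* stands in for __ref_is_percpu() */
	long count;                             /* stands in for the reference count */
	void (*release)(struct model_ref *ref); /* stands in for ref->release */
};

/* Same control flow as the put sequence quoted above (nr is 1 for one put). */
static void model_ref_put_many(struct model_ref *ref, long nr)
{
	model_rcu_read_lock();
	if (ref->percpu_mode)
		ref->count -= nr;
	else if ((ref->count -= nr) == 0)
		ref->release(ref);
	model_rcu_read_unlock();
}

/* Stands in for blk_queue_exit(): drop one reference. */
static void model_queue_exit(struct model_ref *ref)
{
	model_ref_put_many(ref, 1);
}

/* Like blk_queue_usage_counter_release(): only wakes waiters, no RCU. */
static void wake_only_release(struct model_ref *ref)
{
	(void)ref;
	wakeups++;
}

/* Deliberately buggy callback with one extra unlock. */
static void unbalanced_release(struct model_ref *ref)
{
	(void)ref;
	model_rcu_read_unlock();
}

/* Runs both cases; returns the number of bad unlocks observed. */
static int put_model_demo(void)
{
	struct model_ref q = { false, 1, wake_only_release };
	struct model_ref buggy = { false, 1, unbalanced_release };

	model_queue_exit(&q);     /* last put: release runs, nesting balanced */
	assert(rcu_depth == 0 && bad_unlocks == 0 && wakeups == 1);

	model_queue_exit(&buggy); /* extra unlock inside ->release() */
	assert(rcu_depth == 0 && bad_unlocks == 1);
	return bad_unlocks;
}
```

With wake_only_release(), which like blk_queue_usage_counter_release() only models a wake-up, the nesting counter returns to zero on every path. Only the deliberately buggy unbalanced_release() produces a bad unlock, and it does so at the outer rcu_read_unlock(), the same point the splat reports.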
On Mon, 7 Dec 2020 at 11:37, Matthew Wilcox <[email protected]> wrote:
> Does it happen easily? Can you bisect?
I have been running multiple test loops to reproduce this problem, but no
luck yet :(
Since it is hard to reproduce, we cannot bisect it.
- Naresh