2019-12-26 21:26:48

by syzbot

Subject: possible deadlock in shmem_fallocate (4)

Hello,

syzbot found the following crash on:

HEAD commit: 46cf053e Linux 5.5-rc3
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=162124aee00000
kernel config: https://syzkaller.appspot.com/x/.config?x=ed9d672709340e35
dashboard link: https://syzkaller.appspot.com/bug?extid=7a0d9d0b26efefe61780
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

======================================================
WARNING: possible circular locking dependency detected
5.5.0-rc3-syzkaller #0 Not tainted
------------------------------------------------------
kswapd0/1852 is trying to acquire lock:
ffff888098919cd8 (&sb->s_type->i_mutex_key#13){+.+.}, at: inode_lock include/linux/fs.h:791 [inline]
ffff888098919cd8 (&sb->s_type->i_mutex_key#13){+.+.}, at: shmem_fallocate+0x15a/0xd40 mm/shmem.c:2735

but task is already holding lock:
ffffffff89a41e00 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30 mm/page_alloc.c:4922

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (fs_reclaim){+.+.}:
__fs_reclaim_acquire mm/page_alloc.c:4084 [inline]
fs_reclaim_acquire.part.0+0x24/0x30 mm/page_alloc.c:4095
fs_reclaim_acquire mm/page_alloc.c:4695 [inline]
prepare_alloc_pages mm/page_alloc.c:4692 [inline]
__alloc_pages_nodemask+0x52d/0x910 mm/page_alloc.c:4744
alloc_pages_vma+0xdd/0x620 mm/mempolicy.c:2170
shmem_alloc_page+0xc0/0x180 mm/shmem.c:1499
shmem_alloc_and_acct_page+0x165/0x990 mm/shmem.c:1524
shmem_getpage_gfp+0x56d/0x29a0 mm/shmem.c:1838
shmem_getpage mm/shmem.c:154 [inline]
shmem_write_begin+0x105/0x1e0 mm/shmem.c:2487
generic_perform_write+0x23b/0x540 mm/filemap.c:3309
__generic_file_write_iter+0x25e/0x630 mm/filemap.c:3438
generic_file_write_iter+0x420/0x68e mm/filemap.c:3470
call_write_iter include/linux/fs.h:1902 [inline]
new_sync_write+0x4d3/0x770 fs/read_write.c:483
__vfs_write+0xe1/0x110 fs/read_write.c:496
vfs_write+0x268/0x5d0 fs/read_write.c:558
ksys_write+0x14f/0x290 fs/read_write.c:611
__do_sys_write fs/read_write.c:623 [inline]
__se_sys_write fs/read_write.c:620 [inline]
__x64_sys_write+0x73/0xb0 fs/read_write.c:620
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (&sb->s_type->i_mutex_key#13){+.+.}:
check_prev_add kernel/locking/lockdep.c:2476 [inline]
check_prevs_add kernel/locking/lockdep.c:2581 [inline]
validate_chain kernel/locking/lockdep.c:2971 [inline]
__lock_acquire+0x2596/0x4a00 kernel/locking/lockdep.c:3955
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4485
down_write+0x93/0x150 kernel/locking/rwsem.c:1534
inode_lock include/linux/fs.h:791 [inline]
shmem_fallocate+0x15a/0xd40 mm/shmem.c:2735
ashmem_shrink_scan drivers/staging/android/ashmem.c:462 [inline]
ashmem_shrink_scan+0x370/0x510 drivers/staging/android/ashmem.c:437
do_shrink_slab+0x40f/0xad0 mm/vmscan.c:526
shrink_slab mm/vmscan.c:687 [inline]
shrink_slab+0x19a/0x680 mm/vmscan.c:660
shrink_node_memcgs mm/vmscan.c:2687 [inline]
shrink_node+0x46a/0x1ad0 mm/vmscan.c:2791
kswapd_shrink_node mm/vmscan.c:3539 [inline]
balance_pgdat+0x7c8/0x11f0 mm/vmscan.c:3697
kswapd+0x5c3/0xf30 mm/vmscan.c:3948
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(&sb->s_type->i_mutex_key#13);
                               lock(fs_reclaim);
  lock(&sb->s_type->i_mutex_key#13);

*** DEADLOCK ***

2 locks held by kswapd0/1852:
#0: ffffffff89a41e00 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30 mm/page_alloc.c:4922
#1: ffffffff89a1f948 (shrinker_rwsem){++++}, at: shrink_slab mm/vmscan.c:677 [inline]
#1: ffffffff89a1f948 (shrinker_rwsem){++++}, at: shrink_slab+0xe6/0x680 mm/vmscan.c:660

stack backtrace:
CPU: 0 PID: 1852 Comm: kswapd0 Not tainted 5.5.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
print_circular_bug.isra.0.cold+0x163/0x172 kernel/locking/lockdep.c:1685
check_noncircular+0x32e/0x3e0 kernel/locking/lockdep.c:1809
check_prev_add kernel/locking/lockdep.c:2476 [inline]
check_prevs_add kernel/locking/lockdep.c:2581 [inline]
validate_chain kernel/locking/lockdep.c:2971 [inline]
__lock_acquire+0x2596/0x4a00 kernel/locking/lockdep.c:3955
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4485
down_write+0x93/0x150 kernel/locking/rwsem.c:1534
inode_lock include/linux/fs.h:791 [inline]
shmem_fallocate+0x15a/0xd40 mm/shmem.c:2735
ashmem_shrink_scan drivers/staging/android/ashmem.c:462 [inline]
ashmem_shrink_scan+0x370/0x510 drivers/staging/android/ashmem.c:437
do_shrink_slab+0x40f/0xad0 mm/vmscan.c:526
shrink_slab mm/vmscan.c:687 [inline]
shrink_slab+0x19a/0x680 mm/vmscan.c:660
shrink_node_memcgs mm/vmscan.c:2687 [inline]
shrink_node+0x46a/0x1ad0 mm/vmscan.c:2791
kswapd_shrink_node mm/vmscan.c:3539 [inline]
balance_pgdat+0x7c8/0x11f0 mm/vmscan.c:3697
kswapd+0x5c3/0xf30 mm/vmscan.c:3948
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2020-03-07 21:44:33

by Eric Biggers

Subject: Re: [ashmem] possible deadlock in shmem_fallocate (4)

ashmem is calling shmem_fallocate() during memory reclaim, which can deadlock on
inode_lock(). +Cc maintainers of drivers/staging/android/ashmem.c.
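
The cycle, reduced to its essentials (my paraphrase of the traces below,
not verbatim kernel source):

	/* reclaim side: fs_reclaim is held, then the shrinker takes inode_lock */
	ashmem_shrink_scan()
	    -> range->asma->file->f_op->fallocate(range->asma->file,
	               FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, start, len)
	        -> shmem_fallocate()
	            -> inode_lock(inode)

	/* write side: inode_lock is held, then the allocation may enter reclaim */
	generic_file_write_iter()
	    -> inode_lock(inode)
	    -> shmem_write_begin() -> shmem_getpage() -> shmem_alloc_page()
	        -> __alloc_pages_nodemask()  /* fs_reclaim; may run shrinkers */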

On Thu, Dec 26, 2019 at 01:25:09PM -0800, syzbot wrote:
> [...]

2020-07-14 00:33:29

by syzbot

Subject: Re: possible deadlock in shmem_fallocate (4)

syzbot has found a reproducer for the following crash on:

HEAD commit: 11ba4688 Linux 5.8-rc5
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13f1bf47100000
kernel config: https://syzkaller.appspot.com/x/.config?x=a160d1053fc89af5
dashboard link: https://syzkaller.appspot.com/bug?extid=7a0d9d0b26efefe61780
compiler: gcc (GCC) 10.1.0-syz 20200507
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1181004f100000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc5-syzkaller #0 Not tainted
------------------------------------------------------
khugepaged/1157 is trying to acquire lock:
ffff88809272e910 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:800 [inline]
ffff88809272e910 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}, at: shmem_fallocate+0x153/0xd90 mm/shmem.c:2707

but task is already holding lock:
ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_release mm/page_alloc.c:4202 [inline]
ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_release mm/page_alloc.c:4198 [inline]
ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:4227 [inline]
ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:4244 [inline]
ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x1554/0x2780 mm/page_alloc.c:4650

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (fs_reclaim){+.+.}-{0:0}:
__fs_reclaim_acquire mm/page_alloc.c:4183 [inline]
fs_reclaim_acquire mm/page_alloc.c:4194 [inline]
prepare_alloc_pages mm/page_alloc.c:4780 [inline]
__alloc_pages_nodemask+0x3d1/0x930 mm/page_alloc.c:4832
alloc_pages_vma+0xdd/0x720 mm/mempolicy.c:2255
shmem_alloc_page+0x11f/0x1f0 mm/shmem.c:1502
shmem_alloc_and_acct_page+0x161/0x8a0 mm/shmem.c:1527
shmem_getpage_gfp+0x511/0x2450 mm/shmem.c:1823
shmem_getpage mm/shmem.c:153 [inline]
shmem_write_begin+0xf9/0x1d0 mm/shmem.c:2459
generic_perform_write+0x20a/0x4f0 mm/filemap.c:3318
__generic_file_write_iter+0x24b/0x610 mm/filemap.c:3447
generic_file_write_iter+0x3a6/0x5c0 mm/filemap.c:3479
call_write_iter include/linux/fs.h:1908 [inline]
new_sync_write+0x422/0x650 fs/read_write.c:503
vfs_write+0x59d/0x6b0 fs/read_write.c:578
ksys_write+0x12d/0x250 fs/read_write.c:631
do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:2496 [inline]
check_prevs_add kernel/locking/lockdep.c:2601 [inline]
validate_chain kernel/locking/lockdep.c:3218 [inline]
__lock_acquire+0x2acb/0x56e0 kernel/locking/lockdep.c:4380
lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:4959
down_write+0x8d/0x150 kernel/locking/rwsem.c:1531
inode_lock include/linux/fs.h:800 [inline]
shmem_fallocate+0x153/0xd90 mm/shmem.c:2707
ashmem_shrink_scan.part.0+0x2e9/0x490 drivers/staging/android/ashmem.c:490
ashmem_shrink_scan+0x6c/0xa0 drivers/staging/android/ashmem.c:473
do_shrink_slab+0x3c6/0xab0 mm/vmscan.c:518
shrink_slab+0x16f/0x5c0 mm/vmscan.c:679
shrink_node_memcgs mm/vmscan.c:2658 [inline]
shrink_node+0x519/0x1b60 mm/vmscan.c:2770
shrink_zones mm/vmscan.c:2973 [inline]
do_try_to_free_pages+0x38b/0x1340 mm/vmscan.c:3026
try_to_free_pages+0x29a/0x8b0 mm/vmscan.c:3265
__perform_reclaim mm/page_alloc.c:4223 [inline]
__alloc_pages_direct_reclaim mm/page_alloc.c:4244 [inline]
__alloc_pages_slowpath.constprop.0+0x949/0x2780 mm/page_alloc.c:4650
__alloc_pages_nodemask+0x68f/0x930 mm/page_alloc.c:4863
__alloc_pages include/linux/gfp.h:509 [inline]
__alloc_pages_node include/linux/gfp.h:522 [inline]
khugepaged_alloc_page+0xa0/0x170 mm/khugepaged.c:867
collapse_huge_page mm/khugepaged.c:1056 [inline]
khugepaged_scan_pmd mm/khugepaged.c:1349 [inline]
khugepaged_scan_mm_slot mm/khugepaged.c:2110 [inline]
khugepaged_do_scan mm/khugepaged.c:2193 [inline]
khugepaged+0x3093/0x5a10 mm/khugepaged.c:2238
kthread+0x3b5/0x4a0 kernel/kthread.c:291
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(&sb->s_type->i_mutex_key#15);
                               lock(fs_reclaim);
  lock(&sb->s_type->i_mutex_key#15);

*** DEADLOCK ***

2 locks held by khugepaged/1157:
#0: ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_release mm/page_alloc.c:4202 [inline]
#0: ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_release mm/page_alloc.c:4198 [inline]
#0: ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:4227 [inline]
#0: ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:4244 [inline]
#0: ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x1554/0x2780 mm/page_alloc.c:4650
#1: ffffffff89c46a90 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0xc7/0x5c0 mm/vmscan.c:669

stack backtrace:
CPU: 0 PID: 1157 Comm: khugepaged Not tainted 5.8.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827
check_prev_add kernel/locking/lockdep.c:2496 [inline]
check_prevs_add kernel/locking/lockdep.c:2601 [inline]
validate_chain kernel/locking/lockdep.c:3218 [inline]
__lock_acquire+0x2acb/0x56e0 kernel/locking/lockdep.c:4380
lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:4959
down_write+0x8d/0x150 kernel/locking/rwsem.c:1531
inode_lock include/linux/fs.h:800 [inline]
shmem_fallocate+0x153/0xd90 mm/shmem.c:2707
ashmem_shrink_scan.part.0+0x2e9/0x490 drivers/staging/android/ashmem.c:490
ashmem_shrink_scan+0x6c/0xa0 drivers/staging/android/ashmem.c:473
do_shrink_slab+0x3c6/0xab0 mm/vmscan.c:518
shrink_slab+0x16f/0x5c0 mm/vmscan.c:679
shrink_node_memcgs mm/vmscan.c:2658 [inline]
shrink_node+0x519/0x1b60 mm/vmscan.c:2770
shrink_zones mm/vmscan.c:2973 [inline]
do_try_to_free_pages+0x38b/0x1340 mm/vmscan.c:3026
try_to_free_pages+0x29a/0x8b0 mm/vmscan.c:3265
__perform_reclaim mm/page_alloc.c:4223 [inline]
__alloc_pages_direct_reclaim mm/page_alloc.c:4244 [inline]
__alloc_pages_slowpath.constprop.0+0x949/0x2780 mm/page_alloc.c:4650
__alloc_pages_nodemask+0x68f/0x930 mm/page_alloc.c:4863
__alloc_pages include/linux/gfp.h:509 [inline]
__alloc_pages_node include/linux/gfp.h:522 [inline]
khugepaged_alloc_page+0xa0/0x170 mm/khugepaged.c:867
collapse_huge_page mm/khugepaged.c:1056 [inline]
khugepaged_scan_pmd mm/khugepaged.c:1349 [inline]
khugepaged_scan_mm_slot mm/khugepaged.c:2110 [inline]
khugepaged_do_scan mm/khugepaged.c:2193 [inline]
khugepaged+0x3093/0x5a10 mm/khugepaged.c:2238
kthread+0x3b5/0x4a0 kernel/kthread.c:291
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

2020-07-14 03:07:58

by syzbot

Subject: Re: possible deadlock in shmem_fallocate (4)

syzbot has found a reproducer for the following crash on:

HEAD commit: 11ba4688 Linux 5.8-rc5
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=175391fb100000
kernel config: https://syzkaller.appspot.com/x/.config?x=a160d1053fc89af5
dashboard link: https://syzkaller.appspot.com/bug?extid=7a0d9d0b26efefe61780
compiler: gcc (GCC) 10.1.0-syz 20200507
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=104d9c77100000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1343e95d100000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc5-syzkaller #0 Not tainted
------------------------------------------------------
khugepaged/1158 is trying to acquire lock:
ffff8882071865b0 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:800 [inline]
ffff8882071865b0 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}, at: shmem_fallocate+0x153/0xd90 mm/shmem.c:2707

but task is already holding lock:
ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_release mm/page_alloc.c:4202 [inline]
ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_release mm/page_alloc.c:4198 [inline]
ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:4227 [inline]
ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:4244 [inline]
ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x1554/0x2780 mm/page_alloc.c:4650

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (fs_reclaim){+.+.}-{0:0}:
__fs_reclaim_acquire mm/page_alloc.c:4183 [inline]
fs_reclaim_acquire mm/page_alloc.c:4194 [inline]
prepare_alloc_pages mm/page_alloc.c:4780 [inline]
__alloc_pages_nodemask+0x3d1/0x930 mm/page_alloc.c:4832
alloc_pages_vma+0xdd/0x720 mm/mempolicy.c:2255
shmem_alloc_page+0x11f/0x1f0 mm/shmem.c:1502
shmem_alloc_and_acct_page+0x161/0x8a0 mm/shmem.c:1527
shmem_getpage_gfp+0x511/0x2450 mm/shmem.c:1823
shmem_getpage mm/shmem.c:153 [inline]
shmem_write_begin+0xf9/0x1d0 mm/shmem.c:2459
generic_perform_write+0x20a/0x4f0 mm/filemap.c:3318
__generic_file_write_iter+0x24b/0x610 mm/filemap.c:3447
generic_file_write_iter+0x3a6/0x5c0 mm/filemap.c:3479
call_write_iter include/linux/fs.h:1908 [inline]
new_sync_write+0x422/0x650 fs/read_write.c:503
vfs_write+0x59d/0x6b0 fs/read_write.c:578
ksys_write+0x12d/0x250 fs/read_write.c:631
do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:2496 [inline]
check_prevs_add kernel/locking/lockdep.c:2601 [inline]
validate_chain kernel/locking/lockdep.c:3218 [inline]
__lock_acquire+0x2acb/0x56e0 kernel/locking/lockdep.c:4380
lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:4959
down_write+0x8d/0x150 kernel/locking/rwsem.c:1531
inode_lock include/linux/fs.h:800 [inline]
shmem_fallocate+0x153/0xd90 mm/shmem.c:2707
ashmem_shrink_scan.part.0+0x2e9/0x490 drivers/staging/android/ashmem.c:490
ashmem_shrink_scan+0x6c/0xa0 drivers/staging/android/ashmem.c:473
do_shrink_slab+0x3c6/0xab0 mm/vmscan.c:518
shrink_slab+0x16f/0x5c0 mm/vmscan.c:679
shrink_node_memcgs mm/vmscan.c:2658 [inline]
shrink_node+0x519/0x1b60 mm/vmscan.c:2770
shrink_zones mm/vmscan.c:2973 [inline]
do_try_to_free_pages+0x38b/0x1340 mm/vmscan.c:3026
try_to_free_pages+0x29a/0x8b0 mm/vmscan.c:3265
__perform_reclaim mm/page_alloc.c:4223 [inline]
__alloc_pages_direct_reclaim mm/page_alloc.c:4244 [inline]
__alloc_pages_slowpath.constprop.0+0x949/0x2780 mm/page_alloc.c:4650
__alloc_pages_nodemask+0x68f/0x930 mm/page_alloc.c:4863
__alloc_pages include/linux/gfp.h:509 [inline]
__alloc_pages_node include/linux/gfp.h:522 [inline]
khugepaged_alloc_page+0xa0/0x170 mm/khugepaged.c:867
collapse_huge_page mm/khugepaged.c:1056 [inline]
khugepaged_scan_pmd mm/khugepaged.c:1349 [inline]
khugepaged_scan_mm_slot mm/khugepaged.c:2110 [inline]
khugepaged_do_scan mm/khugepaged.c:2193 [inline]
khugepaged+0x3093/0x5a10 mm/khugepaged.c:2238
kthread+0x3b5/0x4a0 kernel/kthread.c:291
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(&sb->s_type->i_mutex_key#15);
                               lock(fs_reclaim);
  lock(&sb->s_type->i_mutex_key#15);

*** DEADLOCK ***

2 locks held by khugepaged/1158:
#0: ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_release mm/page_alloc.c:4202 [inline]
#0: ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_release mm/page_alloc.c:4198 [inline]
#0: ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: __perform_reclaim mm/page_alloc.c:4227 [inline]
#0: ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_direct_reclaim mm/page_alloc.c:4244 [inline]
#0: ffffffff89c6c260 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x1554/0x2780 mm/page_alloc.c:4650
#1: ffffffff89c46a90 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0xc7/0x5c0 mm/vmscan.c:669

stack backtrace:
CPU: 1 PID: 1158 Comm: khugepaged Not tainted 5.8.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827
check_prev_add kernel/locking/lockdep.c:2496 [inline]
check_prevs_add kernel/locking/lockdep.c:2601 [inline]
validate_chain kernel/locking/lockdep.c:3218 [inline]
__lock_acquire+0x2acb/0x56e0 kernel/locking/lockdep.c:4380
lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:4959
down_write+0x8d/0x150 kernel/locking/rwsem.c:1531
inode_lock include/linux/fs.h:800 [inline]
shmem_fallocate+0x153/0xd90 mm/shmem.c:2707
ashmem_shrink_scan.part.0+0x2e9/0x490 drivers/staging/android/ashmem.c:490
ashmem_shrink_scan+0x6c/0xa0 drivers/staging/android/ashmem.c:473
do_shrink_slab+0x3c6/0xab0 mm/vmscan.c:518
shrink_slab+0x16f/0x5c0 mm/vmscan.c:679
shrink_node_memcgs mm/vmscan.c:2658 [inline]
shrink_node+0x519/0x1b60 mm/vmscan.c:2770
shrink_zones mm/vmscan.c:2973 [inline]
do_try_to_free_pages+0x38b/0x1340 mm/vmscan.c:3026
try_to_free_pages+0x29a/0x8b0 mm/vmscan.c:3265
__perform_reclaim mm/page_alloc.c:4223 [inline]
__alloc_pages_direct_reclaim mm/page_alloc.c:4244 [inline]
__alloc_pages_slowpath.constprop.0+0x949/0x2780 mm/page_alloc.c:4650
__alloc_pages_nodemask+0x68f/0x930 mm/page_alloc.c:4863
__alloc_pages include/linux/gfp.h:509 [inline]
__alloc_pages_node include/linux/gfp.h:522 [inline]
khugepaged_alloc_page+0xa0/0x170 mm/khugepaged.c:867
collapse_huge_page mm/khugepaged.c:1056 [inline]
khugepaged_scan_pmd mm/khugepaged.c:1349 [inline]
khugepaged_scan_mm_slot mm/khugepaged.c:2110 [inline]
khugepaged_do_scan mm/khugepaged.c:2193 [inline]
khugepaged+0x3093/0x5a10 mm/khugepaged.c:2238
kthread+0x3b5/0x4a0 kernel/kthread.c:291
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

2020-07-14 03:41:56

by Eric Biggers

Subject: Re: possible deadlock in shmem_fallocate (4)

On Tue, Jul 14, 2020 at 11:32:52AM +0800, Hillf Danton wrote:
>
> Add FALLOC_FL_NOBLOCK and, on the shmem side, try to lock the inode when
> the new flag is set. The overall upside is to keep the current gfp as is,
> whether in the khugepaged context or not.
>
> --- a/include/uapi/linux/falloc.h
> +++ b/include/uapi/linux/falloc.h
> @@ -77,4 +77,6 @@
> */
> #define FALLOC_FL_UNSHARE_RANGE 0x40
>
> +#define FALLOC_FL_NOBLOCK 0x80
> +

You can't add a new UAPI flag to fix a kernel-internal problem like this.

- Eric

2020-07-14 08:27:45

by Michal Hocko

Subject: Re: possible deadlock in shmem_fallocate (4)

On Tue 14-07-20 13:32:05, Hillf Danton wrote:
>
> On Mon, 13 Jul 2020 20:41:11 -0700 Eric Biggers wrote:
> > [...]
> >
> > You can't add a new UAPI flag to fix a kernel-internal problem like this.
>
> Sounds fair, see below.
>
> What the report indicates is a missing PF_MEMALLOC_NOFS: it is
> checked on the ashmem side and added as an exception before going
> into filesystem code. On the shmem side, no more than a best effort
> is paid to the intended exception.
>
> --- a/drivers/staging/android/ashmem.c
> +++ b/drivers/staging/android/ashmem.c
> @@ -437,6 +437,7 @@ static unsigned long
>  ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  {
>  	unsigned long freed = 0;
> +	bool nofs;
>
>  	/* We might recurse into filesystem code, so bail out if necessary */
>  	if (!(sc->gfp_mask & __GFP_FS))
> @@ -445,6 +446,11 @@ ashmem_shrink_scan(struct shri
>  	if (!mutex_trylock(&ashmem_mutex))
>  		return -1;
>
> +	/* enter filesystem with caution: nonblock on locking */
> +	nofs = current->flags & PF_MEMALLOC_NOFS;
> +	if (!nofs)
> +		current->flags |= PF_MEMALLOC_NOFS;
> +
>  	while (!list_empty(&ashmem_lru_list)) {
>  		struct ashmem_range *range =
>  			list_first_entry(&ashmem_lru_list, typeof(*range), lru);

I do not think this is an appropriate fix. First of all, is this a real
deadlock or a lockdep false positive? Is it possible that ashmem just
needs to properly annotate its shmem inodes? Or is it possible that
the internal backing shmem file is visible to userspace, so that the
write path would be possible?

If this is a real problem, then the proper fix would be to set the
internal shmem mapping's gfp_mask to drop __GFP_FS.
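
Something along these lines (an untested sketch), at the point where
ashmem creates its backing file with shmem_file_setup():

	/* keep page cache allocations for the internal mapping out of
	 * fs reclaim so the shrinker cannot recurse back into it */
	mapping_set_gfp_mask(vmfile->f_mapping,
			     mapping_gfp_mask(vmfile->f_mapping) & ~__GFP_FS);
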
--
Michal Hocko
SUSE Labs

2020-07-14 14:21:17

by Michal Hocko

Subject: Re: possible deadlock in shmem_fallocate (4)

On Tue 14-07-20 22:08:59, Hillf Danton wrote:
>
> On Tue, 14 Jul 2020 10:26:29 +0200 Michal Hocko wrote:
> > On Tue 14-07-20 13:32:05, Hillf Danton wrote:
> > > [...]
> >
> > I do not think this is an appropriate fix. First of all, is this a real
> > deadlock or a lockdep false positive? Is it possible that ashmem just
>
> The warning matters and we can do something to quiesce it.

The underlying issue should be fixed rather than _something_ done to
silence it.

> > needs to properly annotate its shmem inodes? Or is it possible that
> > the internal backing shmem file is visible to userspace, so that the
> > write path would be possible?
> >
> > If this is a real problem, then the proper fix would be to set the
> > internal shmem mapping's gfp_mask to drop __GFP_FS.
>
> Thanks for the tip, see below.
>
> Can you expand a bit on how it helps direct reclaimers like khugepaged
> in the syzbot report with respect to the deadlock?

I do not understand your question.

> TBH I have a difficult time following it, even
> after staring at the chart below for quite a while.

Yes, lockdep reports are quite hard to follow and they tend to confuse
the hell out of me. But this one says that there is a reclaim dependency
between the shmem inode lock and the reclaim context.

> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(fs_reclaim);
>                                lock(&sb->s_type->i_mutex_key#15);
>                                lock(fs_reclaim);
>   lock(&sb->s_type->i_mutex_key#15);

Please refrain from proposing fixes until the actual problem is
understood. I suspect that this might be just a false positive, because
lockdep cannot tell the backing shmem, which is internal to ashmem(?),
apart from any general shmem. But somebody really familiar with ashmem
code should have a look, I believe.
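
For context, lockdep conflates them because all inodes of a given
filesystem type share a single lockdep key; fs/inode.c does, roughly:

	/* in inode_init_always(): every tmpfs inode lands in one class,
	 * hence the "&sb->s_type->i_mutex_key#15" name in the report */
	init_rwsem(&inode->i_rwsem);
	lockdep_set_class(&inode->i_rwsem, &sb->s_type->i_mutex_key);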

--
Michal Hocko
SUSE Labs

2020-07-14 15:48:01

by Todd Kjos

Subject: Re: possible deadlock in shmem_fallocate (4)

+Suren Baghdasaryan +Hridya Valsaraju, who support the ashmem driver.


On Tue, Jul 14, 2020 at 7:18 AM Michal Hocko <[email protected]> wrote:
>
> [...]
>
> Please refrain from proposing fixes until the actual problem is
> understood. I suspect that this might be just a false positive, because
> lockdep cannot tell the backing shmem, which is internal to ashmem(?),
> apart from any general shmem. But somebody really familiar with ashmem
> code should have a look, I believe.

2020-07-14 16:42:36

by Suren Baghdasaryan

Subject: Re: possible deadlock in shmem_fallocate (4)

On Tue, Jul 14, 2020 at 8:47 AM Todd Kjos <[email protected]> wrote:
>
> +Suren Baghdasaryan +Hridya Valsaraju who support the ashmem driver.

Thanks for looping me in.

>
>
> On Tue, Jul 14, 2020 at 7:18 AM Michal Hocko <[email protected]> wrote:
> >
> > [...]
> >
> > Please refrain from proposing fixes until the actual problem is
> > understood. I suspect that this might be just a false positive, because
> > lockdep cannot tell the backing shmem, which is internal to ashmem(?),
> > apart from any general shmem. But somebody really familiar with ashmem
> > code should have a look, I believe.

I believe the deadlock is possible if a write to an ashmem fd coincides
with shrinking of ashmem caches. I just developed a possible fix here
https://android-review.googlesource.com/c/kernel/common/+/1361205 but
wanted to test it before posting upstream. The idea is to detect such
a race between write and cache-shrinking operations and let
ashmem_shrink_scan bail out if the race is detected, instead of taking
inode_lock. AFAIK writing to ashmem files is not a usual usage for ashmem
(standard usage is to mmap it and use it as shared memory), therefore
bailing out early should not affect ashmem cache maintenance much.
Besides, ashmem_shrink_scan already bails out early if contention on
ashmem_mutex is detected, which is a much more probable case (see:
https://elixir.bootlin.com/linux/v5.8-rc4/source/drivers/staging/android/ashmem.c#L497).
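
In pseudo-code, the idea is roughly this (an illustrative sketch only,
not the actual patch at the link above; the ashmem_writers counter is
made up here):

	/* bumped around the inode_lock section of the write path (made up) */
	static atomic_t ashmem_writers = ATOMIC_INIT(0);

	static unsigned long
	ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
	{
		...
		if (!mutex_trylock(&ashmem_mutex))
			return -1;

		/* a racing writer may hold (or be about to take) inode_lock;
		 * bail out rather than risk the lock inversion */
		if (atomic_read(&ashmem_writers)) {
			mutex_unlock(&ashmem_mutex);
			return -1;
		}
		...
	}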

I'll test and post the patch here in a day or so if there are no early
objections to it.
Thanks!


2020-07-14 17:34:33

by Suren Baghdasaryan

Subject: Re: possible deadlock in shmem_fallocate (4)

On Tue, Jul 14, 2020 at 9:41 AM Suren Baghdasaryan <[email protected]> wrote:
>
> > > [...]
> > >
> > > Please refrain from proposing fixes until the actual problem is
> > > understood. I suspect that this might be just a false positive, because
> > > lockdep cannot tell the backing shmem, which is internal to ashmem(?),
> > > apart from any general shmem.

Actually, looking some more into this, I think you are right. Ashmem
currently does not redirect writes into the backing shmem, and the
fallocate call from ashmem_shrink_scan is always performed against
asma->file, which is the backing shmem. IOW, writes into the backing
shmem are not supported, therefore this concurrent locking can't
happen.

I'm not sure how we can annotate the fact that the inode_lock in
generic_file_write_iter and the one in shmem_fallocate always operate
on different inodes. Ideas?

> > > But somebody really familiar with ashmem code
> > > should have a look, I believe.
>
> [...]

2020-07-15 06:37:37

by Michal Hocko

Subject: Re: possible deadlock in shmem_fallocate (4)

On Tue 14-07-20 10:32:20, Suren Baghdasaryan wrote:
[...]
> I'm not sure how we can annotate the fact that the inode_lock in
> generic_file_write_iter and in shmem_fallocate always operate on
> different inodes. Ideas?

I believe that the standard way is to use lockdep_set_class on your
backing inode. But asking lockdep experts would be better than relying
on my vague recollection.
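
From that vague recollection, something like this where ashmem sets up
the backing file (untested, and the key name is made up):

	static struct lock_class_key ashmem_backing_inode_key;

	/* give the ashmem-internal shmem inode its own lockdep class so
	 * lockdep can tell it apart from ordinary shmem inodes */
	lockdep_set_class(&file_inode(vmfile)->i_rwsem,
			  &ashmem_backing_inode_key);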

--
Michal Hocko
SUSE Labs

2020-07-16 02:52:48

by Suren Baghdasaryan

Subject: Re: possible deadlock in shmem_fallocate (4)

On Tue, Jul 14, 2020 at 11:36 PM Michal Hocko <[email protected]> wrote:
>
> On Tue 14-07-20 10:32:20, Suren Baghdasaryan wrote:
> [...]
> > I'm not sure how we can annotate the fact that the inode_lock in
> > generic_file_write_iter and in shmem_fallocate always operate on
> > different inodes. Ideas?
>
> I believe that the standard way is to use lockdep_set_class on your
> backing inode. But asking lockdep experts would be better than relying
> on my vague recollection.

Thanks, Michal! I think https://lkml.org/lkml/2020/7/15/1390 should fix
it; however, I was unable to reproduce the lockdep warning to confirm
that it is indeed fixed by the patch.
