2021-02-18 07:07:54

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 211819] New: Processes stuck in D-state after accessing ext4+fast_commit+fscrypt

https://bugzilla.kernel.org/show_bug.cgi?id=211819

Bug ID: 211819
Summary: Processes stuck in D-state after accessing
ext4+fast_commit+fscrypt
Product: File System
Version: 2.5
Kernel Version: 5.11.0 (5.10?)
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: ext4
Assignee: [email protected]
Reporter: [email protected]
Regression: No

I have several ext4, / and /home among them. /home fs do have user folder(s)
and mine is encrypted with fscrypt (v2 keying).

Recently I enabled fast_commit for all my ext4 fs with tune2fs -O fast_commit.
That time I used 5.10.x kernel. Once 5.11 was released, I tried it immediately.

Seconds after decrypting and using encrypted home folder, I was no longer able
to start new processes, which were not in FS cache -- all of them were in
D-state. I.e. encrypted folder was at /home/user at fs /home, executalbes are
in /bin/ at / fs. For example, if I run dmesg before decrypting and using
/home/user folder, it was (likely) cached, once I'm using /home/user folder
dmesg works, but other tools which were not previously used - doesn't.

There were no any messages in dmesg, logs etc. I used SysRq to reboot the
system. Such situation happend every time. After reset FS had corruptions,
which were fixed by fsck. After several attempts I rollback to 5.10. It worked,
but hanged in same manner somewhere in 12 hours of usage. After that, I made
assumption, that this may be because of fast_commit (or my NVMe dying). I
checked smart status, it was fine. I disabled fast_commit just for /home fs,
rebooted to 5.11, and it works for 24h already (previously it hanged
immediately). So, maybe, this was because of fast_commit.

Maybe it's worth to test this combination somehow..

--
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


2022-03-25 23:02:35

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 211819] Processes stuck in D-state after accessing ext4+fast_commit+fscrypt

https://bugzilla.kernel.org/show_bug.cgi?id=211819

AT ([email protected]) changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]

--- Comment #1 from AT ([email protected]) ---
This affects kernels 5.16.15-arch1 and 5.16.16-arch1 at least (distro suggested
I add to this report). While running 5.16.16 with my home directory encrypted
with fscrypt (automatically unlocked on login), I enabled EXT4 fast_commit on
my root drive with:

sudo tune2fs -O fast_commit /dev/nvme0n1p2

( Other filesystem features: has_journal ext_attr resize_inode dir_index
filetype needs_recovery extent flex_bg encrypt sparse_super large_file
huge_file dir_nlink extra_isize metadata_csum )

The system remained responsive but after that there were repeated episodes like
this in the log while running Firefox:

1 Mar 23 22:19:15 hostname kernel: INFO: task IndexedDB #19:3350 blocked for
more than 122 seconds.
1 Mar 23 22:19:15 hostname kernel: Not tainted 5.16.16-arch1-1 #1
2 Mar 23 22:19:15 hostname kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
3 Mar 23 22:19:15 hostname kernel: task:IndexedDB #19 state:D stack: 0 pid:
3350 ppid: 1393 flags:0x00000002
4 Mar 23 22:19:15 hostname kernel: Call Trace:
5 Mar 23 22:19:15 hostname kernel: <TASK>
6 Mar 23 22:19:15 hostname kernel: __schedule+0x2f6/0xf80
7 Mar 23 22:19:15 hostname kernel: ? touch_atime+0x44/0x190
8 Mar 23 22:19:15 hostname kernel: schedule+0x4b/0xc0
9 Mar 23 22:19:15 hostname kernel: ext4_fc_wait_committing_inode+0xad/0xe0
[ext4 ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
10 Mar 23 22:19:15 hostname kernel: ? var_wake_function+0x20/0x20
11 Mar 23 22:19:15 hostname kernel: ext4_fc_start_update+0x43/0x90 [ext4
ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
12 Mar 23 22:19:15 hostname kernel: ext4_buffered_write_iter+0x35/0x120 [ext4
ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
13 Mar 23 22:19:15 hostname kernel: new_sync_write+0x15c/0x1f0
14 Mar 23 22:19:15 hostname kernel: vfs_write+0x1eb/0x280
15 Mar 23 22:19:15 hostname kernel: ksys_write+0x67/0xe0
16 Mar 23 22:19:15 hostname kernel: do_syscall_64+0x5c/0x80
17 Mar 23 22:19:15 hostname kernel: ? do_syscall_64+0x69/0x80
18 Mar 23 22:19:15 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
19 Mar 23 22:19:15 hostname kernel: ? do_syscall_64+0x69/0x80
20 Mar 23 22:19:15 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
21 Mar 23 22:19:15 hostname kernel: RIP: 0033:0x7f5c0222928f
22 Mar 23 22:19:15 hostname kernel: RSP: 002b:00007f5becaac970 EFLAGS: 00000293
ORIG_RAX: 0000000000000001
23 Mar 23 22:19:15 hostname kernel: RAX: ffffffffffffffda RBX: 00007f5bbcfcffa0
RCX: 00007f5c0222928f
24 Mar 23 22:19:15 hostname kernel: RDX: 0000000000001000 RSI: 00007f5bb8e89000
RDI: 0000000000000038
25 Mar 23 22:19:15 hostname kernel: RBP: 0000000000000038 R08: 0000000000000000
R09: 0000000000000001
26 Mar 23 22:19:15 hostname kernel: R10: 00007ffea35cb080 R11: 0000000000000293
R12: 00007f5bb8e89000
27 Mar 23 22:19:15 hostname kernel: R13: 0000000000001000 R14: 00007f5bbcfcffa0
R15: 0000000000012000
28 Mar 23 22:19:15 hostname kernel: </TASK>

Shutdown was held up by Firefox/Thunderbird processes hanging, after an expired
timeout I had to forcefully reboot with Magic SysRq key. At one point the
system wouldn't boot with disk errors and I had to fsck.ext4 in the initrd,
luckily managed to get booting again. Log from the latter part of an attempted
suspend:

Mar 24 00:41:16 hostname kernel: PM: suspend entry (s2idle)
Mar 24 00:41:16 hostname kernel: Filesystems sync: 0.009 seconds
Mar 24 00:41:36 hostname kernel: Freezing user space processes ...
Mar 24 00:41:36 hostname kernel: Freezing of tasks failed after 20.009 seconds
(3 tasks refusing to freeze, wq_busy=0):
Mar 24 00:41:36 hostname kernel: task:IndexedDB #19 state:D stack: 0 pid: 3350
ppid: 1393 flags:0x00000006
Mar 24 00:41:36 hostname kernel: Call Trace:
Mar 24 00:41:36 hostname kernel: <TASK>
Mar 24 00:41:36 hostname kernel: __schedule+0x2f6/0xf80
Mar 24 00:41:36 hostname kernel: ? touch_atime+0x44/0x190
Mar 24 00:41:36 hostname kernel: schedule+0x4b/0xc0
Mar 24 00:41:36 hostname kernel: ext4_fc_wait_committing_inode+0xad/0xe0 [ext4
ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: ? var_wake_function+0x20/0x20
Mar 24 00:41:36 hostname kernel: ext4_fc_start_update+0x43/0x90 [ext4
ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: ext4_buffered_write_iter+0x35/0x120 [ext4
ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: new_sync_write+0x15c/0x1f0
Mar 24 00:41:36 hostname kernel: vfs_write+0x1eb/0x280
Mar 24 00:41:36 hostname kernel: ksys_write+0x67/0xe0
Mar 24 00:41:36 hostname kernel: do_syscall_64+0x5c/0x80
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Mar 24 00:41:36 hostname kernel: RIP: 0033:0x7f5c0222928f
Mar 24 00:41:36 hostname kernel: RSP: 002b:00007f5becaac970 EFLAGS: 00000293
ORIG_RAX: 0000000000000001
Mar 24 00:41:36 hostname kernel: RAX: ffffffffffffffda RBX: 00007f5bbcfcffa0
RCX: 00007f5c0222928f
Mar 24 00:41:36 hostname kernel: RDX: 0000000000001000 RSI: 00007f5bb8e89000
RDI: 0000000000000038
Mar 24 00:41:36 hostname kernel: RBP: 0000000000000038 R08: 0000000000000000
R09: 0000000000000001
Mar 24 00:41:36 hostname kernel: R10: 00007ffea35cb080 R11: 0000000000000293
R12: 00007f5bb8e89000
Mar 24 00:41:36 hostname kernel: R13: 0000000000001000 R14: 00007f5bbcfcffa0
R15: 0000000000012000
Mar 24 00:41:36 hostname kernel: </TASK>
Mar 24 00:41:36 hostname kernel: task:Indexed~ Mnt #1 state:D stack: 0
pid:64999 ppid: 1393 flags:0x00000006
Mar 24 00:41:36 hostname kernel: Call Trace:
Mar 24 00:41:36 hostname kernel: <TASK>
Mar 24 00:41:36 hostname kernel: __schedule+0x2f6/0xf80
Mar 24 00:41:36 hostname kernel: schedule+0x4b/0xc0
Mar 24 00:41:36 hostname kernel: ext4_fc_wait_committing_inode+0xad/0xe0 [ext4
ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: ? var_wake_function+0x20/0x20
Mar 24 00:41:36 hostname kernel: ext4_fc_start_update+0x43/0x90 [ext4
ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: ext4_buffered_write_iter+0x35/0x120 [ext4
ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: new_sync_write+0x15c/0x1f0
Mar 24 00:41:36 hostname kernel: vfs_write+0x1eb/0x280
Mar 24 00:41:36 hostname kernel: ksys_write+0x67/0xe0
Mar 24 00:41:36 hostname kernel: do_syscall_64+0x5c/0x80
Mar 24 00:41:36 hostname kernel: ? __fget_light+0x8f/0x110
Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Mar 24 00:41:36 hostname kernel: RIP: 0033:0x7f5c0222928f
Mar 24 00:41:36 hostname kernel: RSP: 002b:00007f5bbc0ebcf0 EFLAGS: 00000293
ORIG_RAX: 0000000000000001
Mar 24 00:41:36 hostname kernel: RAX: ffffffffffffffda RBX: 00007f5ba7fa5608
RCX: 00007f5c0222928f
Mar 24 00:41:36 hostname kernel: RDX: 0000000000000018 RSI: 00007f5bbc0ebeb0
RDI: 00000000000000ce
Mar 24 00:41:36 hostname kernel: RBP: 00000000000000ce R08: 0000000000000000
R09: 0000000000000000
Mar 24 00:41:36 hostname kernel: R10: 00007ffea35cb080 R11: 0000000000000293
R12: 00007f5bbc0ebeb0
Mar 24 00:41:36 hostname kernel: R13: 0000000000000018 R14: 00007f5ba7fa5608
R15: 0000000000058860
Mar 24 00:41:36 hostname kernel: </TASK>
Mar 24 00:41:36 hostname kernel: task:Indexed~ Mnt #2 state:D stack: 0
pid:65000 ppid: 1393 flags:0x00000006
Mar 24 00:41:36 hostname kernel: Call Trace:
Mar 24 00:41:36 hostname kernel: <TASK>
Mar 24 00:41:36 hostname kernel: __schedule+0x2f6/0xf80
Mar 24 00:41:36 hostname kernel: ? atime_needs_update+0x82/0x100
Mar 24 00:41:36 hostname kernel: schedule+0x4b/0xc0
Mar 24 00:41:36 hostname kernel: ext4_fc_wait_committing_inode+0xad/0xe0 [ext4
ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: ? var_wake_function+0x20/0x20
Mar 24 00:41:36 hostname kernel: ext4_fc_start_update+0x43/0x90 [ext4
ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: ext4_buffered_write_iter+0x35/0x120 [ext4
ef9ba7b6856d0b378c7bc2df0cae9987df17f5c0]
Mar 24 00:41:36 hostname kernel: new_sync_write+0x15c/0x1f0
Mar 24 00:41:36 hostname kernel: vfs_write+0x1eb/0x280
Mar 24 00:41:36 hostname kernel: ksys_write+0x67/0xe0
Mar 24 00:41:36 hostname kernel: do_syscall_64+0x5c/0x80
Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? syscall_exit_to_user_mode+0x23/0x40
Mar 24 00:41:36 hostname kernel: ? do_syscall_64+0x69/0x80
Mar 24 00:41:36 hostname kernel: ? native_flush_tlb_local+0x31/0x40
Mar 24 00:41:36 hostname kernel: ? flush_tlb_func+0xc8/0x1d0
Mar 24 00:41:36 hostname kernel: ? sched_clock_cpu+0x9/0xb0
Mar 24 00:41:36 hostname kernel: ? irqtime_account_irq+0x38/0xb0
Mar 24 00:41:36 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Mar 24 00:41:36 hostname kernel: RIP: 0033:0x7f5c0222928f
Mar 24 00:41:36 hostname kernel: RSP: 002b:00007f5baf502130 EFLAGS: 00000293
ORIG_RAX: 0000000000000001
Mar 24 00:41:36 hostname kernel: RAX: ffffffffffffffda RBX: 00007f5b98511da0
RCX: 00007f5c0222928f
Mar 24 00:41:36 hostname kernel: RDX: 0000000000001000 RSI: 00007f5baaf4c000
RDI: 00000000000000a2
Mar 24 00:41:36 hostname kernel: RBP: 00000000000000a2 R08: 0000000000000000
R09: 0000000000000001
Mar 24 00:41:36 hostname kernel: R10: 00007ffea35cb080 R11: 0000000000000293
R12: 00007f5baaf4c000
Mar 24 00:41:36 hostname kernel: R13: 0000000000001000 R14: 00007f5b98511da0
R15: 0000000000007000
Mar 24 00:41:36 hostname kernel: </TASK>
Mar 24 00:41:36 hostname kernel:
Mar 24 00:41:36 hostname kernel: OOM killer enabled.
Mar 24 00:41:36 hostname kernel: Restarting tasks ... done.
Mar 24 00:41:36 hostname kernel: PM: suspend exit
Mar 24 00:41:36 hostname systemd-sleep[65047]: Failed to put system to sleep.
System resumed again: Device or resource busy
Mar 24 00:41:36 hostname systemd[1]: systemd-suspend.service: Main process
exited, code=exited, status=1/FAILURE
Mar 24 00:41:36 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295
ses=4294967295 subj==unconfined msg='unit=systemd-suspend comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Mar 24 00:41:36 hostname kernel: audit: type=1130 audit(1648035696.209:166):
pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined
msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd"
hostname=? addr=? terminal=? res=failed'
Mar 24 00:41:36 hostname systemd[1]: systemd-suspend.service: Failed with
result 'exit-code'.
Mar 24 00:41:36 hostname systemd[1]: Failed to start System Suspend.
Mar 24 00:41:36 hostname systemd[1]: Dependency failed for Suspend.
Mar 24 00:41:36 hostname systemd[1]: suspend.target: Job suspend.target/start
failed with result 'dependency'.
Mar 24 00:41:36 hostname systemd[1]: Stopped target Sleep.
Mar 24 00:41:36 hostname systemd-logind[595]: Operation 'sleep' finished.

I tried both kernels with the same issue. Then to remove it I ran:

sudo tune2fs -O ^fast_commit /dev/nvme0n1p2

which still resulted in errors until I rebooted and appended "fsck.mode=force"
to the Linux boot cmdline; the system now runs OK. Hope this helps debug the
fscrypt/fast_commit interaction.

--
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.