Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751969AbdGYMSN (ORCPT ); Tue, 25 Jul 2017 08:18:13 -0400
Received: from mail-ua0-f195.google.com ([209.85.217.195]:35242 "EHLO mail-ua0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751933AbdGYMSJ (ORCPT ); Tue, 25 Jul 2017 08:18:09 -0400
MIME-Version: 1.0
In-Reply-To: <20170625230756.68b4de21.alex_y_xu@yahoo.ca>
References: <20170625230756.68b4de21.alex_y_xu@yahoo.ca>
From: Ming Lei
Date: Tue, 25 Jul 2017 20:18:07 +0800
Message-ID:
Subject: Re: bfq/ext4 disk IO hangs forever on resume
To: Alex Xu
Cc: Linux Kernel Mailing List , "open list:EXT4 FILE SYSTEM"
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4231
Lines: 118

On Mon, Jun 26, 2017 at 11:07 AM, Alex Xu wrote:
> Hi,
>
> I get hangs when resuming when using bfq-mq with ext4 on 4.12-rc6+
> (currently a4fd8b3accf43d407472e34403d4b0a4df5c0e71).

Please test the following patch to see if it can help your issue:

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-linus&id=765e40b675a9566459ddcb8358ad16f3b8344bbe

>
> Steps to reproduce:
> 1. boot computer
> 2. systemctl suspend
> 3. wait few seconds
> 4. press power button
> 5. type "ls" into console or SSH or do anything that does disk IO
>
> Expected results:
> Command is executed.
>
> Actual results:
> Command hangs.
>
> lockdep has no comments, but sysrq-d shows that i_mutex_dir_key and
> jbd2_handle are held by multiple processes, leading me to suspect that
> ext4 is at least partially involved. [0]
>
> sysrq-w lists many blocked processes [1]
>
> This happens consistently, every time I resume the system from
> suspend-to-RAM using this configuration. Switching to noop IO scheduler
> makes it stop happening. I haven't tried switching filesystems yet.
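
When comparing runs with different schedulers, it helps to confirm which one is actually active on the disk. The kernel lists the available schedulers in /sys/block/<dev>/queue/scheduler and marks the active one with square brackets. Below is a minimal Python sketch of reading that file. The helper names and the default device "sda" are assumptions for illustration, not something taken from the report.

```python
import re
from pathlib import Path


def active_scheduler(text):
    """Return the scheduler name shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_scheduler(device="sda"):
    # Assumption: `device` names the affected disk; adjust for your system.
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    # Illustrative strings in the sysfs format, not output from the report.
    print(active_scheduler("noop deadline [cfq]"))
    print(active_scheduler("[none] mq-deadline"))
```

Switching is done by writing a scheduler name to the same file as root, so calling read_scheduler() before suspend and again after resume should show whether the setting under test was really in effect.
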
>
> I can do more debugging (enable KASAN or whatever), but usually when I
> bother doing that I find someone has already sent a patch for the issue.
>
> Please CC me on replies.
>
> Cheers,
> Alex.
>
> [0]
>
> 4 locks held by systemd/384:
> #0: (sb_writers#3){.+.+.+}, at: [] mnt_want_write+0x1f/0x50
> #1: (&type->i_mutex_dir_key/1){+.+.+.}, at: [] do_rmdir+0x15e/0x1e0
> #2: (&type->i_mutex_dir_key){++++++}, at: [] vfs_rmdir+0x50/0x130
> #3: (jbd2_handle){++++..}, at: [] start_this_handle+0xff/0x430
> 4 locks held by syncthing/279:
> #0: (&f->f_pos_lock){+.+.+.}, at: [] __fdget_pos+0x3e/0x50
> #1: (sb_writers#3){.+.+.+}, at: [] vfs_write+0x17c/0x1d0
> #2: (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [] ext4_file_write_iter+0x57/0x350
> #3: (jbd2_handle){++++..}, at: [] start_this_handle+0xff/0x430
> 2 locks held by zsh/238:
> #0: (&tty->ldisc_sem){++++.+}, at: [] ldsem_down_read+0x1f/0x30
> #1: (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0xb0/0x8b0
> 2 locks held by sddm-greeter/267:
> #0: (sb_writers#3){.+.+.+}, at: [] mnt_want_write+0x1f/0x50
> #1: (&type->i_mutex_dir_key){++++++}, at: [] path_openat+0x2d8/0xa10
> 2 locks held by kworker/u16:28/330:
> #0: ("events_unbound"){.+.+.+}, at: [] process_one_work+0x1c3/0x420
> #1: ((&entry->work)){+.+.+.}, at: [] process_one_work+0x1c3/0x420
> 1 lock held by zsh/382:
> #0: (&sig->cred_guard_mutex){+.+.+.}, at: [] prepare_bprm_creds+0x30/0x70
>
> [1]
>
> task PC stack pid father
> systemd D 0 384 0 0x00000000
> Call Trace:
> __schedule+0x295/0x7c0
> ? bit_wait+0x50/0x50
> ? bit_wait+0x50/0x50
> schedule+0x31/0x80
> io_schedule+0x11/0x40
> bit_wait_io+0xc/0x50
> __wait_on_bit+0x53/0x80
> ? bit_wait+0x50/0x50
> out_of_line_wait_on_bit+0x6e/0x80
> ? autoremove_wake_function+0x30/0x30
> do_get_write_access+0x20b/0x420
> jbd2_journal_get_write_access+0x2c/0x60
> __ext4_journal_get_write_access+0x55/0xa0
> ext4_delete_entry+0x8c/0x140
> ? __ext4_journal_start_sb+0x4e/0xa0
> ext4_rmdir+0x114/0x250
> vfs_rmdir+0x6e/0x130
> do_rmdir+0x1a3/0x1e0
> SyS_unlinkat+0x1d/0x30
> entry_SYSCALL_64_fastpath+0x18/0xad
> jbd2/sda1-8 D 0 81 2 0x00000000
> Call Trace:
> __schedule+0x295/0x7c0
> ? bit_wait+0x50/0x50
> schedule+0x31/0x80
> io_schedule+0x11/0x40
> bit_wait_io+0xc/0x50
> __wait_on_bit+0x53/0x80
> ? bit_wait+0x50/0x50
> out_of_line_wait_on_bit+0x6e/0x80
> ? autoremove_wake_function+0x30/0x30
> __wait_on_buffer+0x2d/0x30
> jbd2_journal_commit_transaction+0xe6a/0x1700
> kjournald2+0xc8/0x270
> ? kjournald2+0xc8/0x270
> ? wake_atomic_t_function+0x50/0x50
> kthread+0xfe/0x130
> ? commit_timeout+0x10/0x10
> ? kthread_create_on_node+0x40/0x40
> ret_from_fork+0x27/0x40
> [ more processes follow, some different tracebacks ]

--
Ming Lei