From: Theodore Ts'o
Subject: Re: [PATCH 2/6] jbd2/log_wait_for_space: drop checkpoint mutex when waiting
Date: Wed, 12 Jun 2013 22:55:26 -0400
Message-ID: <20130613025526.GD16959@thunk.org>
References: <1370892723-30860-1-git-send-email-paul.gortmaker@windriver.com> <1370990670-49249-1-git-send-email-paul.gortmaker@windriver.com> <1370990670-49249-3-git-send-email-paul.gortmaker@windriver.com>
To: Paul Gortmaker
Cc: linux-ext4@vger.kernel.org, linux-rt-users@vger.kernel.org
In-Reply-To: <1370990670-49249-3-git-send-email-paul.gortmaker@windriver.com>

On Tue, Jun 11, 2013 at 06:44:26PM -0400, Paul Gortmaker wrote:
> While trying to debug an issue under extreme I/O loading
> on preempt-rt kernels, the following backtrace was observed
> via SysRQ output:
>
> rm D ffff8802203afbc0 4600 4878 4748 0x00000000
>  ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
>  ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
>  ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
> Call Trace:
>  [] schedule+0x24/0x70
>  [] jbd2_log_wait_commit+0xbd/0x140
>  [] ? __init_waitqueue_head+0x50/0x50
>  [] jbd2_log_do_checkpoint+0xf5/0x520
>  [] __jbd2_log_wait_for_space+0xa9/0x1f0
>  [] start_this_handle.isra.10+0x2e0/0x530
>  [] ? __init_waitqueue_head+0x50/0x50
>  [] jbd2__journal_start+0xc3/0x110
>  [] ? ext4_rmdir+0x6e/0x230
>  [] jbd2_journal_start+0xe/0x10
>  [] ext4_journal_start_sb+0x5b/0x160
>  [] ext4_rmdir+0x6e/0x230
>  [] vfs_rmdir+0xd5/0x140
>  [] do_rmdir+0xdf/0x120
>  [] ? task_work_run+0x44/0x80
>  [] ? do_notify_resume+0x89/0x100
>  [] ? int_signal+0x12/0x17
>  [] sys_unlinkat+0x25/0x40
>  [] system_call_fastpath+0x16/0x1b
>
> What is interesting here is that we call log_wait_commit from
> within wait_for_space, but we are still holding the checkpoint_mutex,
> as it surrounds most of wait_for_space. And then, while we
> are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
> bit is set, it will also try to take the same checkpoint_mutex.
>
> It seems that we need to drop the checkpoint_mutex while sitting in
> jbd2_log_wait_commit if we want to guarantee that progress can be made
> by jbd2_journal_commit_transaction(). There does not seem to be
> anything preempt-rt specific about this, other than perhaps increasing
> the odds of it happening.
>
> Signed-off-by: Paul Gortmaker

Applied, thanks.

- Ted
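The hang Paul describes is a lock-ordering cycle: the space waiter holds checkpoint_mutex while sleeping until a commit finishes, and the committer needs that same mutex (when JBD2_FLUSHED is set) before it can finish. The following is a minimal user-space sketch of that pattern with POSIX threads, under heavily simplified assumptions: `struct demo_journal` and all `demo_*` functions are hypothetical illustrations, not the jbd2 API, and the kernel uses its own mutex and wait-queue primitives rather than pthreads. It shows the shape of the proposed fix, dropping the mutex around the commit wait and re-taking it afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, much-simplified journal model; NOT the jbd2 API. */
struct demo_journal {
    pthread_mutex_t checkpoint_mutex; /* stands in for j_checkpoint_mutex */
    pthread_mutex_t state_lock;       /* protects committed_tid and flushed */
    pthread_cond_t commit_done;       /* broadcast when a commit finishes */
    int committed_tid;                /* id of the last committed transaction */
    bool flushed;                     /* stands in for the JBD2_FLUSHED bit */
};

/* Committer: when "flushed" is set it must take checkpoint_mutex
 * before it can complete the commit. */
static void *demo_commit(void *arg)
{
    struct demo_journal *j = arg;

    pthread_mutex_lock(&j->state_lock);
    bool need_checkpoint = j->flushed;
    pthread_mutex_unlock(&j->state_lock);

    if (need_checkpoint)
        pthread_mutex_lock(&j->checkpoint_mutex);

    pthread_mutex_lock(&j->state_lock);
    j->committed_tid++;
    pthread_cond_broadcast(&j->commit_done);
    pthread_mutex_unlock(&j->state_lock);

    if (need_checkpoint)
        pthread_mutex_unlock(&j->checkpoint_mutex);
    return NULL;
}

/* Sleep until transaction `tid` has committed. */
static void demo_wait_commit(struct demo_journal *j, int tid)
{
    pthread_mutex_lock(&j->state_lock);
    while (j->committed_tid < tid)
        pthread_cond_wait(&j->commit_done, &j->state_lock);
    pthread_mutex_unlock(&j->state_lock);
}

/* Space waiter: holds checkpoint_mutex for its checkpoint work but
 * drops it while waiting for the commit. Returns the committed tid. */
int demo_wait_for_space(void)
{
    struct demo_journal j = { .committed_tid = 0, .flushed = true };
    pthread_t committer;

    pthread_mutex_init(&j.checkpoint_mutex, NULL);
    pthread_mutex_init(&j.state_lock, NULL);
    pthread_cond_init(&j.commit_done, NULL);

    pthread_mutex_lock(&j.checkpoint_mutex);
    /* ... checkpoint work done under the mutex would go here ... */
    pthread_create(&committer, NULL, demo_commit, &j);

    /* The fix: without this unlock, demo_commit() blocks on
     * checkpoint_mutex and never signals commit_done -> deadlock. */
    pthread_mutex_unlock(&j.checkpoint_mutex);
    demo_wait_commit(&j, 1);
    pthread_mutex_lock(&j.checkpoint_mutex);

    /* Anything checked before the unlock may be stale; re-check it. */
    assert(j.committed_tid >= 1);
    pthread_mutex_unlock(&j.checkpoint_mutex);

    pthread_join(committer, NULL);
    int tid = j.committed_tid;
    pthread_cond_destroy(&j.commit_done);
    pthread_mutex_destroy(&j.state_lock);
    pthread_mutex_destroy(&j.checkpoint_mutex);
    return tid;
}
```

Calling `demo_wait_for_space()` returns 1 once the committer has run. If the unlock/relock pair around `demo_wait_commit()` is removed, the call hangs forever, which mirrors the task stuck in `jbd2_log_wait_commit` in the backtrace. The general cost of this kind of fix is that code which drops a lock in order to sleep must re-validate whatever it had checked before dropping it.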