Return-Path: Received: from out30-132.freemail.mail.aliyun.com ([115.124.30.132]:42088 "EHLO out30-132.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732875AbeKMW6f (ORCPT ); Tue, 13 Nov 2018 17:58:35 -0500 Subject: Re: [PATCH] ext4: fix deadlock while checkpoint thread waits commit thread to finish To: Jan Kara Cc: linux-ext4@vger.kernel.org, jack@suse.com, tytso@mit.edu References: <20181113085907.22545-1-xiaoguang.wang@linux.alibaba.com> <20181113123917.GB1292@quack2.suse.cz> From: Xiaoguang Wang Message-ID: <9157e14b-c341-7089-f17b-1c282137940f@linux.alibaba.com> Date: Tue, 13 Nov 2018 21:00:20 +0800 MIME-Version: 1.0 In-Reply-To: <20181113123917.GB1292@quack2.suse.cz> Content-Type: text/plain; charset=gbk; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-ext4-owner@vger.kernel.org List-ID: hi, > On Tue 13-11-18 16:59:07, Xiaoguang Wang wrote: >> This issue was found when I tried to put checkpoint work in a separate thread, >> the deadlock below happened: >> Thread1 | Thread2 >> __jbd2_log_wait_for_space | >> jbd2_log_do_checkpoint (hold j_checkpoint_mutex)| >> if (jh->b_transaction != NULL) | >> ... | >> jbd2_log_start_commit(journal, tid); |jbd2_update_log_tail >> | will lock j_checkpoint_mutex, >> | but will be blocked here. >> | >> jbd2_log_wait_commit(journal, tid); | >> wait_event(journal->j_wait_done_commit, | >> !tid_gt(tid, journal->j_commit_sequence)); | >> ... |wake_up(j_wait_done_commit) >> } | >> >> then deadlock occurs, Thread1 will never be waken up. >> >> To fix this issue, here we introduce a new j_loginfo_mutex to protect >> concurrent modifications to journal log tail info. >> >> Signed-off-by: Xiaoguang Wang > > Thanks for the analysis and the patch. I agree this deadlock is possible > however I'm not sure your solution is quite correct. There are other places > besides __jbd2_update_log_tail() that do update log tail. The most common > one is jbd2_mark_journal_empty() and the functions calling it but there are > a few other places as well. All these need to be synchronized properly so > that journal tail does not get corrupted. The your patch needs some more > work. Yes, thanks for pointing this out. > > But in general I like the idea of a special lock protecting journal tail > (I'd just call it j_tail_mutex to explicitely express that) so that > pushing of the journal tail possibly done during transaction commit would not > be blocked by checkpointing. That can be a performance win as well. Agree with you :) > > Since proper locking change is going to be a bit more involved, can you > perhaps fix this deadlock by just dropping j_checkpoint_mutex in > log_do_checkpoint() when we are going to wait for transaction commit. I've > checked and that should be fine and that is going to be much easier change > to backport into stable kernels... OK, I'll try this method and have a test, thanks. Regards, Xiaoguang Wang > > Honza > > >> --- >> fs/jbd2/checkpoint.c | 2 +- >> fs/jbd2/commit.c | 6 +++--- >> fs/jbd2/journal.c | 16 +++++++++------- >> include/linux/jbd2.h | 9 ++++++++- >> 4 files changed, 21 insertions(+), 12 deletions(-) >> >> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c >> index c125d662777c..1729d6298895 100644 >> --- a/fs/jbd2/checkpoint.c >> +++ b/fs/jbd2/checkpoint.c >> @@ -404,7 +404,7 @@ int jbd2_cleanup_journal_tail(journal_t *journal) >> if (journal->j_flags & JBD2_BARRIER) >> blkdev_issue_flush(journal->j_fs_dev, GFP_NOFS, NULL); >> >> - return __jbd2_update_log_tail(journal, first_tid, blocknr); >> + return jbd2_update_log_tail(journal, first_tid, blocknr); >> } >> >> >> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c >> index 150cc030b4d7..f139f5465687 100644 >> --- a/fs/jbd2/commit.c >> +++ b/fs/jbd2/commit.c >> @@ -383,9 +383,9 @@ void jbd2_journal_commit_transaction(journal_t *journal) >> /* Do we need to erase the effects of a prior jbd2_journal_flush? */ >> if (journal->j_flags & JBD2_FLUSHED) { >> jbd_debug(3, "super block updated\n"); >> - mutex_lock_io(&journal->j_checkpoint_mutex); >> + mutex_lock_io(&journal->j_loginfo_mutex); >> /* >> - * We hold j_checkpoint_mutex so tail cannot change under us. >> + * We hold j_loginfo_mutex so tail cannot change under us. >> * We don't need any special data guarantees for writing sb >> * since journal is empty and it is ok for write to be >> * flushed only with transaction commit. >> @@ -394,7 +394,7 @@ void jbd2_journal_commit_transaction(journal_t *journal) >> journal->j_tail_sequence, >> journal->j_tail, >> REQ_SYNC); >> - mutex_unlock(&journal->j_checkpoint_mutex); >> + mutex_unlock(&journal->j_loginfo_mutex); >> } else { >> jbd_debug(3, "superblock not updated\n"); >> } >> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c >> index 8ef6b6daaa7a..be2c10ff5bae 100644 >> --- a/fs/jbd2/journal.c >> +++ b/fs/jbd2/journal.c >> @@ -940,8 +940,6 @@ int __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block) >> unsigned long freed; >> int ret; >> >> - BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex)); >> - >> /* >> * We cannot afford for write to remain in drive's caches since as >> * soon as we update j_tail, next transaction can start reusing journal >> @@ -978,12 +976,16 @@ int __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block) >> * provided log tail and locks j_checkpoint_mutex. So it is safe against races >> * with other threads updating log tail. >> */ >> -void jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block) >> +int jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block) >> { >> - mutex_lock_io(&journal->j_checkpoint_mutex); >> + int ret = 0; >> + >> + mutex_lock_io(&journal->j_loginfo_mutex); >> if (tid_gt(tid, journal->j_tail_sequence)) >> - __jbd2_update_log_tail(journal, tid, block); >> - mutex_unlock(&journal->j_checkpoint_mutex); >> + ret = __jbd2_update_log_tail(journal, tid, block); >> + mutex_unlock(&journal->j_loginfo_mutex); >> + >> + return ret; >> } >> >> struct jbd2_stats_proc_session { >> @@ -1147,6 +1149,7 @@ static journal_t *journal_init_common(struct block_device *bdev, >> init_waitqueue_head(&journal->j_wait_reserved); >> mutex_init(&journal->j_barrier); >> mutex_init(&journal->j_checkpoint_mutex); >> + mutex_init(&journal->j_loginfo_mutex); >> spin_lock_init(&journal->j_revoke_lock); >> spin_lock_init(&journal->j_list_lock); >> rwlock_init(&journal->j_state_lock); >> @@ -1420,7 +1423,6 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid, >> if (is_journal_aborted(journal)) >> return -EIO; >> >> - BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex)); >> jbd_debug(1, "JBD2: updating superblock (start %lu, seq %u)\n", >> tail_block, tail_tid); >> >> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h >> index b708e5169d1d..a9c2928aea35 100644 >> --- a/include/linux/jbd2.h >> +++ b/include/linux/jbd2.h >> @@ -862,6 +862,13 @@ struct journal_s >> */ >> struct buffer_head *j_chkpt_bhs[JBD2_NR_BATCH]; >> >> + /** >> + * @j_loginfo_mutex: >> + * >> + * Semaphore for locking against concurrent update journal info. >> + */ >> + struct mutex j_loginfo_mutex; >> + >> /** >> * @j_head: >> * >> @@ -1265,7 +1272,7 @@ int jbd2_journal_next_log_block(journal_t *, unsigned long long *); >> int jbd2_journal_get_log_tail(journal_t *journal, tid_t *tid, >> unsigned long *block); >> int __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block); >> -void jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block); >> +int jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block); >> >> /* Commit management */ >> extern void jbd2_journal_commit_transaction(journal_t *); >> -- >> 2.14.4.44.g2045bb6 >> >>