From: jiayingz@google.com (Jiaying Zhang)
Subject: [PATCH] ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode
Date: Thu, 11 Aug 2011 19:31:04 -0700 (PDT)
Message-ID: <20110812023105.18FBB4207C@ruihe.smo.corp.google.com>
Cc: linux-ext4@vger.kernel.org
To: tytso@mit.edu
Return-path:
Received: from smtp-out.google.com ([216.239.44.51]:38659 "EHLO
	smtp-out.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751821Ab1HLCbI (ORCPT );
	Thu, 11 Aug 2011 22:31:08 -0400
Sender: linux-ext4-owner@vger.kernel.org
List-ID:

Flush the inode's i_completed_io_list before calling ext4_ioend_wait()
to prevent the following deadlock scenario: a page fault happens while
some process is writing inode A. During the page fault,
shrink_icache_memory() is called, which in turn evicts another inode B.
Inode B has some pending io_end work, so it calls ext4_ioend_wait(),
which waits for inode B's i_ioend_count to become zero. However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to grab
inode A's i_mutex lock. Since inode A's i_mutex was taken before the
page fault happened and is still held, we enter a deadlock.

Also move ext4_flush_completed_IO() and ext4_ioend_wait() from
ext4_destroy_inode() to ext4_evict_inode(). During inode deletion,
ext4_evict_inode() is called before ext4_destroy_inode(), and in
ext4_evict_inode() we may call ext4_truncate() without holding the
i_mutex lock. As a result, there is a race between
ext4_flush_completed_IO(), which is called from ext4_ext_truncate(),
and ext4_end_io_work(), which may corrupt an io_end structure. Doing
the flush and the wait at the start of ext4_evict_inode() resolves the
race between ext4_truncate() and ext4_end_io_work() during inode
deletion.
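
For illustration only, the ordering described above can be sketched as a
small single-threaded userspace model. This is not kernel code: every
model_* name is hypothetical, the struct fields merely stand in for
i_completed_io_list, i_ioend_count and i_mutex, and the model assumes
that all outstanding io_end items are already on the completed list. It
shows why flushing in the evicting context means the later wait no
longer depends on the workqueue thread making progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; names only mirror the patch. */
struct model_io_end {
	struct model_io_end *next;
	bool done;                      /* conversion work finished */
};

struct model_inode {
	struct model_io_end *completed; /* stands in for i_completed_io_list */
	int ioend_count;                /* stands in for i_ioend_count */
	bool mutex_held;                /* stands in for i_mutex */
};

/* Record an I/O that has completed but whose io_end work is pending. */
static void model_queue_io_end(struct model_inode *inode,
			       struct model_io_end *io)
{
	io->done = false;
	io->next = inode->completed;
	inode->completed = io;
	inode->ioend_count++;
}

/*
 * Process every pending io_end in the caller's context instead of
 * leaving it to a worker thread. Returns the number processed.
 */
static int model_flush_completed_io(struct model_inode *inode)
{
	int processed = 0;

	assert(inode->mutex_held);      /* caller must hold the mutex */
	while (inode->completed) {
		struct model_io_end *io = inode->completed;

		inode->completed = io->next;
		io->next = NULL;
		io->done = true;
		inode->ioend_count--;
		processed++;
	}
	return processed;
}

/* True if a wait for ioend_count == 0 would return without blocking. */
static bool model_ioend_wait_would_return(const struct model_inode *inode)
{
	return inode->ioend_count == 0;
}

/* Eviction order used by the patch: lock, flush, unlock, then wait. */
static bool model_evict(struct model_inode *inode)
{
	inode->mutex_held = true;
	model_flush_completed_io(inode);
	inode->mutex_held = false;
	return model_ioend_wait_would_return(inode);
}
```

In this model, calling model_queue_io_end() twice and then
model_ioend_wait_would_return() reports that a wait would block, which
corresponds to the old behaviour of waiting on the worker. Calling
model_evict() instead drains the list first, so the wait condition is
already satisfied when it is checked.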

Signed-off-by: Jiaying Zhang
---
 fs/ext4/inode.c |    6 ++++++
 fs/ext4/super.c |    1 -
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index de50b16..2aba49e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -120,6 +120,12 @@ void ext4_evict_inode(struct inode *inode)
 	int err;
 
 	trace_ext4_evict_inode(inode);
+
+	mutex_lock(&inode->i_mutex);
+	ext4_flush_completed_IO(inode);
+	mutex_unlock(&inode->i_mutex);
+	ext4_ioend_wait(inode);
+
 	if (inode->i_nlink) {
 		truncate_inode_pages(&inode->i_data, 0);
 		goto no_delete;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9ea71aa..111ed9d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -892,7 +892,6 @@ static void ext4_i_callback(struct rcu_head *head)
 
 static void ext4_destroy_inode(struct inode *inode)
 {
-	ext4_ioend_wait(inode);
 	if (!list_empty(&(EXT4_I(inode)->i_orphan))) {
 		ext4_msg(inode->i_sb, KERN_ERR,
 			 "Inode %lu (%p): orphan list check failed!",
-- 
1.7.3.1