Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756397AbdGCOai (ORCPT ); Mon, 3 Jul 2017 10:30:38 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:36342 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932597AbdGCNnY (ORCPT ); Mon, 3 Jul 2017 09:43:24 -0400 From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Chandan Rajendra , Filipe Manana , David Sterba , Sasha Levin Subject: [PATCH 4.9 080/172] Btrfs: Fix deadlock between direct IO and fast fsync Date: Mon, 3 Jul 2017 15:34:20 +0200 Message-Id: <20170703133418.036371604@linuxfoundation.org> X-Mailer: git-send-email 2.13.2 In-Reply-To: <20170703133414.260777365@linuxfoundation.org> References: <20170703133414.260777365@linuxfoundation.org> User-Agent: quilt/0.65 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4185 Lines: 96 4.9-stable review patch. If anyone has any objections, please let me know. ------------------ From: Chandan Rajendra [ Upstream commit 97dcdea076ecef41ea4aaa23d4397c2f622e4265 ] The following deadlock is seen when executing generic/113 test, ---------------------------------------------------------+---------------------------------------------------- Direct I/O task Fast fsync task ---------------------------------------------------------+---------------------------------------------------- btrfs_direct_IO __blockdev_direct_IO do_blockdev_direct_IO do_direct_IO btrfs_get_blocks_direct while (blocks needs to written) get_more_blocks (first iteration) btrfs_get_blocks_direct btrfs_create_dio_extent down_read(&BTRFS_I(inode) >dio_sem) Create and add extent map and ordered extent up_read(&BTRFS_I(inode) >dio_sem) btrfs_sync_file btrfs_log_dentry_safe btrfs_log_inode_parent btrfs_log_inode btrfs_log_changed_extents down_write(&BTRFS_I(inode) >dio_sem) Collect new extent maps and ordered extents wait for ordered extent completion get_more_blocks (second iteration) btrfs_get_blocks_direct btrfs_create_dio_extent down_read(&BTRFS_I(inode) >dio_sem) -------------------------------------------------------------------------------------------------------------- In the above description, Btrfs direct I/O code path has not yet started submitting bios for file range covered by the initial ordered extent. Meanwhile, The fast fsync task obtains the write semaphore and waits for I/O on the ordered extent to get completed. However, the Direct I/O task is now blocked on obtaining the read semaphore. To resolve the deadlock, this commit modifies the Direct I/O code path to obtain the read semaphore before invoking __blockdev_direct_IO(). The semaphore is then given up after __blockdev_direct_IO() returns. This allows the Direct I/O code to complete I/O on all the ordered extents it creates. Signed-off-by: Chandan Rajendra Reviewed-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7235,7 +7235,6 @@ static struct extent_map *btrfs_create_d struct extent_map *em = NULL; int ret; - down_read(&BTRFS_I(inode)->dio_sem); if (type != BTRFS_ORDERED_NOCOW) { em = create_pinned_em(inode, start, len, orig_start, block_start, block_len, orig_block_len, @@ -7254,7 +7253,6 @@ static struct extent_map *btrfs_create_d em = ERR_PTR(ret); } out: - up_read(&BTRFS_I(inode)->dio_sem); return em; } @@ -8707,6 +8705,7 @@ static ssize_t btrfs_direct_IO(struct ki dio_data.unsubmitted_oe_range_start = (u64)offset; dio_data.unsubmitted_oe_range_end = (u64)offset; current->journal_info = &dio_data; + down_read(&BTRFS_I(inode)->dio_sem); } else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK, &BTRFS_I(inode)->runtime_flags)) { inode_dio_end(inode); @@ -8719,6 +8718,7 @@ static ssize_t btrfs_direct_IO(struct ki iter, btrfs_get_blocks_direct, NULL, btrfs_submit_direct, flags); if (iov_iter_rw(iter) == WRITE) { + up_read(&BTRFS_I(inode)->dio_sem); current->journal_info = NULL; if (ret < 0 && ret != -EIOCBQUEUED) { if (dio_data.reserve)