Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp6568318ybx; Mon, 11 Nov 2019 11:05:43 -0800 (PST) X-Google-Smtp-Source: APXvYqx0dSNO4RdQ9KKh2zrAzq3LNFFxBtfF2qOqS0EREs9s/qdf16tqV2SHZsr3MQGJ7zr4hHpD X-Received: by 2002:aa7:d147:: with SMTP id r7mr28288471edo.198.1573499143748; Mon, 11 Nov 2019 11:05:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1573499143; cv=none; d=google.com; s=arc-20160816; b=g5ky8AKF3oRUJB+EcOrTy8E2YCrzmUw2R1eCGrvyy9YIIQmKGC7JnfYaWmQuIHJt6O 7KDGxmHTqW4uYRufXaryUiFEbzOYYvgIIaJBMPYsRx+jZ+72HMNWrRcjYa7Tt6HReejS roaReMeuqXlAGsRntQZiKS+J6r9qtvE6ImtU0rD9NNtQ2nFR5jdI0AGndWRsVhitzZ73 Jml0USokyGsl0t1UeD/+PMBE9tR+TLoRqq6asJzQKbAZb6BiGWiNREr2ylP6f2o/7/y/ nvkCK7ZA/dLucyCy9xsc7wgPfJbsRxliNlIW4eAkFeXUv4VX0KdFJNzTL9yWY2zEI9oL r1ZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=gqwkqq8NEYITOUDOm13XA0v5oPcO7eh6JwYVVdtU0YI=; b=NWlCpJZ4Hwk3LnWyi0juHX0I0dWrvH8JBJOwm70WFt8cBd7iBFwjmVSwXfndzw2eUI PaQEJHYKnTxlePchrWhfyJBDyU3cOeEPXOTUSrLLciFioTkbBj9ao8ewrBR2v4COUT6B fpHzJ6cnCxkn0yoSA+TLt1nqP2G70EUVN2f2rmvDH0Zo20R4jd0xQ/5HE8+beLIPf89t CeMGLZK1RyPO3CN4ik4GdJUbNbjP5JW/PJ6EwnZ4YPK6yIzRB7JjR9eKUMhUMTWoLD3Y 1qCNKqjiNZEThaO7/v3agu0MRUYqFEDr+n/LPurKKCwBDncjh88i63jMjh7GJ85ntiAI FaZQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=DCR+gme4; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f22si5246020eds.54.2019.11.11.11.05.18; Mon, 11 Nov 2019 11:05:43 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=DCR+gme4; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729668AbfKKSqv (ORCPT + 99 others); Mon, 11 Nov 2019 13:46:51 -0500 Received: from mail.kernel.org ([198.145.29.99]:39412 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729363AbfKKSqt (ORCPT ); Mon, 11 Nov 2019 13:46:49 -0500 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 6C5BD214E0; Mon, 11 Nov 2019 18:46:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1573498008; bh=PxPloVUBQNwyEpFll7yVJyA/csSRiobBtB83cf4k2wk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=DCR+gme44/qrQxxvrpAZKyDeM+6Kh3QKPniI2EU3ySKoLJawgbztq0MiUtuiuTUvr 8hbU70/YGoPaIvr/iQG3CFhrpZEb74Wx1TxwTwVKzHCyxHgZkU9ewEooaLoxPZyvXJ sgdFGTcTAzwJurd4q6wX+FYB+bwoAf8rFexRpP+Q= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Shuning Zhang , Junxiao Bi , Gang He , Mark Fasheh , Joel Becker , Joseph Qi , Changwei Ge , Jun Piao , Andrew Morton , Linus Torvalds , Sasha Levin Subject: [PATCH 4.19 119/125] ocfs2: protect extent tree in ocfs2_prepare_inode_for_write() Date: Mon, 11 Nov 2019 19:29:18 +0100 Message-Id: <20191111181455.644829883@linuxfoundation.org> X-Mailer: git-send-email 2.24.0 In-Reply-To: <20191111181438.945353076@linuxfoundation.org> References: <20191111181438.945353076@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Shuning Zhang [ Upstream commit e74540b285569d2b1e14fe7aee92297078f235ce ] When the extent tree is modified, it should be protected by inode cluster lock and ip_alloc_sem. The extent tree is accessed and modified in the ocfs2_prepare_inode_for_write, but isn't protected by ip_alloc_sem. The following is a case. The function ocfs2_fiemap is accessing the extent tree, which is modified at the same time. kernel BUG at fs/ocfs2/extent_map.c:475! invalid opcode: 0000 [#1] SMP Modules linked in: tun ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue [...] CPU: 16 PID: 14047 Comm: o2info Not tainted 4.1.12-124.23.1.el6uek.x86_64 #2 Hardware name: Oracle Corporation ORACLE SERVER X7-2L/ASM, MB MECH, X7-2L, BIOS 42040600 10/19/2018 task: ffff88019487e200 ti: ffff88003daa4000 task.ti: ffff88003daa4000 RIP: ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2] Call Trace: ocfs2_fiemap+0x1e3/0x430 [ocfs2] do_vfs_ioctl+0x155/0x510 SyS_ioctl+0x81/0xa0 system_call_fastpath+0x18/0xd8 Code: 18 48 c7 c6 60 7f 65 a0 31 c0 bb e2 ff ff ff 48 8b 4a 40 48 8b 7a 28 48 c7 c2 78 2d 66 a0 e8 38 4f 05 00 e9 28 fe ff ff 0f 1f 00 <0f> 0b 66 0f 1f 44 00 00 bb 86 ff ff ff e9 13 fe ff ff 66 0f 1f RIP ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2] ---[ end trace c8aa0c8180e869dc ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled This issue can be reproduced every week in a production environment. This issue is related to the usage mode. If others use ocfs2 in this mode, the kernel will panic frequently. [akpm@linux-foundation.org: coding style fixes] [Fix new warning due to unused function by removing said function - Linus ] Link: http://lkml.kernel.org/r/1568772175-2906-2-git-send-email-sunny.s.zhang@oracle.com Signed-off-by: Shuning Zhang Reviewed-by: Junxiao Bi Reviewed-by: Gang He Cc: Mark Fasheh Cc: Joel Becker Cc: Joseph Qi Cc: Changwei Ge Cc: Jun Piao Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- fs/ocfs2/file.c | 134 ++++++++++++++++++++++++++++++++---------------- 1 file changed, 90 insertions(+), 44 deletions(-) diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 9fa35cb6f6e0b..a847fe52c56ee 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -2106,54 +2106,90 @@ static int ocfs2_is_io_unaligned(struct inode *inode, size_t count, loff_t pos) return 0; } -static int ocfs2_prepare_inode_for_refcount(struct inode *inode, - struct file *file, - loff_t pos, size_t count, - int *meta_level) +static int ocfs2_inode_lock_for_extent_tree(struct inode *inode, + struct buffer_head **di_bh, + int meta_level, + int overwrite_io, + int write_sem, + int wait) { - int ret; - struct buffer_head *di_bh = NULL; - u32 cpos = pos >> OCFS2_SB(inode->i_sb)->s_clustersize_bits; - u32 clusters = - ocfs2_clusters_for_bytes(inode->i_sb, pos + count) - cpos; + int ret = 0; - ret = ocfs2_inode_lock(inode, &di_bh, 1); - if (ret) { - mlog_errno(ret); + if (wait) + ret = ocfs2_inode_lock(inode, NULL, meta_level); + else + ret = ocfs2_try_inode_lock(inode, + overwrite_io ? NULL : di_bh, meta_level); + if (ret < 0) goto out; + + if (wait) { + if (write_sem) + down_write(&OCFS2_I(inode)->ip_alloc_sem); + else + down_read(&OCFS2_I(inode)->ip_alloc_sem); + } else { + if (write_sem) + ret = down_write_trylock(&OCFS2_I(inode)->ip_alloc_sem); + else + ret = down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem); + + if (!ret) { + ret = -EAGAIN; + goto out_unlock; + } } - *meta_level = 1; + return ret; - ret = ocfs2_refcount_cow(inode, di_bh, cpos, clusters, UINT_MAX); - if (ret) - mlog_errno(ret); +out_unlock: + brelse(*di_bh); + ocfs2_inode_unlock(inode, meta_level); out: - brelse(di_bh); return ret; } +static void ocfs2_inode_unlock_for_extent_tree(struct inode *inode, + struct buffer_head **di_bh, + int meta_level, + int write_sem) +{ + if (write_sem) + up_write(&OCFS2_I(inode)->ip_alloc_sem); + else + up_read(&OCFS2_I(inode)->ip_alloc_sem); + + brelse(*di_bh); + *di_bh = NULL; + + if (meta_level >= 0) + ocfs2_inode_unlock(inode, meta_level); +} + static int ocfs2_prepare_inode_for_write(struct file *file, loff_t pos, size_t count, int wait) { int ret = 0, meta_level = 0, overwrite_io = 0; + int write_sem = 0; struct dentry *dentry = file->f_path.dentry; struct inode *inode = d_inode(dentry); struct buffer_head *di_bh = NULL; loff_t end; + u32 cpos; + u32 clusters; /* * We start with a read level meta lock and only jump to an ex * if we need to make modifications here. */ for(;;) { - if (wait) - ret = ocfs2_inode_lock(inode, NULL, meta_level); - else - ret = ocfs2_try_inode_lock(inode, - overwrite_io ? NULL : &di_bh, meta_level); + ret = ocfs2_inode_lock_for_extent_tree(inode, + &di_bh, + meta_level, + overwrite_io, + write_sem, + wait); if (ret < 0) { - meta_level = -1; if (ret != -EAGAIN) mlog_errno(ret); goto out; @@ -2165,15 +2201,8 @@ static int ocfs2_prepare_inode_for_write(struct file *file, */ if (!wait && !overwrite_io) { overwrite_io = 1; - if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { - ret = -EAGAIN; - goto out_unlock; - } ret = ocfs2_overwrite_io(inode, di_bh, pos, count); - brelse(di_bh); - di_bh = NULL; - up_read(&OCFS2_I(inode)->ip_alloc_sem); if (ret < 0) { if (ret != -EAGAIN) mlog_errno(ret); @@ -2192,7 +2221,10 @@ static int ocfs2_prepare_inode_for_write(struct file *file, * set inode->i_size at the end of a write. */ if (should_remove_suid(dentry)) { if (meta_level == 0) { - ocfs2_inode_unlock(inode, meta_level); + ocfs2_inode_unlock_for_extent_tree(inode, + &di_bh, + meta_level, + write_sem); meta_level = 1; continue; } @@ -2208,18 +2240,32 @@ static int ocfs2_prepare_inode_for_write(struct file *file, ret = ocfs2_check_range_for_refcount(inode, pos, count); if (ret == 1) { - ocfs2_inode_unlock(inode, meta_level); - meta_level = -1; - - ret = ocfs2_prepare_inode_for_refcount(inode, - file, - pos, - count, - &meta_level); + ocfs2_inode_unlock_for_extent_tree(inode, + &di_bh, + meta_level, + write_sem); + ret = ocfs2_inode_lock_for_extent_tree(inode, + &di_bh, + meta_level, + overwrite_io, + 1, + wait); + write_sem = 1; + if (ret < 0) { + if (ret != -EAGAIN) + mlog_errno(ret); + goto out; + } + + cpos = pos >> OCFS2_SB(inode->i_sb)->s_clustersize_bits; + clusters = + ocfs2_clusters_for_bytes(inode->i_sb, pos + count) - cpos; + ret = ocfs2_refcount_cow(inode, di_bh, cpos, clusters, UINT_MAX); } if (ret < 0) { - mlog_errno(ret); + if (ret != -EAGAIN) + mlog_errno(ret); goto out_unlock; } @@ -2230,10 +2276,10 @@ out_unlock: trace_ocfs2_prepare_inode_for_write(OCFS2_I(inode)->ip_blkno, pos, count, wait); - brelse(di_bh); - - if (meta_level >= 0) - ocfs2_inode_unlock(inode, meta_level); + ocfs2_inode_unlock_for_extent_tree(inode, + &di_bh, + meta_level, + write_sem); out: return ret; -- 2.20.1