From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Josef Bacik, Filipe Manana, David Sterba, Sasha Levin,
    clm@fb.com, linux-btrfs@vger.kernel.org
Subject: [PATCH AUTOSEL 6.8 20/52] btrfs: take the cleaner_mutex earlier in qgroup disable
Date: Tue, 7 May 2024 19:06:46 -0400
Message-ID: <20240507230800.392128-20-sashal@kernel.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240507230800.392128-1-sashal@kernel.org>
References: <20240507230800.392128-1-sashal@kernel.org>
MIME-Version: 1.0
X-stable: review
X-Patchwork-Hint: Ignore
X-stable-base: Linux 6.8.9
Content-Transfer-Encoding: 8bit

From: Josef Bacik

[ Upstream commit 0f2b8098d72a93890e69aa24ec549ef4bc34f4db ]

One of my CI runs popped the following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
6.9.0-rc4+ #1 Not tainted
------------------------------------------------------
btrfs/471533 is trying to acquire lock:
ffff92ba46980850 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_quota_disable+0x54/0x4c0

but task is already holding lock:
ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&fs_info->subvol_sem){++++}-{3:3}:
       down_read+0x42/0x170
       btrfs_rename+0x607/0xb00
       btrfs_rename2+0x2e/0x70
       vfs_rename+0xaf8/0xfc0
       do_renameat2+0x586/0x600
       __x64_sys_rename+0x43/0x50
       do_syscall_64+0x95/0x180
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #1 (&sb->s_type->i_mutex_key#16){++++}-{3:3}:
       down_write+0x3f/0xc0
       btrfs_inode_lock+0x40/0x70
       prealloc_file_extent_cluster+0x1b0/0x370
       relocate_file_extent_cluster+0xb2/0x720
       relocate_data_extent+0x107/0x160
       relocate_block_group+0x442/0x550
       btrfs_relocate_block_group+0x2cb/0x4b0
       btrfs_relocate_chunk+0x50/0x1b0
       btrfs_balance+0x92f/0x13d0
       btrfs_ioctl+0x1abf/0x2600
       __x64_sys_ioctl+0x97/0xd0
       do_syscall_64+0x95/0x180
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #0 (&fs_info->cleaner_mutex){+.+.}-{3:3}:
       __lock_acquire+0x13e7/0x2180
       lock_acquire+0xcb/0x2e0
       __mutex_lock+0xbe/0xc00
       btrfs_quota_disable+0x54/0x4c0
       btrfs_ioctl+0x206b/0x2600
       __x64_sys_ioctl+0x97/0xd0
       do_syscall_64+0x95/0x180
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

other info that might help us debug this:

Chain exists of:
  &fs_info->cleaner_mutex --> &sb->s_type->i_mutex_key#16 --> &fs_info->subvol_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->subvol_sem);
                               lock(&sb->s_type->i_mutex_key#16);
                               lock(&fs_info->subvol_sem);
  lock(&fs_info->cleaner_mutex);

 *** DEADLOCK ***

2 locks held by btrfs/471533:
 #0: ffff92ba4319e420 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0x3b5/0x2600
 #1: ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600

stack backtrace:
CPU: 1 PID: 471533 Comm: btrfs Kdump: loaded Not tainted 6.9.0-rc4+ #1
Call Trace:
 dump_stack_lvl+0x77/0xb0
 check_noncircular+0x148/0x160
 ? lock_acquire+0xcb/0x2e0
 __lock_acquire+0x13e7/0x2180
 lock_acquire+0xcb/0x2e0
 ? btrfs_quota_disable+0x54/0x4c0
 ? lock_is_held_type+0x9a/0x110
 __mutex_lock+0xbe/0xc00
 ? btrfs_quota_disable+0x54/0x4c0
 ? srso_return_thunk+0x5/0x5f
 ? lock_acquire+0xcb/0x2e0
 ? btrfs_quota_disable+0x54/0x4c0
 ? btrfs_quota_disable+0x54/0x4c0
 btrfs_quota_disable+0x54/0x4c0
 btrfs_ioctl+0x206b/0x2600
 ? srso_return_thunk+0x5/0x5f
 ? __do_sys_statfs+0x61/0x70
 __x64_sys_ioctl+0x97/0xd0
 do_syscall_64+0x95/0x180
 ? srso_return_thunk+0x5/0x5f
 ? reacquire_held_locks+0xd1/0x1f0
 ? do_user_addr_fault+0x307/0x8a0
 ? srso_return_thunk+0x5/0x5f
 ? lock_acquire+0xcb/0x2e0
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? find_held_lock+0x2b/0x80
 ? srso_return_thunk+0x5/0x5f
 ? lock_release+0xca/0x2a0
 ? srso_return_thunk+0x5/0x5f
 ? do_user_addr_fault+0x35c/0x8a0
 ? srso_return_thunk+0x5/0x5f
 ? trace_hardirqs_off+0x4b/0xc0
 ? srso_return_thunk+0x5/0x5f
 ? lockdep_hardirqs_on_prepare+0xde/0x190
 ? srso_return_thunk+0x5/0x5f

This happens because when we call rename we already have the inode mutex
held, and then we acquire the subvol_sem if we are a subvolume. This
makes the dependency

  inode lock -> subvol sem

When we're running data relocation we will preallocate space for the
data relocation inode, and we always run the relocation under the
->cleaner_mutex. This now creates the dependency of

  cleaner_mutex -> inode lock (from the prealloc) -> subvol_sem

Qgroup delete does this in the opposite order: it acquires the
subvol_sem and then the cleaner_mutex, which results in this lockdep
splat. This deadlock can't happen in reality, because we won't ever
rename the data reloc inode, nor is the data reloc inode a subvolume.

However, this is fairly easy to fix: simply take the cleaner_mutex in
the case where we are disabling qgroups, before we take the subvol_sem.
This resolves the lockdep splat.

Reviewed-by: Filipe Manana
Signed-off-by: Josef Bacik
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
---
 fs/btrfs/ioctl.c  | 33 ++++++++++++++++++++++++++++++---
 fs/btrfs/qgroup.c | 21 ++++++++-------------
 2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6b93fae74403d..8851ba7a1e271 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3722,15 +3722,43 @@ static long btrfs_ioctl_quota_ctl(struct file *file, void __user *arg)
 		goto drop_write;
 	}
 
-	down_write(&fs_info->subvol_sem);
-
 	switch (sa->cmd) {
 	case BTRFS_QUOTA_CTL_ENABLE:
 	case BTRFS_QUOTA_CTL_ENABLE_SIMPLE_QUOTA:
+		down_write(&fs_info->subvol_sem);
 		ret = btrfs_quota_enable(fs_info, sa);
+		up_write(&fs_info->subvol_sem);
 		break;
 	case BTRFS_QUOTA_CTL_DISABLE:
+		/*
+		 * Lock the cleaner mutex to prevent races with concurrent
+		 * relocation, because relocation may be building backrefs for
+		 * blocks of the quota root while we are deleting the root. This
+		 * is like dropping fs roots of deleted snapshots/subvolumes, we
+		 * need the same protection.
+		 *
+		 * This also prevents races between concurrent tasks trying to
+		 * disable quotas, because we will unlock and relock
+		 * qgroup_ioctl_lock across BTRFS_FS_QUOTA_ENABLED changes.
+		 *
+		 * We take this here because we have the dependency of
+		 *
+		 * inode_lock -> subvol_sem
+		 *
+		 * because of rename. With relocation we can prealloc extents,
+		 * so that makes the dependency chain
+		 *
+		 * cleaner_mutex -> inode_lock -> subvol_sem
+		 *
+		 * so we must take the cleaner_mutex here before we take the
+		 * subvol_sem. The deadlock can't actually happen, but this
+		 * quiets lockdep.
+		 */
+		mutex_lock(&fs_info->cleaner_mutex);
+		down_write(&fs_info->subvol_sem);
 		ret = btrfs_quota_disable(fs_info);
+		up_write(&fs_info->subvol_sem);
+		mutex_unlock(&fs_info->cleaner_mutex);
 		break;
 	default:
 		ret = -EINVAL;
@@ -3738,7 +3766,6 @@ static long btrfs_ioctl_quota_ctl(struct file *file, void __user *arg)
 	}
 
 	kfree(sa);
-	up_write(&fs_info->subvol_sem);
 drop_write:
 	mnt_drop_write_file(file);
 	return ret;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 132802bd80999..9b8a61aa390eb 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1342,16 +1342,10 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
 	lockdep_assert_held_write(&fs_info->subvol_sem);
 
 	/*
-	 * Lock the cleaner mutex to prevent races with concurrent relocation,
-	 * because relocation may be building backrefs for blocks of the quota
-	 * root while we are deleting the root. This is like dropping fs roots
-	 * of deleted snapshots/subvolumes, we need the same protection.
-	 *
-	 * This also prevents races between concurrent tasks trying to disable
-	 * quotas, because we will unlock and relock qgroup_ioctl_lock across
-	 * BTRFS_FS_QUOTA_ENABLED changes.
+	 * Relocation will mess with backrefs, so make sure we have the
+	 * cleaner_mutex held to protect us from relocate.
 	 */
-	mutex_lock(&fs_info->cleaner_mutex);
+	lockdep_assert_held(&fs_info->cleaner_mutex);
 
 	mutex_lock(&fs_info->qgroup_ioctl_lock);
 	if (!fs_info->quota_root)
@@ -1373,9 +1367,13 @@
 	clear_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
 	btrfs_qgroup_wait_for_completion(fs_info, false);
 
+	/*
+	 * We have nothing held here and no trans handle, just return the error
+	 * if there is one.
+	 */
 	ret = flush_reservations(fs_info);
 	if (ret)
-		goto out_unlock_cleaner;
+		return ret;
 
 	/*
 	 * 1 For the root item
@@ -1439,9 +1437,6 @@
 		btrfs_end_transaction(trans);
 	else if (trans)
 		ret = btrfs_commit_transaction(trans);
-out_unlock_cleaner:
-	mutex_unlock(&fs_info->cleaner_mutex);
-
 	return ret;
 }
-- 
2.43.0
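
For readers less familiar with this class of lockdep report: the fix works
by making every path that needs both locks acquire them in one global
order, cleaner_mutex first, then subvol_sem, so no circular dependency can
be recorded. The standalone sketch below (not part of the patch; a
userspace analogy using pthreads mutexes and hypothetical function names)
illustrates that ordering rule:

/*
 * Minimal illustration of the lock-ordering rule the patch enforces.
 * Hypothetical userspace analogue, not kernel code.
 * Build with: cc lock_order.c -o lock_order -lpthread
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t cleaner_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t subvol_sem = PTHREAD_MUTEX_INITIALIZER;

/* Analogue of relocation: cleaner_mutex, then (via the inode lock) subvol_sem. */
static void *relocation_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&cleaner_mutex);
	pthread_mutex_lock(&subvol_sem);
	puts("relocation: cleaner_mutex -> subvol_sem");
	pthread_mutex_unlock(&subvol_sem);
	pthread_mutex_unlock(&cleaner_mutex);
	return NULL;
}

/*
 * Analogue of qgroup disable AFTER the patch: cleaner_mutex is taken
 * first, matching the order above. Before the patch this path took
 * subvol_sem first, the reversed order that lockdep flagged.
 */
static void *qgroup_disable_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&cleaner_mutex);
	pthread_mutex_lock(&subvol_sem);
	puts("qgroup disable: cleaner_mutex -> subvol_sem");
	pthread_mutex_unlock(&subvol_sem);
	pthread_mutex_unlock(&cleaner_mutex);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, relocation_path, NULL);
	pthread_create(&b, NULL, qgroup_disable_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Note also the design choice visible in the qgroup.c hunk: after the patch,
btrfs_quota_disable() no longer takes the cleaner_mutex itself but documents
its locking contract with lockdep_assert_held(&fs_info->cleaner_mutex),
shifting responsibility for acquiring both locks, in the correct order, to
the ioctl caller.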