From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Filipe Manana, Qu Wenruo, Josef Bacik, David Sterba
Subject: [PATCH 4.19 031/276] btrfs: honor path->skip_locking in backref code
Date: Wed, 29 May 2019 20:03:09 -0700
Message-Id: <20190530030526.154567229@linuxfoundation.org>
In-Reply-To:
 <20190530030523.133519668@linuxfoundation.org>
References: <20190530030523.133519668@linuxfoundation.org>

From: Josef Bacik

commit 38e3eebff643db725633657d1d87a3be019d1018 upstream.

Qgroups will do the old roots lookup at delayed ref time, which could be
while walking down the extent root while running a delayed ref. This
should be fine, except we specifically lock eb's in the backref walking
code irrespective of path->skip_locking, which deadlocks the system.
Fix up the backref code to honor path->skip_locking; nobody will be
modifying the commit_root when we're searching, so it's completely safe
to do.

Since commit fb235dc06fac ("btrfs: qgroup: Move half of the qgroup
accounting time out of commit trans"), the kernel may lock up with quota
enabled.

There is one backref trace, triggered by snapshot dropping along with
write operations in the source subvolume, that can be reliably
reproduced:

  btrfs-cleaner   D    0  4062      2 0x80000000
  Call Trace:
   schedule+0x32/0x90
   btrfs_tree_read_lock+0x93/0x130 [btrfs]
   find_parent_nodes+0x29b/0x1170 [btrfs]
   btrfs_find_all_roots_safe+0xa8/0x120 [btrfs]
   btrfs_find_all_roots+0x57/0x70 [btrfs]
   btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
   btrfs_qgroup_trace_leaf_items+0x10b/0x140 [btrfs]
   btrfs_qgroup_trace_subtree+0xc8/0xe0 [btrfs]
   do_walk_down+0x541/0x5e3 [btrfs]
   walk_down_tree+0xab/0xe7 [btrfs]
   btrfs_drop_snapshot+0x356/0x71a [btrfs]
   btrfs_clean_one_deleted_snapshot+0xb8/0xf0 [btrfs]
   cleaner_kthread+0x12b/0x160 [btrfs]
   kthread+0x112/0x130
   ret_from_fork+0x27/0x50

When dropping snapshots with qgroup enabled, we trigger a backref walk.
However, a backref walk at that point is dangerous: if one of the
parent nodes gets WRITE locked by another thread, we can deadlock.
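In general terms, such a deadlock is a cycle in the wait-for graph: a stuck thread waits for a lock whose holder is itself waiting, directly or indirectly, on the first thread. The following is a minimal userspace model of that check, not btrfs code. struct lock_model, waits_for() and in_wait_cycle() are hypothetical names, and the model assumes each thread blocks on at most one lock and each lock has a single holder.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_ONE (-1)

/*
 * Hypothetical model, not btrfs code. Locks and threads are small
 * integers; each thread is blocked on at most one lock, and each lock
 * has at most one holder.
 */
struct lock_model {
	int nthreads;
	const int *owner;      /* owner[lock]: holding thread, or NO_ONE */
	const int *blocked_on; /* blocked_on[thread]: awaited lock, or NO_ONE */
};

/*
 * Return the thread that 't' is waiting for, or NO_ONE if 't' can run.
 * Mirroring the read_lock(D) case in this commit message, a thread that
 * already holds a lock may take it again and never waits on itself.
 */
static int waits_for(const struct lock_model *m, int t)
{
	int lock = m->blocked_on[t];

	if (lock == NO_ONE || m->owner[lock] == t)
		return NO_ONE;
	return m->owner[lock];
}

/* True if following the wait-for chain from 't' leads back to 't'. */
static bool in_wait_cycle(const struct lock_model *m, int t)
{
	int cur;

	assert(m->nthreads > 0 && t >= 0 && t < m->nthreads);
	cur = waits_for(m, t);
	/* A cycle through 't' has at most nthreads edges. */
	for (int hops = 0; hops < m->nthreads && cur != NO_ONE; hops++) {
		if (cur == t)
			return true;
		cur = waits_for(m, cur);
	}
	return false;
}
```

With owner = {X: T1, Y: T2} and blocked_on = {T1: Y, T2: X}, in_wait_cycle() reports the classic ABBA cycle for both threads; if T2 never has to wait for X, as when a caller skips locking, the cycle disappears.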
For example:

          FS 260        FS 261 (Dropped)
          node A        node B
        /        \    /        \
   node C        node D        node E
  /     \       /      \      /      \
leaf F|leaf G|leaf H|leaf I|leaf J|leaf K

The lock sequence would be:

Thread A (cleaner)                  |  Thread B (other writer)
-----------------------------------------------------------------------
write_lock(B)                       |
write_lock(D)                       |
^^^ called by walk_down_tree()      |
                                    |  write_lock(A)
                                    |  write_lock(D) << Stall
read_lock(H) << for backref walk    |
read_lock(D) << lock owner is       |
                the same thread A   |
                so read lock is OK  |
read_lock(A) << Stall               |

So thread A holds write lock D and needs read lock A before it can
unlock D, while thread B holds write lock A and needs lock D before it
can unlock A. This causes a deadlock.

This is not limited to the snapshot-dropping case. The backref walk,
even though it only happens on commit trees, breaks the normal top-down
locking order, which makes it deadlock prone.

Fixes: fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans")
CC: stable@vger.kernel.org # 4.14+
Reported-and-tested-by: David Sterba
Reported-by: Filipe Manana
Reviewed-by: Qu Wenruo
Signed-off-by: Josef Bacik
Reviewed-by: Filipe Manana
[ rebase to latest branch and fix lock assert bug in btrfs/007 ]
[ backport to linux-4.19.y branch, solve minor conflicts ]
Signed-off-by: Qu Wenruo
[ copy logs and deadlock analysis from Qu's patch ]
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman
---
 fs/btrfs/backref.c |   19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -710,7 +710,7 @@ out:
  * read tree blocks and add keys where required.
  */
 static int add_missing_keys(struct btrfs_fs_info *fs_info,
-			    struct preftrees *preftrees)
+			    struct preftrees *preftrees, bool lock)
 {
 	struct prelim_ref *ref;
 	struct extent_buffer *eb;
@@ -735,12 +735,14 @@ static int add_missing_keys(struct btrfs
 			free_extent_buffer(eb);
 			return -EIO;
 		}
-		btrfs_tree_read_lock(eb);
+		if (lock)
+			btrfs_tree_read_lock(eb);
 		if (btrfs_header_level(eb) == 0)
 			btrfs_item_key_to_cpu(eb, &ref->key_for_search, 0);
 		else
 			btrfs_node_key_to_cpu(eb, &ref->key_for_search, 0);
-		btrfs_tree_read_unlock(eb);
+		if (lock)
+			btrfs_tree_read_unlock(eb);
 		free_extent_buffer(eb);
 		prelim_ref_insert(fs_info, &preftrees->indirect, ref, NULL);
 		cond_resched();
@@ -1225,7 +1227,7 @@ again:
 
 	btrfs_release_path(path);
 
-	ret = add_missing_keys(fs_info, &preftrees);
+	ret = add_missing_keys(fs_info, &preftrees, path->skip_locking == 0);
 	if (ret)
 		goto out;
 
@@ -1286,11 +1288,14 @@ again:
 					ret = -EIO;
 					goto out;
 				}
-				btrfs_tree_read_lock(eb);
-				btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+				if (!path->skip_locking) {
+					btrfs_tree_read_lock(eb);
+					btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+				}
 				ret = find_extent_in_eb(eb, bytenr,
 							*extent_item_pos, &eie, ignore_offset);
-				btrfs_tree_read_unlock_blocking(eb);
+				if (!path->skip_locking)
+					btrfs_tree_read_unlock_blocking(eb);
 				free_extent_buffer(eb);
 				if (ret < 0)
 					goto out;