From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Jeff Mahoney, Qu Wenruo, David Sterba, Sasha Levin
Subject: [PATCH 5.4 065/232] btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued
Date: Thu, 16 Apr 2020 15:22:39 +0200
Message-Id: <20200416131323.494179904@linuxfoundation.org>
X-Mailer: git-send-email 2.26.1
In-Reply-To:
 <20200416131316.640996080@linuxfoundation.org>
References: <20200416131316.640996080@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Qu Wenruo

[ Upstream commit d61acbbf54c612ea9bf67eed609494cda0857b3a ]

[BUG]
There are some reports of btrfs waiting forever to unmount itself, with
the following call trace:

  INFO: task umount:4631 blocked for more than 491 seconds.
        Tainted: G X 5.3.8-2-default #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  umount D 0 4631 3337 0x00000000
  Call Trace:
  ([<00000000174adf7a>] __schedule+0x342/0x748)
   [<00000000174ae3ca>] schedule+0x4a/0xd8
   [<00000000174b1f08>] schedule_timeout+0x218/0x420
   [<00000000174af10c>] wait_for_common+0x104/0x1d8
   [<000003ff804d6994>] btrfs_qgroup_wait_for_completion+0x84/0xb0 [btrfs]
   [<000003ff8044a616>] close_ctree+0x4e/0x380 [btrfs]
   [<0000000016fa3136>] generic_shutdown_super+0x8e/0x158
   [<0000000016fa34d6>] kill_anon_super+0x26/0x40
   [<000003ff8041ba88>] btrfs_kill_super+0x28/0xc8 [btrfs]
   [<0000000016fa39f8>] deactivate_locked_super+0x68/0x98
   [<0000000016fcb198>] cleanup_mnt+0xc0/0x140
   [<0000000016d6a846>] task_work_run+0xc6/0x110
   [<0000000016d04f76>] do_notify_resume+0xae/0xb8
   [<00000000174b30ae>] system_call+0xe2/0x2c8

[CAUSE]
The problem happens when we have called qgroup_rescan_init() but have not
queued the worker. This is mostly caused by error handling.

         Qgroup ioctl thread            |          Unmount thread
----------------------------------------+-----------------------------------
                                        |
btrfs_qgroup_rescan()                   |
|- qgroup_rescan_init()                 |
|  |- qgroup_rescan_running = true;     |
|                                       |
|- trans = btrfs_join_transaction()     |
|  Some error happened                  |
|                                       |
|- btrfs_qgroup_rescan() returns error  |
   But qgroup_rescan_running == true;   |
                                        | close_ctree()
                                        | |- btrfs_qgroup_wait_for_completion()
                                        |    |- running == true;
                                        |    |- wait_for_completion();

btrfs_qgroup_rescan_worker is never queued, thus no one is going to wake
up close_ctree() and we get a deadlock.

All involved qgroup_rescan_init() callers are:

- btrfs_qgroup_rescan()
  The example above. It's possible to trigger the deadlock when an error
  happens.

- btrfs_quota_enable()
  Not possible. Just after qgroup_rescan_init() we queue the work.

- btrfs_read_qgroup_config()
  It's possible to trigger the deadlock. It only initializes the work;
  the work queueing happens in btrfs_qgroup_rescan_resume().
  Thus if an error happens in between, a deadlock is possible.

We shouldn't set fs_info->qgroup_rescan_running in qgroup_rescan_init(),
as at that stage we haven't yet queued the qgroup rescan worker to run.

[FIX]
Set qgroup_rescan_running just before queueing the work, so that the
rescan work is guaranteed to be queued whenever we wait for it.

Fixes: 8d9eddad1946 ("Btrfs: fix qgroup rescan worker initialization")
Signed-off-by: Jeff Mahoney
[ Change subject and cause analyse, use a smaller fix ]
Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
---
 fs/btrfs/qgroup.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 286c8c11c8d32..590defdf88609 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1030,6 +1030,7 @@ out_add_root:
 	ret = qgroup_rescan_init(fs_info, 0, 1);
 	if (!ret) {
 		qgroup_rescan_zero_tracking(fs_info);
+		fs_info->qgroup_rescan_running = true;
 		btrfs_queue_work(fs_info->qgroup_rescan_workers,
 				 &fs_info->qgroup_rescan_work);
 	}
@@ -3276,7 +3277,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 		sizeof(fs_info->qgroup_rescan_progress));
 	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
 	init_completion(&fs_info->qgroup_rescan_completion);
-	fs_info->qgroup_rescan_running = true;
 
 	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
@@ -3341,8 +3341,11 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
 
 	qgroup_rescan_zero_tracking(fs_info);
 
+	mutex_lock(&fs_info->qgroup_rescan_lock);
+	fs_info->qgroup_rescan_running = true;
 	btrfs_queue_work(fs_info->qgroup_rescan_workers,
 			 &fs_info->qgroup_rescan_work);
+	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
 	return 0;
 }
@@ -3378,9 +3381,13 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
 void
 btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
 {
-	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
+	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
+		mutex_lock(&fs_info->qgroup_rescan_lock);
+		fs_info->qgroup_rescan_running = true;
 		btrfs_queue_work(fs_info->qgroup_rescan_workers,
 				 &fs_info->qgroup_rescan_work);
+		mutex_unlock(&fs_info->qgroup_rescan_lock);
+	}
 }
 
 /*
-- 
2.20.1
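
For readers who want to see the ordering problem from [CAUSE] in isolation,
here is a minimal single-threaded user-space sketch. It is not kernel code
and not part of the patch: struct rescan_state and all helper names below
are hypothetical stand-ins, and the "workqueue" is reduced to a boolean.
It only models the idea that the running flag must never be visible unless
the worker has actually been queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant btrfs_fs_info fields. */
struct rescan_state {
	bool running;	/* models fs_info->qgroup_rescan_running */
	bool queued;	/* true once the worker was handed to the workqueue */
};

/*
 * Old ordering: the flag is raised in the init step, before anything is
 * queued. If a later step fails, the flag stays set with no worker.
 */
static int rescan_start_old(struct rescan_state *s, bool later_step_fails)
{
	s->running = true;		/* set during init */
	if (later_step_fails)
		return -1;		/* error path: worker never queued */
	s->queued = true;
	return 0;
}

/*
 * Fixed ordering: the flag is raised only immediately before queueing,
 * so an early error leaves both fields untouched.
 */
static int rescan_start_fixed(struct rescan_state *s, bool later_step_fails)
{
	if (later_step_fails)
		return -1;
	s->running = true;
	s->queued = true;
	return 0;
}

/* The worker finishing: it must have been queued to get here. */
static void rescan_worker_done(struct rescan_state *s)
{
	assert(s->queued);
	s->running = false;
	s->queued = false;
}

/*
 * Unmount waits for completion whenever running is set. With no queued
 * worker nobody will ever signal that completion, i.e. it hangs.
 */
static bool unmount_would_hang(const struct rescan_state *s)
{
	return s->running && !s->queued;
}
```

With an injected failure, rescan_start_old() leaves a state for which
unmount_would_hang() is true, while rescan_start_fixed() does not. The real
patch additionally takes qgroup_rescan_lock around the flag update and the
queueing, which this sketch does not model.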