Received: by 2002:a5b:505:0:0:0:0:0 with SMTP id o5csp57688ybp; Thu, 3 Oct 2019 10:08:19 -0700 (PDT) X-Google-Smtp-Source: APXvYqwn0WhfPH0Kt8vCzEkkzLNu35YuWf4BXKb+pMAI9bjsRaHuYHlmmYHYL1SBxq2dW5k/cG7+ X-Received: by 2002:a50:908c:: with SMTP id c12mr10843989eda.45.1570122499720; Thu, 03 Oct 2019 10:08:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1570122499; cv=none; d=google.com; s=arc-20160816; b=FlBfJ0ofeF5YuON+umv9vOpjHnaeqIyp1vaAL9Ow55dv9E5+WedHq4TiiZpsHbPb7V zbWaxZvHPf16UEktTzJSrbGdEoX6ZCLpawT0sIq1QNrow6eMEStVCbp4qSGc1B6awE0T 5EtyxPrB0F39Vh+ZT4z3Gy5hj01SHmYDQaOoFmY7gbprrIxMsTiRpJJNFYMbD6/OeF/S i79xwau83BZmSVX3gW3oOFpcXQhq61ellfiNNypWgIBQoz8hRrJAervCrjTo2Co+uJv1 a2h8yrXPrPZc5ZxasGmtAj4wfzM8II/EtxQcYDAlE8DIA5MrIOZbBFzDJVojJWIE9jTd 86ig== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=jQw8E65qlOxWIbQhZ2nSJa851eIHsrVnOxLg6YvDLwY=; b=ABeztMo5zqSeytQ97nFgAaG6BGOyHD3RvS9tGCjoFdLVcWA8OcBAA28FQEtufy56HQ A2rmwm41HHLIs2+QNp0yVuTBlrFsptwKC41jS7vFjBaBY6P2e94U2SOClabJu7oheUPT lCwOMyg0whFwAEKGK3MqXtqeRdIEHn+z8YC5W+jwImw7nhUgdd+REPI2giBobnmwJQWG Rdtehys907yDVXwsnYdBrNrF8eJeWcRJK7/x4FdRZrSSEYbxMxRsEob8+a8Q5eFgm2Uv R93zVg2eZDo6jS1svFfYn1H5uwcIap0iUxvQqt4aIzVxoXg4lM7kyqxut8XJ9B+hAhEk avuA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=jTXV705S; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g7si1467452ejd.78.2019.10.03.10.07.53; Thu, 03 Oct 2019 10:08:19 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=jTXV705S; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404207AbfJCQiQ (ORCPT + 99 others); Thu, 3 Oct 2019 12:38:16 -0400 Received: from mail.kernel.org ([198.145.29.99]:47866 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404183AbfJCQiN (ORCPT ); Thu, 3 Oct 2019 12:38:13 -0400 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 0E7B72070B; Thu, 3 Oct 2019 16:38:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1570120691; bh=uhbaKLWlkBR1MGVt+PBtDFsl8Khc+T7Mn7gXEEVFRhE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=jTXV705SUizATEAua/CjK0sgPeoLOr4e3wNVWeBVxT1vsTZ4CQ5E9UFoE2Hxp8BNX M2u1ZrSweXSWUgh3JpgOA5jOgd5oidzUk0mes5vpH6PENE33+gLfE0EI8icm0mvYKT YKrf3/CECOHak2YqjlSex8fxb2fZHsNx6TicQ0qU= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Josef Bacik , Filipe Manana , David Sterba Subject: [PATCH 5.2 288/313] Btrfs: fix race setting up and completing qgroup rescan workers Date: Thu, 3 Oct 2019 17:54:26 +0200 Message-Id: <20191003154601.462968395@linuxfoundation.org> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191003154533.590915454@linuxfoundation.org> References: <20191003154533.590915454@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Filipe Manana commit 13fc1d271a2e3ab8a02071e711add01fab9271f6 upstream. There is a race between setting up a qgroup rescan worker and completing a qgroup rescan worker that can lead to callers of the qgroup rescan wait ioctl to either not wait for the rescan worker to complete or to hang forever due to missing wake ups. The following diagram shows a sequence of steps that illustrates the race. CPU 1 CPU 2 CPU 3 btrfs_ioctl_quota_rescan() btrfs_qgroup_rescan() qgroup_rescan_init() mutex_lock(&fs_info->qgroup_rescan_lock) spin_lock(&fs_info->qgroup_lock) fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN init_completion( &fs_info->qgroup_rescan_completion) fs_info->qgroup_rescan_running = true mutex_unlock(&fs_info->qgroup_rescan_lock) spin_unlock(&fs_info->qgroup_lock) btrfs_init_work() --> starts the worker btrfs_qgroup_rescan_worker() mutex_lock(&fs_info->qgroup_rescan_lock) fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN mutex_unlock(&fs_info->qgroup_rescan_lock) starts transaction, updates qgroup status item, etc btrfs_ioctl_quota_rescan() btrfs_qgroup_rescan() qgroup_rescan_init() mutex_lock(&fs_info->qgroup_rescan_lock) spin_lock(&fs_info->qgroup_lock) fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN init_completion( &fs_info->qgroup_rescan_completion) fs_info->qgroup_rescan_running = true mutex_unlock(&fs_info->qgroup_rescan_lock) spin_unlock(&fs_info->qgroup_lock) btrfs_init_work() --> starts another worker mutex_lock(&fs_info->qgroup_rescan_lock) fs_info->qgroup_rescan_running = false mutex_unlock(&fs_info->qgroup_rescan_lock) complete_all(&fs_info->qgroup_rescan_completion) Before the rescan worker started by the task at CPU 3 completes, if another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at fs_info->qgroup_flags, which is expected and correct behaviour. However if other task calls btrfs_ioctl_quota_rescan_wait() before the rescan worker started by the task at CPU 3 completes, it will return immediately without waiting for the new rescan worker to complete, because fs_info->qgroup_rescan_running is set to false by CPU 2. This race is making test case btrfs/171 (from fstests) to fail often: btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad) # --- tests/btrfs/171.out 2018-09-16 21:30:48.505104287 +0100 # +++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad 2019-09-19 02:01:36.938486039 +0100 # @@ -1,2 +1,3 @@ # QA output created by 171 # +ERROR: quota rescan failed: Operation now in progress # Silence is golden # ... # (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad' to see the entire diff) That is because the test calls the btrfs-progs commands "qgroup quota rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs" commands 'qgroup assign' and 'qgroup remove' often call the rescan start ioctl after calling the qgroup assign ioctl, btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait for a rescan worker to complete. Another problem the race can cause is missing wake ups for waiters, since the call to complete_all() happens outside a critical section and after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. In the sequence diagram above, if we have a waiter for the first rescan task (executed by CPU 2), then fs_info->qgroup_rescan_completion.wait is not empty, and if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and before it calls complete_all() against fs_info->qgroup_rescan_completion, the task at CPU 3 calls init_completion() against fs_info->qgroup_rescan_completion which re-initilizes its wait queue to an empty queue, therefore causing the rescan worker at CPU 2 to call complete_all() against an empty queue, never waking up the task waiting for that rescan worker. Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting fs_info->qgroup_rescan_running to false in the same critical section, delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing the call to complete_all() in that same critical section. This gives the protection needed to avoid rescan wait ioctl callers not waiting for a running rescan worker and the lost wake ups problem, since setting that rescan flag and boolean as well as initializing the wait queue is done already in a critical section delimited by that mutex (at qgroup_rescan_init()). Fixes: 57254b6ebce4ce ("Btrfs: add ioctl to wait for qgroup rescan completion") Fixes: d2c609b834d62f ("btrfs: properly track when rescan worker is running") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/qgroup.c | 33 +++++++++++++++++++-------------- 1 file changed, 19 insertions(+), 14 deletions(-) --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -3154,9 +3154,6 @@ out: btrfs_free_path(path); mutex_lock(&fs_info->qgroup_rescan_lock); - if (!btrfs_fs_closing(fs_info)) - fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN; - if (err > 0 && fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) { fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT; @@ -3172,16 +3169,30 @@ out: trans = btrfs_start_transaction(fs_info->quota_root, 1); if (IS_ERR(trans)) { err = PTR_ERR(trans); + trans = NULL; btrfs_err(fs_info, "fail to start transaction for status update: %d", err); - goto done; } - ret = update_qgroup_status_item(trans); - if (ret < 0) { - err = ret; - btrfs_err(fs_info, "fail to update qgroup status: %d", err); + + mutex_lock(&fs_info->qgroup_rescan_lock); + if (!btrfs_fs_closing(fs_info)) + fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN; + if (trans) { + ret = update_qgroup_status_item(trans); + if (ret < 0) { + err = ret; + btrfs_err(fs_info, "fail to update qgroup status: %d", + err); + } } + fs_info->qgroup_rescan_running = false; + complete_all(&fs_info->qgroup_rescan_completion); + mutex_unlock(&fs_info->qgroup_rescan_lock); + + if (!trans) + return; + btrfs_end_transaction(trans); if (btrfs_fs_closing(fs_info)) { @@ -3192,12 +3203,6 @@ out: } else { btrfs_err(fs_info, "qgroup scan failed with %d", err); } - -done: - mutex_lock(&fs_info->qgroup_rescan_lock); - fs_info->qgroup_rescan_running = false; - mutex_unlock(&fs_info->qgroup_rescan_lock); - complete_all(&fs_info->qgroup_rescan_completion); } /*