Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp7140010ybi; Mon, 8 Jul 2019 15:26:39 -0700 (PDT) X-Google-Smtp-Source: APXvYqz/AMd+5Ene0CTNS9W3NMO9Vj+Fj5hY98tXwElcwhFaEKvxf/BjWoXc1JXWIdIK+uoN6sj4 X-Received: by 2002:a17:90a:cb18:: with SMTP id z24mr27786278pjt.108.1562624799238; Mon, 08 Jul 2019 15:26:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1562624799; cv=none; d=google.com; s=arc-20160816; b=m9WDnwvJFVCuP1wXsJPvpuL48qb5d7WiTEmxQN6UOKAg9LqfOic9o+bN3qUNLpVorM utW1UhUvzBENFKXcwO57lLWApGKE+eZ1CJjhAQW8WJKm+y33S2UdFqWESf5SljVYCyfb efg6MdZ4C6YWy6XiPteNX6emtaBkIzlEFxKMLgH5ZdfJfEpRzcNDUECvYY2qKQfKeFiP 3Ua9pzCbsjTXIGNrEx6paPEdiSvVDZi2T+19d5eB7CwwGLb0ZRGh7kTvORbWHoFj89l1 yq7sMl3i0CKuhgtQKYaJj42tC6cS3gavRCURJ3E16BzSTbyTiKkb8d9kyKBSjJVoOIMQ 3xtg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=J+N94uELBz27t3M0c3J4HsEW95cOuzRoF/OhtfZVHqg=; b=h9V6qzzxyMK5uejYLOVRpSsYiWyx9X+t2C1W9MgX5WUHUTAIAZE5+UH0pAOkBThV3Y 2FCBChkGuU9LEGcQjWe0vkPc0KPy6cqM6YFIsmsB4Mg5zp7jIBdoS23EepKsH0bUP0mP eDM3nwRzR6XHad9+OVdQ8CgVGyplIVTBo6LqN5bK/r7eq06HG1arFn47uYFMbQk0vn+P 7NuvkmH2ezaVZe5GBoshlqOn++d5DyuzztQ/S4/H/fHJo3WNzIWMYNuQSQ93d7bpeDrg oWBR2AhJVWeuAO3VXwDtjqLI81B21FGY+2/XhTY5Y2UCusDpnXfuyKS27J3tgJ6iLx8t wiqw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Xe761iyR; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e123si21037590pfa.252.2019.07.08.15.26.24; Mon, 08 Jul 2019 15:26:39 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Xe761iyR; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732918AbfGHPRz (ORCPT + 99 others); Mon, 8 Jul 2019 11:17:55 -0400 Received: from mail.kernel.org ([198.145.29.99]:41466 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732890AbfGHPRy (ORCPT ); Mon, 8 Jul 2019 11:17:54 -0400 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 85EC82173E; Mon, 8 Jul 2019 15:17:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1562599073; bh=Pr10lBXTCXfhz8TBC39By1kx7n25cADHhVfMpU5SvRc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Xe761iyRfDaxUsGNY4OWEJtPSKZqfXG3OlSqg2GWUQd88lNCupDfC6J6YZEUNgcaT r+psYeFznkCQLOK8k1XTwqqpFlBhUoyFRW3KkRjppz0sVYyLyJUlnx6/E/jtUBdKAT goilTK2Xm5z7mtQLjwbCNs9/8ZyZ8fpZ6gcTUZ2s= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, David Sterba , Nikolay Borisov Subject: [PATCH 4.4 68/73] btrfs: Ensure replaced device doesnt have pending chunk allocation Date: Mon, 8 Jul 2019 17:13:18 +0200 Message-Id: <20190708150524.847065903@linuxfoundation.org> X-Mailer: git-send-email 2.22.0 In-Reply-To: <20190708150513.136580595@linuxfoundation.org> References: <20190708150513.136580595@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Nikolay Borisov commit debd1c065d2037919a7da67baf55cc683fee09f0 upstream. Recent FITRIM work, namely bbbf7243d62d ("btrfs: combine device update operations during transaction commit") combined the way certain operations are recoded in a transaction. As a result an ASSERT was added in dev_replace_finish to ensure the new code works correctly. Unfortunately I got reports that it's possible to trigger the assert, meaning that during a device replace it's possible to have an unfinished chunk allocation on the source device. This is supposed to be prevented by the fact that a transaction is committed before finishing the replace oepration and alter acquiring the chunk mutex. This is not sufficient since by the time the transaction is committed and the chunk mutex acquired it's possible to allocate a chunk depending on the workload being executed on the replaced device. This bug has been present ever since device replace was introduced but there was never code which checks for it. The correct way to fix is to ensure that there is no pending device modification operation when the chunk mutex is acquire and if there is repeat transaction commit. Unfortunately it's not possible to just exclude the source device from btrfs_fs_devices::dev_alloc_list since this causes ENOSPC to be hit in transaction commit. Fixing that in another way would need to add special cases to handle the last writes and forbid new ones. The looped transaction fix is more obvious, and can be easily backported. The runtime of dev-replace is long so there's no noticeable delay caused by that. Reported-by: David Sterba Fixes: 391cd9df81ac ("Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/dev-replace.c | 29 +++++++++++++++++++---------- fs/btrfs/volumes.c | 2 ++ fs/btrfs/volumes.h | 5 +++++ 3 files changed, 26 insertions(+), 10 deletions(-) --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -495,18 +495,27 @@ static int btrfs_dev_replace_finishing(s } btrfs_wait_ordered_roots(root->fs_info, -1); - trans = btrfs_start_transaction(root, 0); - if (IS_ERR(trans)) { - mutex_unlock(&dev_replace->lock_finishing_cancel_unmount); - return PTR_ERR(trans); + while (1) { + trans = btrfs_start_transaction(root, 0); + if (IS_ERR(trans)) { + mutex_unlock(&dev_replace->lock_finishing_cancel_unmount); + return PTR_ERR(trans); + } + ret = btrfs_commit_transaction(trans, root); + WARN_ON(ret); + mutex_lock(&uuid_mutex); + /* keep away write_all_supers() during the finishing procedure */ + mutex_lock(&root->fs_info->fs_devices->device_list_mutex); + mutex_lock(&root->fs_info->chunk_mutex); + if (src_device->has_pending_chunks) { + mutex_unlock(&root->fs_info->chunk_mutex); + mutex_unlock(&root->fs_info->fs_devices->device_list_mutex); + mutex_unlock(&uuid_mutex); + } else { + break; + } } - ret = btrfs_commit_transaction(trans, root); - WARN_ON(ret); - mutex_lock(&uuid_mutex); - /* keep away write_all_supers() during the finishing procedure */ - mutex_lock(&root->fs_info->fs_devices->device_list_mutex); - mutex_lock(&root->fs_info->chunk_mutex); btrfs_dev_replace_lock(dev_replace); dev_replace->replace_state = scrub_ret ? BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4760,6 +4760,7 @@ static int __btrfs_alloc_chunk(struct bt for (i = 0; i < map->num_stripes; i++) { num_bytes = map->stripes[i].dev->bytes_used + stripe_size; btrfs_device_set_bytes_used(map->stripes[i].dev, num_bytes); + map->stripes[i].dev->has_pending_chunks = true; } spin_lock(&extent_root->fs_info->free_chunk_lock); @@ -7064,6 +7065,7 @@ void btrfs_update_commit_device_bytes_us for (i = 0; i < map->num_stripes; i++) { dev = map->stripes[i].dev; dev->commit_bytes_used = dev->bytes_used; + dev->has_pending_chunks = false; } } unlock_chunks(root); --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -62,6 +62,11 @@ struct btrfs_device { spinlock_t io_lock ____cacheline_aligned; int running_pending; + /* When true means this device has pending chunk alloc in + * current transaction. Protected by chunk_mutex. + */ + bool has_pending_chunks; + /* regular prio bios */ struct btrfs_pending_bios pending_bios; /* WRITE_SYNC bios */