From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Josef Bacik, Filipe Manana, David Sterba
Subject: [PATCH 5.10 589/717] btrfs: fix race when defragmenting leads to unnecessary IO
Date: Mon, 28 Dec 2020 13:49:47 +0100
Message-Id: <20201228125049.127499593@linuxfoundation.org>
In-Reply-To: <20201228125020.963311703@linuxfoundation.org>
References: <20201228125020.963311703@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

From: Filipe Manana

commit 7f458a3873ae94efe1f37c8b96c97e7298769e98 upstream.

When defragmenting we skip ranges that have holes or inline extents, so
that we don't do unnecessary IO and waste space. We do this check when
calling should_defrag_range() at btrfs_defrag_file(). However, we do it
without holding the inode's lock. We do it this way to avoid blocking,
for too long, other tasks that may want to operate on other file ranges,
since after the call to should_defrag_range() and before locking the
inode we trigger a synchronous page cache readahead.

However, before we are able to lock the inode, some other task might
have punched a hole in our range, or an inline extent may now exist
there, in which case we should no longer set the range for defrag, since
that would cause unnecessary IO and make us waste space (i.e. allocating
extents to contain zeros for a hole).

So after we lock the inode and the range in the io tree, check again if
we have holes or an inline extent, and if we do, just skip the range.

I hit this while testing my next patch, which fixes races when updating
an inode's number of bytes (subject "btrfs: update the number of bytes
used by an inode atomically") and depends on this change in order to
work correctly. Alternatively I could rework that other patch to detect
holes and flag their range with the 'new delalloc' bit, but this patch
by itself fixes an efficiency problem due to a race that, from a
functional point of view, is not harmful (it could be triggered with
btrfs/062 from fstests).
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik
Signed-off-by: Filipe Manana
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman
---
 fs/btrfs/ioctl.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1275,6 +1275,7 @@ static int cluster_pages_for_defrag(stru
 	u64 page_end;
 	u64 page_cnt;
 	u64 start = (u64)start_index << PAGE_SHIFT;
+	u64 search_start;
 	int ret;
 	int i;
 	int i_done;
@@ -1371,6 +1372,40 @@ again:
 	lock_extent_bits(&BTRFS_I(inode)->io_tree,
 			 page_start, page_end - 1, &cached_state);
+
+	/*
+	 * When defragmenting we skip ranges that have holes or inline extents,
+	 * (check should_defrag_range()), to avoid unnecessary IO and wasting
+	 * space. At btrfs_defrag_file(), we check if a range should be defragged
+	 * before locking the inode and then, if it should, we trigger a sync
+	 * page cache readahead - we lock the inode only after that to avoid
+	 * blocking for too long other tasks that possibly want to operate on
+	 * other file ranges. But before we were able to get the inode lock,
+	 * some other task may have punched a hole in the range, or we may have
+	 * now an inline extent, in which case we should not defrag. So check
+	 * for that here, where we have the inode and the range locked, and bail
+	 * out if that happened.
+	 */
+	search_start = page_start;
+	while (search_start < page_end) {
+		struct extent_map *em;
+
+		em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, search_start,
+				      page_end - search_start);
+		if (IS_ERR(em)) {
+			ret = PTR_ERR(em);
+			goto out_unlock_range;
+		}
+		if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
+			free_extent_map(em);
+			/* Ok, 0 means we did not defrag anything */
+			ret = 0;
+			goto out_unlock_range;
+		}
+		search_start = extent_map_end(em);
+		free_extent_map(em);
+	}
+
 	clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, page_end - 1,
 			 EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
 			 0, 0, &cached_state);
@@ -1401,6 +1436,10 @@ again:
 	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT);
 	extent_changeset_free(data_reserved);
 	return i_done;
+
+out_unlock_range:
+	unlock_extent_cached(&BTRFS_I(inode)->io_tree,
+			     page_start, page_end - 1, &cached_state);
 out:
 	for (i = 0; i < i_done; i++) {
 		unlock_page(pages[i]);