From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Filipe Manana, David Sterba
Subject: [PATCH 5.6 14/73] btrfs: fix partial loss of prealloc extent past i_size after fsync
Date: Mon, 4 May 2020 19:57:17 +0200
Message-Id: <20200504165504.577657162@linuxfoundation.org>
In-Reply-To: <20200504165501.781878940@linuxfoundation.org>
References:
  <20200504165501.781878940@linuxfoundation.org>

From: Filipe Manana

commit f135cea30de5f74d5bfb5116682073841fb4af8f upstream.

When we have an inode with a prealloc extent that starts at an offset
lower than the i_size and there is another prealloc extent that starts at
an offset beyond i_size, we can end up losing part of the first prealloc
extent (the part that starts at i_size) and have an implicit hole if we
fsync the file and then have a power failure.

Consider the following example with comments explaining how and why it
happens.

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  # Create our test file with 2 consecutive prealloc extents, each with a
  # size of 128Kb, and covering the range from 0 to 256Kb, with a file
  # size of 0.
  $ xfs_io -f -c "falloc -k 0 128K" /mnt/foo
  $ xfs_io -c "falloc -k 128K 128K" /mnt/foo

  # Fsync the file to record both extents in the log tree.
  $ xfs_io -c "fsync" /mnt/foo

  # Now do a redundant extent allocation for the range from 0 to 64Kb.
  # This will merely increase the file size from 0 to 64Kb. Alternatively,
  # we could do a truncate to set the file size to 64Kb.
  $ xfs_io -c "falloc 0 64K" /mnt/foo

  # Fsync the file, so we update the inode item in the log tree with the
  # new file size (64Kb). This also ends up setting the number of bytes
  # for the first prealloc extent to 64Kb. This is done by the truncation
  # at btrfs_log_prealloc_extents().
  # This means that if a power failure happens after this, a write into
  # the file range 64Kb to 128Kb will not use the prealloc extent and
  # will result in allocation of a new extent.
  $ xfs_io -c "fsync" /mnt/foo

  # Now set the file size to 256K with a truncate and then fsync the file.
  # Since no changes happened to the extents, the fsync only updates the
  # i_size in the inode item in the log tree. This results in an implicit
  # hole for the file range from 64Kb to 128Kb, something fsck will
  # complain about if we replay the log after a power failure and the
  # NO_HOLES feature is not enabled.
  $ xfs_io -c "truncate 256K" -c "fsync" /mnt/foo

So instead of always truncating the log to the inode's current i_size in
btrfs_log_prealloc_extents(), first check whether there is a prealloc
extent that starts at an offset lower than i_size and whose length makes
it cross i_size. If there is one, truncate to the end offset of that
prealloc extent instead, so that we don't lose the part of that extent
that starts at i_size if a power failure happens.

A test case for fstests follows soon.

Fixes: 31d11b83b96f ("Btrfs: fix duplicate extents after fsync of file with prealloc extents")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman
---
 fs/btrfs/tree-log.c |   43 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)

--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4211,6 +4211,9 @@ static int btrfs_log_prealloc_extents(st
 	const u64 ino = btrfs_ino(inode);
 	struct btrfs_path *dst_path = NULL;
 	bool dropped_extents = false;
+	u64 truncate_offset = i_size;
+	struct extent_buffer *leaf;
+	int slot;
 	int ins_nr = 0;
 	int start_slot;
 	int ret;
@@ -4225,9 +4228,43 @@ static int btrfs_log_prealloc_extents(st
 	if (ret < 0)
 		goto out;
 
+	/*
+	 * We must check if there is a prealloc extent that starts before the
+	 * i_size and crosses the i_size boundary. This is to ensure later we
+	 * truncate down to the end of that extent and not to the i_size, as
+	 * otherwise we end up losing part of the prealloc extent after a log
+	 * replay and with an implicit hole if there is another prealloc extent
+	 * that starts at an offset beyond i_size.
+	 */
+	ret = btrfs_previous_item(root, path, ino, BTRFS_EXTENT_DATA_KEY);
+	if (ret < 0)
+		goto out;
+
+	if (ret == 0) {
+		struct btrfs_file_extent_item *ei;
+
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+		ei = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+
+		if (btrfs_file_extent_type(leaf, ei) ==
+		    BTRFS_FILE_EXTENT_PREALLOC) {
+			u64 extent_end;
+
+			btrfs_item_key_to_cpu(leaf, &key, slot);
+			extent_end = key.offset +
+				btrfs_file_extent_num_bytes(leaf, ei);
+
+			if (extent_end > i_size)
+				truncate_offset = extent_end;
+		}
+	} else {
+		ret = 0;
+	}
+
 	while (true) {
-		struct extent_buffer *leaf = path->nodes[0];
-		int slot = path->slots[0];
+		leaf = path->nodes[0];
+		slot = path->slots[0];
 
 		if (slot >= btrfs_header_nritems(leaf)) {
 			if (ins_nr > 0) {
@@ -4265,7 +4302,7 @@ static int btrfs_log_prealloc_extents(st
 				ret = btrfs_truncate_inode_items(trans,
 							 root->log_root,
 							 &inode->vfs_inode,
-							 i_size,
+							 truncate_offset,
 							 BTRFS_EXTENT_DATA_KEY);
 			} while (ret == -EAGAIN);
 			if (ret)
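The truncation rule this fix introduces can be tried outside the kernel. What follows is a minimal user-space sketch, not the kernel code. It replaces the btree search and btrfs_previous_item() with a linear scan over an array of extents sorted by file offset. The names struct extent, enum extent_type and prealloc_truncate_offset() are hypothetical. The sketch does not model items of other key types, other inodes, or leaf boundaries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a btrfs file extent item:
 * only the fields the truncation rule looks at. */
enum extent_type { EXTENT_REG, EXTENT_PREALLOC };

struct extent {
	uint64_t offset;    /* file offset, i.e. the EXTENT_DATA key offset */
	uint64_t num_bytes; /* number of bytes covered by the extent */
	enum extent_type type;
};

/*
 * Return the offset at which the logged extent items should be truncated.
 * 'extents' must be sorted by offset, like EXTENT_DATA keys in the tree.
 * Take the last extent that starts before i_size (what searching for
 * key offset i_size and stepping back one item finds). If it is a
 * prealloc extent that ends past i_size, truncate at its end, otherwise
 * truncate at i_size.
 */
static uint64_t prealloc_truncate_offset(const struct extent *extents,
					 size_t nr, uint64_t i_size)
{
	const struct extent *prev = NULL;
	size_t i;

	for (i = 0; i < nr && extents[i].offset < i_size; i++)
		prev = &extents[i];

	if (prev && prev->type == EXTENT_PREALLOC) {
		uint64_t extent_end = prev->offset + prev->num_bytes;

		if (extent_end > i_size)
			return extent_end;
	}
	return i_size;
}
```

Take the two 128K prealloc extents from the reproducer above with an i_size of 64K. The sketch returns 128K, whereas the code before the fix always truncated at i_size (64K).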